Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, Yingyan Celine Lin
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40475-40487, 2023.

Abstract

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) the difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) the difficulty of achieving effective low-resource adaptation while avoiding overfitting and catastrophic forgetting. Inspired by recent findings, we hypothesize that both challenges can be addressed with modules widely shared across languages. To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, Master-ASR learns a small set of generalizable sub-modules and adaptively assembles them for different languages to reduce the multilingual overhead and enable effective knowledge transfer for low-resource adaptation. Extensive experiments and visualizations demonstrate that Master-ASR effectively discovers language similarity and improves multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., a 0.13∼2.41 lower character error rate (CER) with 30% smaller inference overhead than SOTA solutions on multilingual ASR, and a comparable CER with nearly 100 times fewer trainable parameters than SOTA solutions on low-resource tuning.
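
The abstract describes a modularize-then-assemble strategy: a small bank of sub-modules is shared across languages, and each language learns how to assemble them. The minimal PyTorch sketch below illustrates that idea under our own assumptions; the class names (AdapterModule, ModularAssemblyLayer), the bottleneck-adapter form of the sub-modules, and the softmax assembly weights are illustrative choices, not details taken from the paper.

    # Illustrative sketch of a "modularize-then-assemble" layer.
    # Assumption (not from the paper): each shared sub-module is a small
    # bottleneck adapter, and each language owns only a vector of assembly
    # logits used to softly combine the shared sub-modules.
    import torch
    import torch.nn as nn

    class AdapterModule(nn.Module):
        """One shared sub-module: a small bottleneck applied to frame features."""
        def __init__(self, dim: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.up(torch.relu(self.down(x)))

    class ModularAssemblyLayer(nn.Module):
        """Assembles a small bank of shared sub-modules per language.

        Each language stores only `num_modules` assembly logits, so adding a
        language is cheap and the sub-modules themselves remain shared.
        """
        def __init__(self, dim: int, num_modules: int, num_languages: int):
            super().__init__()
            self.module_bank = nn.ModuleList(
                [AdapterModule(dim) for _ in range(num_modules)]
            )
            # One row of assembly logits per language (hypothetical parameterization).
            self.assembly_logits = nn.Parameter(torch.zeros(num_languages, num_modules))

        def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
            weights = torch.softmax(self.assembly_logits[lang_id], dim=-1)
            mixed = sum(w * m(x) for w, m in zip(weights, self.module_bank))
            return x + mixed  # residual connection around the assembled adapter

    # Usage: frame features from a (frozen) multilingual backbone, batch x time x dim.
    layer = ModularAssemblyLayer(dim=768, num_modules=8, num_languages=20)
    feats = torch.randn(2, 100, 768)
    out = layer(feats, lang_id=3)
    print(out.shape)  # torch.Size([2, 100, 768])

In this sketch, per-language storage grows only with the number of assembly logits rather than with full model copies, which is one plausible way to obtain the low multilingual overhead and the small number of trainable parameters for low-resource tuning that the abstract reports.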

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-yu23l,
  title     = {Master-{ASR}: Achieving Multilingual Scalability and Low-Resource Adaptation in {ASR} with Modular Learning},
  author    = {Yu, Zhongzhi and Zhang, Yang and Qian, Kaizhi and Wan, Cheng and Fu, Yonggan and Zhang, Yongan and Lin, Yingyan Celine},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {40475--40487},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/yu23l/yu23l.pdf},
  url       = {https://proceedings.mlr.press/v202/yu23l.html},
  abstract  = {Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over fitting and catastrophic forgetting issues. Inspired by recent findings, we hypothesize that we can address the above challenges with modules widely shared across languages. To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, Master-ASR learns a small set of generalizable sub-modules and adaptively assembles them for different languages to reduce the multilingual overhead and enable effective knowledge transfer for low-resource adaptation. Extensive experiments and visualizations demonstrate that Master-ASR can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13∼2.41 lower character error rate (CER) with 30% smaller inference overhead over SOTA solutions on multilingual ASR and a comparable CER with nearly 100 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively.}
}
Endnote
%0 Conference Paper
%T Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning
%A Zhongzhi Yu
%A Yang Zhang
%A Kaizhi Qian
%A Cheng Wan
%A Yonggan Fu
%A Yongan Zhang
%A Yingyan Celine Lin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-yu23l
%I PMLR
%P 40475--40487
%U https://proceedings.mlr.press/v202/yu23l.html
%V 202
%X Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over fitting and catastrophic forgetting issues. Inspired by recent findings, we hypothesize that we can address the above challenges with modules widely shared across languages. To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, Master-ASR learns a small set of generalizable sub-modules and adaptively assembles them for different languages to reduce the multilingual overhead and enable effective knowledge transfer for low-resource adaptation. Extensive experiments and visualizations demonstrate that Master-ASR can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13∼2.41 lower character error rate (CER) with 30% smaller inference overhead over SOTA solutions on multilingual ASR and a comparable CER with nearly 100 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively.
APA
Yu, Z., Zhang, Y., Qian, K., Wan, C., Fu, Y., Zhang, Y. & Lin, Y. C. (2023). Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:40475-40487. Available from https://proceedings.mlr.press/v202/yu23l.html.
