Voice Separation with an Unknown Number of Multiple Speakers

Eliya Nachmani; Yossi Adi; Lior Wolf

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:7164-7175, 2020.

Abstract

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-nachmani20a,
  title = 	 {Voice Separation with an Unknown Number of Multiple Speakers},
  author =       {Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {7164--7175},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/nachmani20a/nachmani20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/nachmani20a.html},
  abstract = 	 {We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.}
}

Endnote

%0 Conference Paper
%T Voice Separation with an Unknown Number of Multiple Speakers
%A Eliya Nachmani
%A Yossi Adi
%A Lior Wolf
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-nachmani20a
%I PMLR
%P 7164--7175
%U https://proceedings.mlr.press/v119/nachmani20a.html
%V 119
%X We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

APA

Nachmani, E., Adi, Y., & Wolf, L. (2020). Voice Separation with an Unknown Number of Multiple Speakers. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:7164-7175. Available from https://proceedings.mlr.press/v119/nachmani20a.html.
