An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation

Nicholas Bryan; Gautham Mysore

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):208-216, 2013.

Abstract

In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources. One of the most effective current classes of methods for doing so is based on non-negative matrix factorization and related latent variable models. Such techniques, however, typically perform poorly when no isolated training data is given and do not allow user feedback to correct for poor results. To overcome these issues, we allow a user to interactively constrain a latent variable model by painting on a time-frequency display of sound to guide the learning process. The annotations are used within the framework of posterior regularization to impose linear grouping constraints that would otherwise be difficult to achieve via standard priors. For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.
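To make the idea of penalty-guided multiplicative updates concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a KL-divergence NMF in its EM (PLCA-style) form, with one group of components per source, and it assumes that a painted annotation can be represented as a non-negative penalty array per source that down-weights that source's posterior by `exp(-penalty)`. The function name `penalized_kl_nmf` and all parameter names are hypothetical.

```python
import numpy as np


def penalized_kl_nmf(V, penalties, n_components=2, n_iter=100, eps=1e-12, seed=0):
    """Sketch of penalty-weighted KL-NMF separation (illustrative assumption).

    V         : (n_freq, n_time) non-negative magnitude spectrogram.
    penalties : list of (n_freq, n_time) non-negative arrays, one per source.
                A large value discourages that source in that bin.
    Returns one (n_freq, n_time) magnitude estimate per source.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    n_freq, n_time = V.shape
    # Assumed form of the constraint: each penalty becomes a weight in (0, 1].
    gates = [np.exp(-np.asarray(p, dtype=float)) for p in penalties]
    W = [rng.random((n_freq, n_components)) + eps for _ in gates]
    H = [rng.random((n_components, n_time)) + eps for _ in gates]

    def mixture():
        # Penalty-weighted sum of the per-source low-rank models.
        return sum(g * (w @ h) for g, w, h in zip(gates, W, H)) + eps

    for _ in range(n_iter):
        model = mixture()
        for s, g in enumerate(gates):
            # Data weighted by this source's share of the posterior (E-step).
            Z = V * g / model
            # Multiplicative M-step; both factors use the old W[s] and H[s].
            W_new = W[s] * (Z @ H[s].T)
            H_new = H[s] * (W[s].T @ Z)
            # Keep each basis vector normalized over frequency.
            W[s] = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
            H[s] = H_new

    model = mixture()
    # Soft masks applied to the mixture; the estimates sum back to V.
    return [V * g * (w @ h) / model for g, w, h in zip(gates, W, H)]
```

With all penalties set to zero the weights are all one and the loop reduces to ordinary KL-divergence NMF updates over the concatenated components. Painting a region for one source would, under the assumption above, correspond to raising the penalty of the other sources in that region and rerunning the updates.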

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-bryan13,
  title = 	 {An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation},
  author = 	 {Bryan, Nicholas and Mysore, Gautham},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {208--216},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/bryan13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/bryan13.html},
  abstract = 	 {In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources.  One of the current most effective class of methods to do so is based on non-negative matrix factorization and related latent variable models.  Such techniques, however, typically perform poorly when no isolated training data is given and do not allow user feedback to correct for poor results. To overcome these issues, we allow a user to interactively constrain a latent variable model by painting on a time-frequency display of sound to guide the learning process.  The annotations are used within the framework of posterior regularization to impose linear grouping constraints that would otherwise be difficult to achieve via standard priors.  For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.}
}

Endnote

%0 Conference Paper
%T An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation
%A Nicholas Bryan
%A Gautham Mysore
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-bryan13
%I PMLR
%P 208--216
%U https://proceedings.mlr.press/v28/bryan13.html
%V 28
%N 3
%X In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources.  One of the current most effective class of methods to do so is based on non-negative matrix factorization and related latent variable models.  Such techniques, however, typically perform poorly when no isolated training data is given and do not allow user feedback to correct for poor results. To overcome these issues, we allow a user to interactively constrain a latent variable model by painting on a time-frequency display of sound to guide the learning process.  The annotations are used within the framework of posterior regularization to impose linear grouping constraints that would otherwise be difficult to achieve via standard priors.  For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.

RIS


TY  - CPAPER
TI  - An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation
AU  - Nicholas Bryan
AU  - Gautham Mysore
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-bryan13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 208
EP  - 216
L1  - http://proceedings.mlr.press/v28/bryan13.pdf
UR  - https://proceedings.mlr.press/v28/bryan13.html
AB  - In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources.  One of the current most effective class of methods to do so is based on non-negative matrix factorization and related latent variable models.  Such techniques, however, typically perform poorly when no isolated training data is given and do not allow user feedback to correct for poor results. To overcome these issues, we allow a user to interactively constrain a latent variable model by painting on a time-frequency display of sound to guide the learning process.  The annotations are used within the framework of posterior regularization to impose linear grouping constraints that would otherwise be difficult to achieve via standard priors.  For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.
ER  -

APA


Bryan, N. & Mysore, G. (2013). An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):208-216. Available from https://proceedings.mlr.press/v28/bryan13.html.
