Kernel Stein Discrepancy Descent

Anna Korba; Pierre-Cyril Aubin-Frankowski; Szymon Majewski; Pierre Ablin

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5719-5730, 2021.

Abstract

Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant. This leads to a straightforwardly implementable, deterministic score-based method to sample from $\pi$, named KSD Descent, which uses a set of particles to approximate $\pi$. Remarkably, owing to a tractable loss function, KSD Descent can leverage robust parameter-free optimization schemes such as L-BFGS; this contrasts with other popular particle-based schemes such as the Stein Variational Gradient Descent algorithm. We study the convergence properties of KSD Descent and demonstrate its practical relevance. However, we also highlight failure cases by showing that the algorithm can get stuck in spurious local minima.
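As an illustration of the idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth, a standard Gaussian target (so the score is $\nabla \log \pi(x) = -x$), and finite-difference gradients as computed by SciPy's L-BFGS-B routine when no gradient is supplied. The function names `score_standard_gaussian`, `ksd_squared` and `ksd_descent` are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy.optimize import minimize


def score_standard_gaussian(x):
    # Assumed target: standard Gaussian, whose score is grad log pi(x) = -x.
    return -x


def ksd_squared(x, score, bandwidth=1.0):
    """Squared KSD (V-statistic) of the empirical measure of particles x, shape (n, d)."""
    n, d = x.shape
    h2 = bandwidth ** 2
    s = score(x)                              # (n, d) scores at the particles
    diff = x[:, None, :] - x[None, :, :]      # (n, n, d), entries x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # (n, n)
    k = np.exp(-sq_dist / (2.0 * h2))         # Gaussian kernel matrix

    # Four terms of the Stein kernel for a Gaussian base kernel.
    term_ss = (s @ s.T) * k                                   # s(x_i)^T s(x_j) k
    term_sy = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s(x_i)^T grad_y k
    term_xs = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # grad_x k^T s(x_j)
    term_tr = (d / h2 - sq_dist / h2 ** 2) * k                # tr(grad_x grad_y k)
    return np.mean(term_ss + term_sy + term_xs + term_tr)


def ksd_descent(x0, score, bandwidth=1.0):
    """Minimise the particle loss with L-BFGS-B (finite-difference gradients)."""
    n, d = x0.shape

    def loss(flat):
        return ksd_squared(flat.reshape(n, d), score, bandwidth)

    result = minimize(loss, x0.ravel(), method="L-BFGS-B")
    return result.x.reshape(n, d), result.fun


# Particles start far from the target and are moved by the optimiser.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=0.5, size=(20, 1))
initial_loss = ksd_squared(x0, score_standard_gaussian)
particles, final_loss = ksd_descent(x0, score_standard_gaussian)
```

Because the loss is an explicit function of the particle positions, an off-the-shelf quasi-Newton solver can be applied without choosing a step size. In practice an analytic or automatically differentiated gradient would likely be preferable to finite differences for larger particle sets.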

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-korba21a,
  title = 	 {Kernel Stein Discrepancy Descent},
  author =       {Korba, Anna and Aubin-Frankowski, Pierre-Cyril and Majewski, Szymon and Ablin, Pierre},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {5719--5730},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/korba21a/korba21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/korba21a.html},
  abstract = 	 {Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant. This leads to a straightforwardly implementable, deterministic score-based method to sample from $\pi$, named KSD Descent, which uses a set of particles to approximate $\pi$. Remarkably, owing to a tractable loss function, KSD Descent can leverage robust parameter-free optimization schemes such as L-BFGS; this contrasts with other popular particle-based schemes such as the Stein Variational Gradient Descent algorithm. We study the convergence properties of KSD Descent and demonstrate its practical relevance. However, we also highlight failure cases by showing that the algorithm can get stuck in spurious local minima.}
}

Endnote

%0 Conference Paper
%T Kernel Stein Discrepancy Descent
%A Anna Korba
%A Pierre-Cyril Aubin-Frankowski
%A Szymon Majewski
%A Pierre Ablin
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-korba21a
%I PMLR
%P 5719--5730
%U https://proceedings.mlr.press/v139/korba21a.html
%V 139
%X Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant. This leads to a straightforwardly implementable, deterministic score-based method to sample from $\pi$, named KSD Descent, which uses a set of particles to approximate $\pi$. Remarkably, owing to a tractable loss function, KSD Descent can leverage robust parameter-free optimization schemes such as L-BFGS; this contrasts with other popular particle-based schemes such as the Stein Variational Gradient Descent algorithm. We study the convergence properties of KSD Descent and demonstrate its practical relevance. However, we also highlight failure cases by showing that the algorithm can get stuck in spurious local minima.

APA

Korba, A., Aubin-Frankowski, P.-C., Majewski, S. & Ablin, P. (2021). Kernel Stein Discrepancy Descent. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5719-5730. Available from https://proceedings.mlr.press/v139/korba21a.html.
