Revisiting the Effects of Stochasticity for Hamiltonian Samplers

Giulio Franzese; Dimitrios Milios; Maurizio Filippone; Pietro Michiardi

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6744-6778, 2022.

Abstract

We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis for the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-franzese22a,
  title = 	 {Revisiting the Effects of Stochasticity for {H}amiltonian Samplers},
  author =       {Franzese, Giulio and Milios, Dimitrios and Filippone, Maurizio and Michiardi, Pietro},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {6744--6778},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/franzese22a/franzese22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/franzese22a.html},
  abstract = 	 {We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis for the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.}
}

Endnote

%0 Conference Paper
%T Revisiting the Effects of Stochasticity for Hamiltonian Samplers
%A Giulio Franzese
%A Dimitrios Milios
%A Maurizio Filippone
%A Pietro Michiardi
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-franzese22a
%I PMLR
%P 6744--6778
%U https://proceedings.mlr.press/v162/franzese22a.html
%V 162
%X We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis for the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.

APA

Franzese, G., Milios, D., Filippone, M. & Michiardi, P. (2022). Revisiting the Effects of Stochasticity for Hamiltonian Samplers. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6744-6778. Available from https://proceedings.mlr.press/v162/franzese22a.html.
