Efficient Diversified Mini-Batch Selection using Variable High-layer Features
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:300-315, 2019.
Abstract
Stochastic Gradient Descent (SGD) has been widely adopted for training deep neural networks of various structures. Instead of using the full dataset, a so-called mini-batch is selected at each gradient descent iteration, which speeds up learning when a large amount of training data is present. Without knowledge of the true underlying distribution, one often samples the data indices uniformly. Recently, researchers applied a diversified mini-batch selection scheme based on the Determinantal Point Process (DPP) in order to avoid highly correlated samples within one batch (Zhang et al., 2017). Despite its success, this attempt was limited in that it used fixed features to construct the Gram matrix for the DPP; using raw or fixed higher-layer features restricts the potential improvement in the convergence rate. In this paper, we instead propose to use variable higher-layer features, which are updated at each iteration as the parameters change. To avoid high computational cost, we make several contributions that speed up DPP sampling, including: (1) hierarchical sampling, which breaks a single DPP sampling with a large Gram matrix into many DPP samplings with much smaller Gram matrices, and (2) Markov k-DPP, which encourages diversity across iterations. Empirical results show much more diversified mini-batches at each iteration as well as much improved convergence compared with the previous approach.
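
For intuition only, the sketch below illustrates the general idea of diversity-aware mini-batch selection over the current high-layer features. It uses a simple greedy log-determinant (MAP-style) selection rather than the paper's hierarchical or Markov k-DPP samplers, and the RBF kernel, function name, and dimensions are assumptions made purely for illustration.

```python
import numpy as np

def greedy_diverse_batch(features, k, gamma=1.0, seed=0):
    # Gram matrix from an RBF kernel over the current high-layer features;
    # the kernel choice here is an assumption made for illustration.
    sq = np.sum(features ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    L = np.exp(-gamma * np.maximum(dists, 0.0))

    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(L.shape[0]))]
    for _ in range(k - 1):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            # Log-determinant of the principal minor L[idx, idx]:
            # larger values correspond to a more diverse subset.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

# Usage sketch: pick 8 diverse examples from 256 candidates whose
# (hypothetical) high-layer features are 64-dimensional.
candidate_features = np.random.randn(256, 64)
batch_indices = greedy_diverse_batch(candidate_features, k=8)
```

Because the features would be recomputed from the network's higher layers after each parameter update, the Gram matrix in such a scheme changes across iterations, which is what motivates the fast sampling strategies described above.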