Efficient Diversified Mini-Batch Selection using Variable High-layer Features
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:300-315, 2019.
Abstract
Stochastic Gradient Descent (SGD) has been widely adopted for training deep neural networks of various structures. Instead of using the full dataset, a so-called mini-batch is selected at each gradient descent iteration, which speeds up learning when a large amount of training data is present. Without knowledge of the true underlying distribution, one often samples the data indices uniformly. Recently, researchers applied a diversified mini-batch selection scheme based on the Determinantal Point Process (DPP) in order to avoid highly correlated samples within one batch (Zhang et al., 2017). Despite its success, this attempt was limited in that it used fixed features to construct the Gram matrix for the DPP; using raw or fixed higher-layer features restricts the potential improvement in the convergence rate. In this paper, we instead propose to use variable higher-layer features, which are updated at each iteration as the parameters change. To avoid high computational cost, we make several contributions that speed up DPP sampling, including: (1) hierarchical sampling, which breaks a single DPP sampling with a large Gram matrix into many DPP samplings with much smaller Gram matrices, and (2) Markov k-DPP, which encourages diversity across iterations. Empirical results show much more diversified mini-batches at each iteration as well as much improved convergence compared with the previous approach.
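
For intuition only, the sketch below illustrates the general idea of diversity-aware mini-batch selection over the current high-layer features. It uses a simple greedy log-determinant (MAP-style) selection rather than the paper's hierarchical or Markov k-DPP samplers, and the RBF kernel, function name, and dimensions are assumptions made purely for illustration.

```python
import numpy as np

def greedy_diverse_batch(features, k, gamma=1.0, seed=0):
    # Gram matrix from an RBF kernel over the current high-layer features;
    # the kernel choice here is an assumption made for illustration.
    sq = np.sum(features ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    L = np.exp(-gamma * np.maximum(dists, 0.0))

    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(L.shape[0]))]
    for _ in range(k - 1):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            # Log-determinant of the principal minor L[idx, idx]:
            # larger values correspond to a more diverse subset.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

# Usage sketch: pick 8 diverse examples from 256 candidates whose
# (hypothetical) high-layer features are 64-dimensional.
candidate_features = np.random.randn(256, 64)
batch_indices = greedy_diverse_batch(candidate_features, k=8)
```

Because the features would be recomputed from the network's higher layers after each parameter update, the Gram matrix in such a scheme changes across iterations, which is what motivates the fast sampling strategies described above.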