Set Based Stochastic Subsampling

Bruno Andreis; Seanie Lee; A. Tuan Nguyen; Juho Lee; Eunho Yang; Sung Ju Hwang

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:619-638, 2022.

Abstract

Deep models are designed to operate on huge volumes of high-dimensional data such as images. To reduce the volume of data these models must process, we propose a set-based, two-stage, end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse-grained global information with set encoding functions. In the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pairwise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks, including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes, which need to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.
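
The two-stage structure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the learned set encoder and set attention networks are replaced here by hand-written placeholder scores (a mean-pooled summary and a dot-product similarity), one element is drawn per autoregressive step, and all function names (`stage_one`, `stage_two`, `subsample`) are hypothetical. It only shows the sampling pattern: independent Bernoulli draws conditioned on a set summary, followed by sequential Categorical draws conditioned on what has already been selected.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(elements):
    """Permutation-invariant set summary (placeholder for a learned set encoder)."""
    dim = len(elements[0])
    return [sum(e[d] for e in elements) / len(elements) for d in range(dim)]


def stage_one(elements, rng):
    """Candidate selection: one Bernoulli draw per element.

    The draws are independent of each other given the set summary.
    """
    summary = mean_pool(elements)
    candidates = []
    for i, e in enumerate(elements):
        # Placeholder keep-probability: sigmoid of a dot product with the summary.
        p = 1.0 / (1.0 + math.exp(-dot(e, summary)))
        if rng.random() < p:
            candidates.append(i)
    return candidates


def stage_two(elements, candidates, k, rng):
    """Autoregressive selection: up to k Categorical draws without replacement.

    Each draw is conditioned on the elements already selected.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        # Placeholder for set attention: down-weight candidates that are
        # similar to elements that were already picked.
        scores = [-sum(dot(elements[i], elements[j]) for j in selected)
                  for i in remaining]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # unnormalized softmax
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        selected.append(pick)
    return selected


def subsample(elements, k, seed=0):
    """Run both stages and return the indices of the selected elements."""
    rng = random.Random(seed)
    candidates = stage_one(elements, rng)
    return stage_two(elements, candidates, k, rng)
```

For example, `subsample(elements, k=5)` on a list of equal-length feature vectors returns at most five distinct indices into `elements`. In the paper's setting the placeholder scores would be produced by trainable networks optimized jointly with the downstream task.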

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-andreis22a,
  title = 	 {Set Based Stochastic Subsampling},
  author =       {Andreis, Bruno and Lee, Seanie and Nguyen, A. Tuan and Lee, Juho and Yang, Eunho and Hwang, Sung Ju},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {619--638},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/andreis22a/andreis22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/andreis22a.html},
  abstract = 	 {Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables by capturing coarse grained global information using set encoding functions, followed by conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables by modeling pair-wise interactions using set attention networks in the second stage. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that require to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.}
}

Endnote

%0 Conference Paper
%T Set Based Stochastic Subsampling
%A Bruno Andreis
%A Seanie Lee
%A A. Tuan Nguyen
%A Juho Lee
%A Eunho Yang
%A Sung Ju Hwang
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-andreis22a
%I PMLR
%P 619--638
%U https://proceedings.mlr.press/v162/andreis22a.html
%V 162
%X Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables by capturing coarse grained global information using set encoding functions, followed by conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables by modeling pair-wise interactions using set attention networks in the second stage. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that require to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.

APA

Andreis, B., Lee, S., Nguyen, A.T., Lee, J., Yang, E. & Hwang, S.J. (2022). Set Based Stochastic Subsampling. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:619-638. Available from https://proceedings.mlr.press/v162/andreis22a.html.