Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation

Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:37008-37041, 2023.

Abstract

Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has constant memory overhead for any set size at both train and test time. We conduct extensive experiments, including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images, to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc
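To make the MBC property concrete, here is a minimal sketch of a sum-pooled Deep Sets-style encoder in PyTorch. This is an illustration of the MBC property only, not the paper's UMBC architecture: sum pooling is MBC because summing the aggregates of any partition of the set reproduces the full-set encoding exactly.

```python
import torch
import torch.nn as nn

# Minimal sum-pooled set encoder: rho(sum_i phi(x_i)).
# Sum pooling is MBC: aggregating chunk sums over any partition
# of the set yields exactly the full-set encoding.
phi = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
rho = nn.Linear(16, 4)

def encode_full(x):                        # x: (N, 8)
    return rho(phi(x).sum(0))

def encode_streamed(x, chunk_size):
    agg = torch.zeros(16)                  # constant-size running aggregate
    for chunk in x.split(chunk_size):      # any partition of the set
        agg = agg + phi(chunk).sum(0)
    return rho(agg)

x = torch.randn(100, 8)
assert torch.allclose(encode_full(x), encode_streamed(x, 13), atol=1e-4)
```

The unbiased full set gradient idea can be sketched in the same setting. Assuming the sum-pooled encoder above, one can stream every chunk without storing activations, keep the computation graph for a single uniformly sampled chunk, and rescale that chunk's gradient path so that its expected gradient equals the full-set gradient. This is a hedged sketch of the idea under the sum-pooling assumption, not the paper's exact algorithm.

```python
def one_chunk_grad_encode(phi, x, chunk_size):
    """Sum-pool phi over the set x with constant activation memory.

    The returned value equals the exact full-set aggregate, while the
    backward pass flows through one uniformly sampled chunk, rescaled so
    that (for equal-size chunks) its expected gradient equals the
    full-set gradient.
    """
    chunks = x.split(chunk_size)
    k = torch.randint(len(chunks), (1,)).item()
    with torch.no_grad():                       # streamed chunks store no graph
        z = sum(phi(c).sum(0) for c in chunks)  # exact full-set aggregate
    g = phi(chunks[k]).sum(0)                   # only this chunk keeps its graph
    scale = x.shape[0] / chunks[k].shape[0]     # N/m rescaling for unbiasedness
    return z + scale * (g - g.detach())         # value: exact z; grad: scaled chunk
```

Downstream code then uses the returned aggregate exactly as if the whole set had been encoded at once, e.g. `rho(one_chunk_grad_encode(phi, x, 13))`, and a backward pass touches activations for only one chunk regardless of the set size.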

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-willette23a,
  title     = {Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation},
  author    = {Willette, Jeffrey and Lee, Seanie and Andreis, Bruno and Kawaguchi, Kenji and Lee, Juho and Hwang, Sung Ju},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {37008--37041},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/willette23a/willette23a.pdf},
  url       = {https://proceedings.mlr.press/v202/willette23a.html},
  abstract  = {Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has constant memory overhead for any set size at both train and test time. We conduct extensive experiments, including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images, to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc}
}
Endnote
%0 Conference Paper
%T Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation
%A Jeffrey Willette
%A Seanie Lee
%A Bruno Andreis
%A Kenji Kawaguchi
%A Juho Lee
%A Sung Ju Hwang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-willette23a
%I PMLR
%P 37008--37041
%U https://proceedings.mlr.press/v202/willette23a.html
%V 202
%X Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has constant memory overhead for any set size at both train and test time. We conduct extensive experiments, including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images, to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc
APA
Willette, J., Lee, S., Andreis, B., Kawaguchi, K., Lee, J. & Hwang, S.J. (2023). Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:37008-37041. Available from https://proceedings.mlr.press/v202/willette23a.html.
