Provable Data Subset Selection For Efficient Neural Networks Training

Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, Dan Feldman
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:34533-34555, 2023.

Abstract

Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an RBFNN on the larger input data. In particular, we construct coresets for radial basis and Laplacian loss functions. We then use our coresets to obtain a provable data subset selection algorithm for training deep neural networks. Since our coresets approximate every function, they also approximate the gradient of each weight in a neural network, which is a particular function on the input. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets, demonstrating the efficacy and accuracy of our coreset construction.
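The paper's provable construction is considerably more involved, but the general recipe coresets of this kind build on is sensitivity (importance) sampling: sample each point with probability proportional to an upper bound on its worst-case contribution to the loss, then reweight by the inverse sampling probability so the coreset loss is unbiased. The following minimal Python sketch illustrates that recipe for a Gaussian RBF loss; the distance-to-mean sensitivity proxy and the function names here are illustrative assumptions, not the paper's algorithm or guarantees.

```python
import numpy as np

def rbf_loss(X, w, center, gamma=1.0):
    """Weighted Gaussian RBF loss of a single query center on data X."""
    d2 = np.sum((X - center) ** 2, axis=1)
    return np.sum(w * np.exp(-gamma * d2))

def coreset(X, m, seed=None):
    """Toy sensitivity-style sampling: points far from the data mean get
    higher sampling probability (a heuristic proxy, NOT the paper's bound);
    weights 1/(m * p_i) keep the loss estimate unbiased in expectation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    s = 1.0 / n + np.linalg.norm(X - X.mean(axis=0), axis=1)  # proxy sensitivities
    p = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])

# Usage: compare the full-data loss with the coreset estimate.
X = np.random.default_rng(0).normal(size=(10_000, 5))
C, w = coreset(X, m=500, seed=1)
center = np.zeros(5)
print(rbf_loss(X, np.ones(len(X)), center), rbf_loss(C, w, center))
```

Because the coreset approximates the loss for *every* choice of center and kernel width, the same weighted subset can stand in for the full data across all candidate RBF networks, which is what makes it usable for training-data subset selection.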

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-tukan23a,
  title     = {Provable Data Subset Selection For Efficient Neural Networks Training},
  author    = {Tukan, Murad and Zhou, Samson and Maalouf, Alaa and Rus, Daniela and Braverman, Vladimir and Feldman, Dan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {34533--34555},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/tukan23a/tukan23a.pdf},
  url       = {https://proceedings.mlr.press/v202/tukan23a.html},
  abstract  = {Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an RBFNN on the larger input data. In particular, we construct coresets for radial basis and Laplacian loss functions. We then use our coresets to obtain a provable data subset selection algorithm for training deep neural networks. Since our coresets approximate every function, they also approximate the gradient of each weight in a neural network, which is a particular function on the input. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets, demonstrating the efficacy and accuracy of our coreset construction.}
}
Endnote
%0 Conference Paper
%T Provable Data Subset Selection For Efficient Neural Networks Training
%A Murad Tukan
%A Samson Zhou
%A Alaa Maalouf
%A Daniela Rus
%A Vladimir Braverman
%A Dan Feldman
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-tukan23a
%I PMLR
%P 34533--34555
%U https://proceedings.mlr.press/v202/tukan23a.html
%V 202
%X Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an RBFNN on the larger input data. In particular, we construct coresets for radial basis and Laplacian loss functions. We then use our coresets to obtain a provable data subset selection algorithm for training deep neural networks. Since our coresets approximate every function, they also approximate the gradient of each weight in a neural network, which is a particular function on the input. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets, demonstrating the efficacy and accuracy of our coreset construction.
APA
Tukan, M., Zhou, S., Maalouf, A., Rus, D., Braverman, V. & Feldman, D. (2023). Provable Data Subset Selection For Efficient Neural Networks Training. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:34533-34555. Available from https://proceedings.mlr.press/v202/tukan23a.html.