Efficient Parametric Approximations of Neural Network Function Space Distance

Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Baker Grosse
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:7795-7812, 2023.

Abstract

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
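For concreteness, below is a minimal sketch (not from the paper's code release) of the exact, nonparametric FSD that LAFTR is designed to approximate: the average output discrepancy between two networks over a stored training set. The helper names (make_mlp, fsd) and the squared-error discrepancy are illustrative assumptions; the point of the sketch is that computing this quantity exactly requires iterating over the full dataset, which is the cost a compact parametric summary avoids.

# Minimal sketch of exact (nonparametric) Function Space Distance.
# Names and the squared-error discrepancy are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn


def make_mlp(in_dim=20, hidden=64, out_dim=10):
    # A small ReLU network; LAFTR's linearized-activation approximation
    # targets ReLU architectures like this one.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


@torch.no_grad()
def fsd(net_a, net_b, data_loader):
    # Average squared discrepancy between the two networks' outputs,
    # estimated by a full pass over the training set -- exactly the
    # storage/iteration cost a parametric approximation removes.
    total, count = 0.0, 0
    for x, _ in data_loader:
        diff = net_a(x) - net_b(x)
        total += diff.pow(2).sum(dim=1).sum().item()
        count += x.shape[0]
    return total / count


if __name__ == "__main__":
    net1, net2 = make_mlp(), make_mlp()
    xs = torch.randn(256, 20)
    ys = torch.zeros(256, dtype=torch.long)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(xs, ys), batch_size=64
    )
    print("FSD estimate:", fsd(net1, net2, loader))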

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-dhawan23a,
  title     = {Efficient Parametric Approximations of Neural Network Function Space Distance},
  author    = {Dhawan, Nikita and Huang, Sicong and Bae, Juhan and Grosse, Roger Baker},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {7795--7812},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/dhawan23a/dhawan23a.pdf},
  url       = {https://proceedings.mlr.press/v202/dhawan23a.html},
  abstract  = {It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.}
}
Endnote
%0 Conference Paper
%T Efficient Parametric Approximations of Neural Network Function Space Distance
%A Nikita Dhawan
%A Sicong Huang
%A Juhan Bae
%A Roger Baker Grosse
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-dhawan23a
%I PMLR
%P 7795--7812
%U https://proceedings.mlr.press/v202/dhawan23a.html
%V 202
%X It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
APA
Dhawan, N., Huang, S., Bae, J. & Grosse, R. B. (2023). Efficient Parametric Approximations of Neural Network Function Space Distance. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:7795-7812. Available from https://proceedings.mlr.press/v202/dhawan23a.html.
