DAVINZ: Data Valuation using Deep Neural Networks at Initialization

Zhaoxuan Wu, Yao Shu, Bryan Kian Hsiang Low
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24150-24176, 2022.

Abstract

Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after long-term training, which makes data valuation on large, complex deep neural networks (DNNs) prohibitively expensive. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ can, surprisingly, even enjoy, both theoretically and empirically, the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.
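As a rough illustration of the training-free recipe the abstract describes, the sketch below scores data with a quantity computed from a randomly initialized (untrained) network and then attributes value to individual points with Monte Carlo Shapley sampling. The utility init_score here is only a hypothetical stand-in (a gradient-alignment heuristic at initialization), not the paper's actual domain-aware generalization bound; all function names and hyperparameters are illustrative assumptions.

# Minimal sketch (assumed utility, not the paper's bound): value data at
# initialization, i.e., without ever training the network.
import torch
import torch.nn as nn
import torch.nn.functional as F


def init_score(model, X, y, X_val, y_val):
    """Hypothetical training-free utility: cosine alignment between the
    training-set and validation-set loss gradients of an untrained model."""
    loss_fn = nn.MSELoss()

    def flat_grad(Xb, yb):
        loss = loss_fn(model(Xb), yb)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    g_train, g_val = flat_grad(X, y), flat_grad(X_val, y_val)
    return F.cosine_similarity(g_train, g_val, dim=0).item()


def monte_carlo_shapley(model, X, y, X_val, y_val, num_perms=50, seed=0):
    """Estimate per-point Shapley values of the training-free utility by
    averaging marginal contributions over random permutations."""
    n = X.shape[0]
    values = torch.zeros(n)
    rng = torch.Generator().manual_seed(seed)
    for _ in range(num_perms):
        perm = torch.randperm(n, generator=rng)
        prev = 0.0
        for k in range(1, n + 1):
            idx = perm[:k]
            curr = init_score(model, X[idx], y[idx], X_val, y_val)
            values[perm[k - 1]] += curr - prev
            prev = curr
    return values / num_perms


if __name__ == "__main__":
    torch.manual_seed(0)
    X, y = torch.randn(20, 5), torch.randn(20, 1)          # toy training data
    X_val, y_val = torch.randn(10, 5), torch.randn(10, 1)  # toy validation data
    net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))  # untrained
    print(monte_carlo_shapley(net, X, y, X_val, y_val, num_perms=20))

Because no training happens inside the utility, each marginal-contribution evaluation costs only a forward and backward pass, which is the source of the efficiency the abstract claims.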

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wu22j,
  title     = {{DAVINZ}: Data Valuation using Deep Neural Networks at Initialization},
  author    = {Wu, Zhaoxuan and Shu, Yao and Low, Bryan Kian Hsiang},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {24150--24176},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wu22j/wu22j.pdf},
  url       = {https://proceedings.mlr.press/v162/wu22j.html}
}
Endnote
%0 Conference Paper
%T DAVINZ: Data Valuation using Deep Neural Networks at Initialization
%A Zhaoxuan Wu
%A Yao Shu
%A Bryan Kian Hsiang Low
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wu22j
%I PMLR
%P 24150--24176
%U https://proceedings.mlr.press/v162/wu22j.html
%V 162
APA
Wu, Z., Shu, Y. & Low, B.K.H. (2022). DAVINZ: Data Valuation using Deep Neural Networks at Initialization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:24150-24176. Available from https://proceedings.mlr.press/v162/wu22j.html.
