DAVINZ: Data Valuation using Deep Neural Networks at Initialization

Zhaoxuan Wu; Yao Shu; Bryan Kian Hsiang Low

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24150-24176, 2022.

Abstract

Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-wu22j,
  title     = {{DAVINZ}: Data Valuation using Deep Neural Networks at Initialization},
  author    = {Wu, Zhaoxuan and Shu, Yao and Low, Bryan Kian Hsiang},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {24150--24176},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wu22j/wu22j.pdf},
  url       = {https://proceedings.mlr.press/v162/wu22j.html},
  abstract  = {Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.}
}

Endnote

%0 Conference Paper
%T DAVINZ: Data Valuation using Deep Neural Networks at Initialization
%A Zhaoxuan Wu
%A Yao Shu
%A Bryan Kian Hsiang Low
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wu22j
%I PMLR
%P 24150--24176
%U https://proceedings.mlr.press/v162/wu22j.html
%V 162
%X Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.

APA


Wu, Z., Shu, Y. & Low, B.K.H. (2022). DAVINZ: Data Valuation using Deep Neural Networks at Initialization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:24150-24176. Available from https://proceedings.mlr.press/v162/wu22j.html.
