Towards Efficient Data Valuation Based on the Shapley Value

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, Costas J. Spanos
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1167-1176, 2019.

Abstract

{\em “How much is my data worth?”} is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of \emph{data valuation} by utilizing the Shapley value, a popular notion of value which originated in coopoerative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires \emph{exponential} time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-jia19a, title = {Towards Efficient Data Valuation Based on the Shapley Value}, author = {Jia, Ruoxi and Dao, David and Wang, Boxin and Hubis, Frances Ann and Hynes, Nick and G\"{u}rel, Nezihe Merve and Li, Bo and Zhang, Ce and Song, Dawn and Spanos, Costas J.}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {1167--1176}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/jia19a/jia19a.pdf}, url = {https://proceedings.mlr.press/v89/jia19a.html}, abstract = {{\em “How much is my data worth?”} is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of \emph{data valuation} by utilizing the Shapley value, a popular notion of value which originated in coopoerative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires \emph{exponential} time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.} }
Endnote
%0 Conference Paper %T Towards Efficient Data Valuation Based on the Shapley Value %A Ruoxi Jia %A David Dao %A Boxin Wang %A Frances Ann Hubis %A Nick Hynes %A Nezihe Merve Gürel %A Bo Li %A Ce Zhang %A Dawn Song %A Costas J. Spanos %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-jia19a %I PMLR %P 1167--1176 %U https://proceedings.mlr.press/v89/jia19a.html %V 89 %X {\em “How much is my data worth?”} is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of \emph{data valuation} by utilizing the Shapley value, a popular notion of value which originated in coopoerative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires \emph{exponential} time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
APA
Jia, R., Dao, D., Wang, B., Hubis, F.A., Hynes, N., Gürel, N.M., Li, B., Zhang, C., Song, D. & Spanos, C.J.. (2019). Towards Efficient Data Valuation Based on the Shapley Value. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1167-1176 Available from https://proceedings.mlr.press/v89/jia19a.html.

Related Material