Invariant Aggregator for Defending against Federated Backdoor Attacks

Xiaoyang Wang, Dimitrios Dimitriadis, Sanmi Koyejo, Shruti Tople
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2728-2736, 2024.

Abstract

Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models’ utility, defending against backdoor attacks, which exclusively increase model accuracy on backdoor samples without hurting the utility on other samples, remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as ResNet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update toward invariant directions that are generally useful by selectively masking out the update elements that favor a few, possibly malicious, clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks at a negligible cost to model utility.
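The element-wise masking idea lends itself to a compact illustration. Below is a minimal NumPy sketch of one plausible sign-agreement aggregator: each coordinate of the averaged update is kept only if most clients agree on its sign, and zeroed otherwise. The function name invariant_aggregate, the 0.8 threshold, and the sign-agreement rule are illustrative assumptions for intuition, not the paper's exact algorithm.

import numpy as np

def invariant_aggregate(client_updates, agreement_threshold=0.8):
    # client_updates: array of shape (num_clients, num_params).
    # NOTE: a hypothetical sketch of sign-agreement masking, not the
    # paper's exact aggregation rule; threshold chosen for illustration.
    updates = np.asarray(client_updates)
    mean_update = updates.mean(axis=0)  # plain FedAvg direction
    # Fraction of clients whose element-wise sign matches the mean's sign.
    agreement = (np.sign(updates) == np.sign(mean_update)).mean(axis=0)
    # Keep only "invariant" elements that most clients agree on; zero out
    # elements that favor a few (possibly malicious) clients.
    mask = agreement >= agreement_threshold
    return mean_update * mask

# Toy usage: 5 benign clients push parameter 0 in the same direction,
# while one outlier pushes parameter 1 hard in its own direction.
benign = np.tile([0.5, 0.0], (5, 1))
outlier = np.array([[0.4, 3.0]])
agg = invariant_aggregate(np.vstack([benign, outlier]))
print(agg)  # parameter 1 is masked out; parameter 0 survives

Because the mask depends only on sign agreement rather than update magnitude, this style of defense stays meaningful even when a flat loss landscape lets an attacker keep malicious updates small in norm.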

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-wang24e,
  title     = {Invariant Aggregator for Defending against Federated Backdoor Attacks},
  author    = {Wang, Xiaoyang and Dimitriadis, Dimitrios and Koyejo, Sanmi and Tople, Shruti},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2728--2736},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/wang24e/wang24e.pdf},
  url       = {https://proceedings.mlr.press/v238/wang24e.html},
  abstract  = {Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models’ utility, defending against backdoor attacks, which exclusively increase model accuracy on backdoor samples without hurting the utility on other samples, remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as ResNet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update toward invariant directions that are generally useful by selectively masking out the update elements that favor a few, possibly malicious, clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks at a negligible cost to model utility.}
}
Endnote
%0 Conference Paper
%T Invariant Aggregator for Defending against Federated Backdoor Attacks
%A Xiaoyang Wang
%A Dimitrios Dimitriadis
%A Sanmi Koyejo
%A Shruti Tople
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-wang24e
%I PMLR
%P 2728--2736
%U https://proceedings.mlr.press/v238/wang24e.html
%V 238
%X Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models’ utility, defending against backdoor attacks, which exclusively increase model accuracy on backdoor samples without hurting the utility on other samples, remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as ResNet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update toward invariant directions that are generally useful by selectively masking out the update elements that favor a few, possibly malicious, clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks at a negligible cost to model utility.
APA
Wang, X., Dimitriadis, D., Koyejo, S. & Tople, S. (2024). Invariant Aggregator for Defending against Federated Backdoor Attacks. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2728-2736. Available from https://proceedings.mlr.press/v238/wang24e.html.