Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Jiachen T. Wang, Prateek Mittal, Ruoxi Jia
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2557-2565, 2024.

Abstract

This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley’s computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-t-wang24a, title = { Efficient Data {S}hapley for Weighted Nearest Neighbor Algorithms }, author = {T. Wang, Jiachen and Mittal, Prateek and Jia, Ruoxi}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {2557--2565}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/t-wang24a/t-wang24a.pdf}, url = {https://proceedings.mlr.press/v238/t-wang24a.html}, abstract = { This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley’s computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart. } }
Endnote
%0 Conference Paper %T Efficient Data Shapley for Weighted Nearest Neighbor Algorithms %A Jiachen T. Wang %A Prateek Mittal %A Ruoxi Jia %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-t-wang24a %I PMLR %P 2557--2565 %U https://proceedings.mlr.press/v238/t-wang24a.html %V 238 %X This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley’s computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
APA
T. Wang, J., Mittal, P. & Jia, R.. (2024). Efficient Data Shapley for Weighted Nearest Neighbor Algorithms . Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2557-2565 Available from https://proceedings.mlr.press/v238/t-wang24a.html.

Related Material