Fast Stealthily Biased Sampling Using Sliced Wasserstein Distance

Yudai Yamamoto, Satoshi Hara
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:873-888, 2025.

Abstract

Ensuring fairness is essential when implementing machine learning models in practical applications. However, recent research has revealed that benchmark datasets can be crafted as fake evidence of fairness from unfair models using a method called Stealthily Biased Sampling (SBS). SBS minimizes the Wasserstein distance to manipulate a fake benchmark so that the distribution of the benchmark closely resembles the true data distribution. This optimization requires superquadratic time relative to the dataset size, making SBS applicable only to small-sized datasets. In this study, we reveal for the first time that the risk of manipulated benchmark datasets exists even for large-sized datasets. This finding indicates the necessity of considering the potential for manipulated benchmarks regardless of their size. To demonstrate this risk, we developed FastSBS, a computationally efficient variant of SBS using the Sliced Wasserstein distance. FastSBS is optimized by a stochastic gradient-based method, which requires only nearly linear time for each update. In experiments with both synthetic and real-world datasets, we show that FastSBS is an order of magnitude faster than the original SBS for large datasets while maintaining the quality of the manipulated benchmark.
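The speed-up described above rests on a standard property of the Sliced Wasserstein distance: it averages one-dimensional Wasserstein distances over random projection directions, and each 1D distance between equally sized samples reduces to comparing sorted projections, costing O(n log n) per direction rather than the superquadratic cost of the full Wasserstein distance. The following is a minimal illustrative sketch of a Monte Carlo Sliced Wasserstein-2 estimator in NumPy; it is not the authors' FastSBS implementation, and the function name and parameters are assumptions for illustration only.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=50, rng=None):
    """Monte Carlo estimate of the Sliced Wasserstein-2 distance
    between two equally sized samples X, Y of shape (n, d).

    Each random direction theta gives a 1D problem whose optimal
    transport plan matches sorted projections, so one projection
    costs O(n log n) -- the source of the near-linear per-update cost.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)          # random direction
        theta /= np.linalg.norm(theta)      # normalize to the unit sphere
        px = np.sort(X @ theta)             # sorted 1D projections
        py = np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)    # 1D Wasserstein-2 squared
    return np.sqrt(total / n_projections)
```

Because this estimator is differentiable in the sample positions (away from sorting ties), it is compatible with the stochastic gradient-based updates the abstract describes.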

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-yamamoto25a,
  title     = {Fast Stealthily Biased Sampling Using Sliced Wasserstein Distance},
  author    = {Yamamoto, Yudai and Hara, Satoshi},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages     = {873--888},
  year      = {2025},
  editor    = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume    = {260},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--08 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/yamamoto25a/yamamoto25a.pdf},
  url       = {https://proceedings.mlr.press/v260/yamamoto25a.html},
  abstract  = {Ensuring fairness is essential when implementing machine learning models in practical applications. However, recent research has revealed that benchmark datasets can be crafted as fake evidence of fairness from unfair models using a method called Stealthily Biased Sampling (SBS). SBS minimizes the Wasserstein distance to manipulate a fake benchmark so that the distribution of the benchmark closely resembles the true data distribution. This optimization requires superquadratic time relative to the dataset size, making SBS applicable only to small-sized datasets. In this study, we reveal for the first time that the risk of manipulated benchmark datasets exists even for large-sized datasets. This finding indicates the necessity of considering the potential for manipulated benchmarks regardless of their size. To demonstrate this risk, we developed FastSBS, a computationally efficient variant of SBS using the Sliced Wasserstein distance. FastSBS is optimized by a stochastic gradient-based method, which requires only nearly linear time for each update. In experiments with both synthetic and real-world datasets, we show that FastSBS is an order of magnitude faster than the original SBS for large datasets while maintaining the quality of the manipulated benchmark.}
}
Endnote
%0 Conference Paper
%T Fast Stealthily Biased Sampling Using Sliced Wasserstein Distance
%A Yudai Yamamoto
%A Satoshi Hara
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-yamamoto25a
%I PMLR
%P 873--888
%U https://proceedings.mlr.press/v260/yamamoto25a.html
%V 260
%X Ensuring fairness is essential when implementing machine learning models in practical applications. However, recent research has revealed that benchmark datasets can be crafted as fake evidence of fairness from unfair models using a method called Stealthily Biased Sampling (SBS). SBS minimizes the Wasserstein distance to manipulate a fake benchmark so that the distribution of the benchmark closely resembles the true data distribution. This optimization requires superquadratic time relative to the dataset size, making SBS applicable only to small-sized datasets. In this study, we reveal for the first time that the risk of manipulated benchmark datasets exists even for large-sized datasets. This finding indicates the necessity of considering the potential for manipulated benchmarks regardless of their size. To demonstrate this risk, we developed FastSBS, a computationally efficient variant of SBS using the Sliced Wasserstein distance. FastSBS is optimized by a stochastic gradient-based method, which requires only nearly linear time for each update. In experiments with both synthetic and real-world datasets, we show that FastSBS is an order of magnitude faster than the original SBS for large datasets while maintaining the quality of the manipulated benchmark.
APA
Yamamoto, Y. & Hara, S. (2025). Fast Stealthily Biased Sampling Using Sliced Wasserstein Distance. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:873-888. Available from https://proceedings.mlr.press/v260/yamamoto25a.html.