LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning

Timothy Castiglia, Yi Zhou, Shiqiang Wang, Swanand Kadhe, Nathalie Baracaldo, Stacy Patterson
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:3757-3781, 2023.

Abstract

We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
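The setting in the abstract can be made concrete with a small toy simulation. The sketch below is illustrative only: it shows vertically partitioned data (parties sharing a sample ID space but holding disjoint feature columns) and the communication pattern the abstract describes, where the server scores the parties' outputs and the parties then select features locally. The identity "party models", the least-squares server head, the weight-magnitude importance score, and the 0.1 threshold are all stand-in assumptions, not the actual LESS-VFL optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Vertically partitioned data: parties share sample IDs, not features.
n_samples = 8
X_full = rng.normal(size=(n_samples, 5))
y = (X_full[:, 0] + X_full[:, 3] > 0).astype(float)  # labels held by the server

party_feats = {"A": [0, 1], "B": [2, 3, 4]}  # disjoint feature columns
parties = {p: X_full[:, cols] for p, cols in party_feats.items()}

# Pre-training stand-in: each party sends its model outputs (embeddings) once.
# Here each "party model" is the identity, so embeddings == local features.
embeddings = np.hstack([parties["A"], parties["B"]])

# Server step: fit its part of the model and score each party-output dimension.
# Stand-in importance score: magnitude of a least-squares head's weights
# (LESS-VFL's actual server-side optimization is not reproduced here).
w, *_ = np.linalg.lstsq(embeddings, y, rcond=None)
importance = np.abs(w)

# Scores are sent back once; parties then select features locally,
# with no further communication.
offsets = {"A": 0, "B": 2}  # where each party's dimensions start in `embeddings`
selected = {
    p: [cols[i] for i in range(len(cols)) if importance[offsets[p] + i] > 0.1]
    for p, cols in party_feats.items()
}
print(selected)
```

Note that after the single server-to-party exchange of scores, the selection step runs entirely on local data, which is the source of the communication savings the abstract claims.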

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-castiglia23a,
  title     = {{LESS}-{VFL}: Communication-Efficient Feature Selection for Vertical Federated Learning},
  author    = {Castiglia, Timothy and Zhou, Yi and Wang, Shiqiang and Kadhe, Swanand and Baracaldo, Nathalie and Patterson, Stacy},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {3757--3781},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/castiglia23a/castiglia23a.pdf},
  url       = {https://proceedings.mlr.press/v202/castiglia23a.html},
  abstract  = {We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.}
}
Endnote
%0 Conference Paper
%T LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
%A Timothy Castiglia
%A Yi Zhou
%A Shiqiang Wang
%A Swanand Kadhe
%A Nathalie Baracaldo
%A Stacy Patterson
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-castiglia23a
%I PMLR
%P 3757--3781
%U https://proceedings.mlr.press/v202/castiglia23a.html
%V 202
%X We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
APA
Castiglia, T., Zhou, Y., Wang, S., Kadhe, S., Baracaldo, N., & Patterson, S. (2023). LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:3757-3781. Available from https://proceedings.mlr.press/v202/castiglia23a.html.