Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions

Supratim Shit, Gurmehak Kaur Chadha, Surendra Kumar, Bapi Chatterjee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55303-55324, 2025.

Abstract

A coreset, as a summary of the training data, offers an efficient approach for reducing data processing and storage complexity during training. In the emerging vertical federated learning (VFL) setting, where scattered clients store different data features, it directly reduces communication complexity. In this work, we introduce coreset constructions for regularized logistic regression in both the centralized and VFL settings. Additionally, we improve the coreset size for regularized linear regression in the VFL setting, and we eliminate the dependency of the coreset size on a data-dependent property arising from the VFL setting. The improvement in coreset sizes comes from our novel coreset construction algorithms, which capture the reduced model complexity due to the added regularization, and from their subsequent analysis. In experiments, we provide extensive empirical evaluation that backs our theoretical claims. We also report the performance of our coresets by comparing models trained on the complete data and on the coreset.
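To give a concrete sense of what a coreset for regularized linear regression looks like, the sketch below samples rows by ridge leverage scores and reweights them so the weighted subset approximates the full ridge objective. This is a generic sensitivity-sampling illustration, not the paper's VFL algorithm; the function name and the toy data are our own.

```python
import numpy as np

def ridge_leverage_coreset(X, y, lam, m, seed=None):
    """Sample a weighted coreset of m rows for ridge regression
    min_w ||Xw - y||^2 + lam * ||w||^2, using ridge leverage scores.
    Generic sensitivity-sampling sketch (not the paper's exact method)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Ridge leverage score of row i: x_i^T (X^T X + lam I)^{-1} x_i.
    G = X.T @ X + lam * np.eye(d)
    tau = np.einsum("ij,ij->i", X @ np.linalg.inv(G), X)
    p = tau / tau.sum()                  # sampling probabilities
    idx = rng.choice(n, size=m, p=p)     # i.i.d. draws with replacement
    w = 1.0 / (m * p[idx])               # weights keep the objective unbiased
    return X[idx], y[idx], w

# Toy check: the coreset ridge solution should be close to the full one.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(2000)
lam = 1.0
Xc, yc, w = ridge_leverage_coreset(X, y, lam, m=400, seed=1)
full = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
core = np.linalg.solve((Xc * w[:, None]).T @ Xc + lam * np.eye(5),
                       (Xc * w[:, None]).T @ yc)
err = np.linalg.norm(full - core)
```

The 400-row weighted coreset typically recovers the full 2000-row ridge solution to small error, illustrating the size/accuracy trade-off the paper analyzes in the VFL setting.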

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shit25a,
  title     = {Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions},
  author    = {Shit, Supratim and Chadha, Gurmehak Kaur and Kumar, Surendra and Chatterjee, Bapi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55303--55324},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shit25a/shit25a.pdf},
  url       = {https://proceedings.mlr.press/v267/shit25a.html},
  abstract  = {Coreset, as a summary of training data, offers an efficient approach for reducing data processing and storage complexity during training. In the emerging vertical federated learning (VFL) setting, where scattered clients store different data features, it directly reduces communication complexity. In this work, we introduce coresets construction for regularized logistic regression both in centralized and VFL settings. Additionally, we improve the coreset size for regularized linear regression in the VFL setting. We also eliminate the dependency of the coreset size on a property of the data due to the VFL setting. The improvement in the coreset sizes is due to our novel coreset construction algorithms that capture the reduced model complexity due to the added regularization and its subsequent analysis. In experiments, we provide extensive empirical evaluation that backs our theoretical claims. We also report the performance of our coresets by comparing the models trained on the complete data and on the coreset.}
}
Endnote
%0 Conference Paper
%T Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions
%A Supratim Shit
%A Gurmehak Kaur Chadha
%A Surendra Kumar
%A Bapi Chatterjee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shit25a
%I PMLR
%P 55303--55324
%U https://proceedings.mlr.press/v267/shit25a.html
%V 267
%X Coreset, as a summary of training data, offers an efficient approach for reducing data processing and storage complexity during training. In the emerging vertical federated learning (VFL) setting, where scattered clients store different data features, it directly reduces communication complexity. In this work, we introduce coresets construction for regularized logistic regression both in centralized and VFL settings. Additionally, we improve the coreset size for regularized linear regression in the VFL setting. We also eliminate the dependency of the coreset size on a property of the data due to the VFL setting. The improvement in the coreset sizes is due to our novel coreset construction algorithms that capture the reduced model complexity due to the added regularization and its subsequent analysis. In experiments, we provide extensive empirical evaluation that backs our theoretical claims. We also report the performance of our coresets by comparing the models trained on the complete data and on the coreset.
APA
Shit, S., Chadha, G. K., Kumar, S., & Chatterjee, B. (2025). Improved Coresets for Vertical Federated Learning: Regularized Linear and Logistic Regressions. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55303-55324. Available from https://proceedings.mlr.press/v267/shit25a.html.