PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses

Adel Javanmard, Matthew Fahrbach, Vahab Mirrokni
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21410-21429, 2024.

Abstract

This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We then propose the $\texttt{PriorBoost}$ algorithm, which adaptively forms bags of samples that are increasingly homogeneous with respect to (unobserved) individual responses to improve model quality. We study label differential privacy for aggregate learning, and we also provide extensive experiments showing that $\texttt{PriorBoost}$ regularly achieves optimal model quality for event-level predictions, in stark contrast to non-adaptive algorithms.
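As a rough illustration of the curated-versus-random comparison described in the abstract, the sketch below forms bags by sorting samples on a one-dimensional score and cutting the sorted order into contiguous chunks. This is a simple sort-and-chunk heuristic, not the paper's algorithm. The function names (`make_bags`, `within_bag_sse`, `aggregate_responses`), the fixed bag size, and the synthetic Gaussian scores are all assumptions made for this example.

```python
import random
from statistics import mean


def make_bags(scores, bag_size, curated=True, seed=0):
    """Partition sample indices into bags of `bag_size` (the last bag may be smaller).

    curated=True  : sort by score, then cut into contiguous chunks.
    curated=False : shuffle indices, giving random bags.
    """
    idx = list(range(len(scores)))
    if curated:
        idx.sort(key=lambda i: scores[i])
    else:
        random.Random(seed).shuffle(idx)
    return [idx[s:s + bag_size] for s in range(0, len(idx), bag_size)]


def within_bag_sse(values, bags):
    """Sum of squared deviations from each bag's mean (the k-means objective)."""
    total = 0.0
    for bag in bags:
        m = mean(values[i] for i in bag)
        total += sum((values[i] - m) ** 2 for i in bag)
    return total


def aggregate_responses(y, bags):
    """The learner only observes the mean response of each bag."""
    return [mean(y[i] for i in bag) for bag in bags]


# Hypothetical data: the scores stand in for a prior model's predictions.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(200)]

curated_sse = within_bag_sse(scores, make_bags(scores, 10, curated=True))
random_sse = within_bag_sse(scores, make_bags(scores, 10, curated=False))
print(f"curated bags SSE: {curated_sse:.3f}, random bags SSE: {random_sse:.3f}")
```

In this toy setting, bags built from the sorted order are more homogeneous in the score than random bags, which is the kind of gap the abstract refers to when comparing curated and random bags.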

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-javanmard24a,
  title = {{P}rior{B}oost: An Adaptive Algorithm for Learning from Aggregate Responses},
  author = {Javanmard, Adel and Fahrbach, Matthew and Mirrokni, Vahab},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {21410--21429},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/javanmard24a/javanmard24a.pdf},
  url = {https://proceedings.mlr.press/v235/javanmard24a.html},
  abstract = {This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We then propose the $\texttt{PriorBoost}$ algorithm, which adaptively forms bags of samples that are increasingly homogeneous with respect to (unobserved) individual responses to improve model quality. We study label differential privacy for aggregate learning, and we also provide extensive experiments showing that $\texttt{PriorBoost}$ regularly achieves optimal model quality for event-level predictions, in stark contrast to non-adaptive algorithms.}
}
Endnote
%0 Conference Paper
%T PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
%A Adel Javanmard
%A Matthew Fahrbach
%A Vahab Mirrokni
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-javanmard24a
%I PMLR
%P 21410--21429
%U https://proceedings.mlr.press/v235/javanmard24a.html
%V 235
%X This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We then propose the $\texttt{PriorBoost}$ algorithm, which adaptively forms bags of samples that are increasingly homogeneous with respect to (unobserved) individual responses to improve model quality. We study label differential privacy for aggregate learning, and we also provide extensive experiments showing that $\texttt{PriorBoost}$ regularly achieves optimal model quality for event-level predictions, in stark contrast to non-adaptive algorithms.
APA
Javanmard, A., Fahrbach, M., & Mirrokni, V. (2024). PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21410-21429. Available from https://proceedings.mlr.press/v235/javanmard24a.html.