An Adaptive Method for Weak Supervision with Drifting Data

Alessio Mazzetto, Reza Esfandiarpoor, Akash Singirikonda, Eli Upfal, Stephen Bach
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1513-1521, 2025.

Abstract

We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.
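The variance-versus-drift trade-off described in the abstract can be illustrated with a generic doubling-window heuristic. The sketch below is not the paper's algorithm and carries none of its guarantees. It only assumes that a single 0/1 signal (for example, whether two sources agree) is observed over time. The function name `adaptive_window_mean`, the constant `c`, and the tolerance `c * sqrt(1 / r)` are hypothetical choices made for illustration.

```python
import math


def adaptive_window_mean(values, c=1.0):
    """Estimate the current mean of a possibly drifting 0/1 sequence.

    Hypothetical helper, not taken from the paper: the window over the
    most recent observations is doubled as long as the mean of the larger
    window stays within a variance-sized tolerance of the smaller one.
    Returns (window_size, estimate).
    """
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    r = 1
    mean_r = sum(values[-r:]) / r
    while 2 * r <= n:
        mean_2r = sum(values[-2 * r:]) / (2 * r)
        # Tolerance on the order of the statistical error of a window of
        # size r; the exact form is an assumption of this sketch.
        tolerance = c * math.sqrt(1.0 / r)
        if abs(mean_2r - mean_r) > tolerance:
            # The older half disagrees by more than noise would explain:
            # treat it as drift and stop growing the window.
            break
        r, mean_r = 2 * r, mean_2r
    return r, mean_r


# Stationary stream: the window grows to cover all 64 observations.
stationary = [1, 0] * 32
# Abrupt change 16 steps ago: the window stops at the change point.
drifted = [0] * 64 + [1] * 16
print(adaptive_window_mean(stationary))
print(adaptive_window_mean(drifted))
```

On the stationary stream the larger windows keep agreeing with the smaller ones, so all available data is used. On the drifted stream the window stops growing once it would reach past the change, because older observations would bias the estimate more than they would reduce its variance.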

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-mazzetto25a,
  title = {An Adaptive Method for Weak Supervision with Drifting Data},
  author = {Mazzetto, Alessio and Esfandiarpoor, Reza and Singirikonda, Akash and Upfal, Eli and Bach, Stephen},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {1513--1521},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/mazzetto25a/mazzetto25a.pdf},
  url = {https://proceedings.mlr.press/v258/mazzetto25a.html},
  abstract = {We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.}
}
Endnote
%0 Conference Paper
%T An Adaptive Method for Weak Supervision with Drifting Data
%A Alessio Mazzetto
%A Reza Esfandiarpoor
%A Akash Singirikonda
%A Eli Upfal
%A Stephen Bach
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-mazzetto25a
%I PMLR
%P 1513--1521
%U https://proceedings.mlr.press/v258/mazzetto25a.html
%V 258
%X We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.
APA
Mazzetto, A., Esfandiarpoor, R., Singirikonda, A., Upfal, E. & Bach, S. (2025). An Adaptive Method for Weak Supervision with Drifting Data. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1513-1521. Available from https://proceedings.mlr.press/v258/mazzetto25a.html.
