Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach

Dheeraj Baby, Boran Han, Shuai Zhang, Cuixiong Hu, Bernie Wang, Yu-Xiang Wang
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3853-3861, 2025.

Abstract

We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best "attention span" while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only $O(\log T)$ OL instances) and adaptive (it automatically chooses OL instances with the ideal "attention" length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user-specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a multi-resolution instance design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.
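The meta-algorithm described in the abstract can be illustrated with a toy sketch: keep $O(\log T)$ copies of a user-supplied online learner, restart copy $k$ on a dyadic schedule so it effectively attends to the most recent $\sim 2^k$ batches, and at each step follow whichever copy scores best on the incoming batch before any copy trains on it. Everything below (class names, the restart schedule, the scoring rule, the `MeanLearner` stand-in) is a simplified assumption for illustration, not the paper's actual construction.

```python
import math

class MultiResolutionSelector:
    """Hypothetical sketch of a black-box meta-algorithm: maintain
    O(log T) instances of a user-supplied online learner, each with a
    different effective 'attention span', and pick among them online."""

    def __init__(self, make_learner, horizon):
        self.num_scales = max(1, math.ceil(math.log2(horizon)))
        self.make_learner = make_learner
        self.learners = [make_learner() for _ in range(self.num_scales)]
        self.t = 0  # number of batches seen so far

    def predict_and_update(self, batch_x, batch_y):
        # Score each learner on the incoming batch BEFORE updating:
        # this held-out evaluation mimics the "cross-validation through
        # time" idea, and the argmax selects the best attention span.
        scores = [lr.score(batch_x, batch_y) for lr in self.learners]
        best = max(range(self.num_scales), key=lambda k: scores[k])
        preds = self.learners[best].predict(batch_x)
        self.t += 1
        for k in range(self.num_scales):
            if self.t % (2 ** k) == 0:
                # Dyadic restart: learner k forgets everything older
                # than ~2^k batches (a crude multi-resolution design).
                self.learners[k] = self.make_learner()
            self.learners[k].update(batch_x, batch_y)
        return preds

class MeanLearner:
    """Toy stand-in for a network + OL: predicts the running mean of
    the labels it has been trained on."""
    def __init__(self):
        self.n, self.s = 0, 0.0
    def update(self, xs, ys):
        for y in ys:
            self.n += 1
            self.s += y
    def predict(self, xs):
        m = self.s / self.n if self.n else 0.0
        return [m for _ in xs]
    def score(self, xs, ys):  # negative squared error (higher is better)
        return -sum((p - y) ** 2 for p, y in zip(self.predict(xs), ys))

sel = MultiResolutionSelector(MeanLearner, horizon=16)
first = sel.predict_and_update([0, 0], [1.0, 1.0])   # untrained: predicts 0.0
second = sel.predict_and_update([0, 0], [3.0, 3.0])  # follows prior batch mean
```

Note the cost structure: with horizon 16 only four learner copies exist (spans 1, 2, 4, 8), matching the $O(\log T)$ claim; the selection step adds only one extra forward pass per copy per batch.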

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-baby25a,
  title     = {Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach},
  author    = {Baby, Dheeraj and Han, Boran and Zhang, Shuai and Hu, Cuixiong and Wang, Bernie and Wang, Yu-Xiang},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {3853--3861},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/baby25a/baby25a.pdf},
  url       = {https://proceedings.mlr.press/v258/baby25a.html},
  abstract  = {We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best "attention span" while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only $O(\log T)$ OL instances) and adaptive (it automatically chooses OL instances with the ideal "attention" length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a multi-resolution instance design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.}
}
Endnote
%0 Conference Paper
%T Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach
%A Dheeraj Baby
%A Boran Han
%A Shuai Zhang
%A Cuixiong Hu
%A Bernie Wang
%A Yu-Xiang Wang
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-baby25a
%I PMLR
%P 3853--3861
%U https://proceedings.mlr.press/v258/baby25a.html
%V 258
%X We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best "attention span" while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only $O(\log T)$ OL instances) and adaptive (it automatically chooses OL instances with the ideal "attention" length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a multi-resolution instance design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.
APA
Baby, D., Han, B., Zhang, S., Hu, C., Wang, B. & Wang, Y. (2025). Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3853-3861. Available from https://proceedings.mlr.press/v258/baby25a.html.
