Prediction-Oriented Subsampling from Data Streams

Benedetta Lavinia Mussati, Freddie Bickford Smith, Tom Rainforth, S Roberts
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:565-580, 2026.

Abstract

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.

Cite this Paper


BibTeX
@InProceedings{pmlr-v330-mussati26a,
  title = {Prediction-Oriented Subsampling from Data Streams},
  author = {Mussati, Benedetta Lavinia and Smith, Freddie Bickford and Rainforth, Tom and Roberts, S},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages = {565--580},
  year = {2026},
  editor = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume = {330},
  series = {Proceedings of Machine Learning Research},
  month = {11--14 Aug},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/mussati26a/mussati26a.pdf},
  url = {https://proceedings.mlr.press/v330/mussati26a.html},
  abstract = {Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.}
}
Endnote
%0 Conference Paper
%T Prediction-Oriented Subsampling from Data Streams
%A Benedetta Lavinia Mussati
%A Freddie Bickford Smith
%A Tom Rainforth
%A S Roberts
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-mussati26a
%I PMLR
%P 565--580
%U https://proceedings.mlr.press/v330/mussati26a.html
%V 330
%X Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
APA
Mussati, B.L., Smith, F.B., Rainforth, T. & Roberts, S. (2026). Prediction-Oriented Subsampling from Data Streams. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:565-580. Available from https://proceedings.mlr.press/v330/mussati26a.html.
