Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

Yanlai Yang, Mengye Ren
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:182-203, 2026.

Abstract

Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.
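
The abstract only sketches the framework at a high level, so the following is a minimal Python sketch of how the two-tier memory hierarchy might be organized. It assumes a similarity-threshold rule for cutting temporal segments and FIFO eviction in the long-term memory; neither is specified in the abstract, and the names (TwoTierMemory, segment_boundaries) and the use of random NumPy vectors as stand-in frame embeddings are illustrative, not taken from the paper.

import numpy as np

def segment_boundaries(embs: np.ndarray, thresh: float = 0.8) -> list[int]:
    """Cut the stream wherever adjacent-frame cosine similarity drops
    below `thresh` (a stand-in criterion; the paper's actual
    segmentation rule is not given in the abstract)."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = (normed[:-1] * normed[1:]).sum(axis=1)
    return [0] + [i + 1 for i, s in enumerate(sims) if s < thresh] + [len(embs)]

class TwoTierMemory:
    """Two-tier hierarchy: recent raw frames accumulate in a short-term
    memory; once it fills, they are grouped into temporal segments
    ("storyboard" entries) and transferred to a long-term memory,
    from which past data is replayed."""

    def __init__(self, stm_capacity: int = 128, ltm_capacity: int = 32):
        self.stm: list[np.ndarray] = []   # recent raw frame embeddings
        self.ltm: list[np.ndarray] = []   # past segments (stacks of frames)
        self.stm_capacity = stm_capacity
        self.ltm_capacity = ltm_capacity

    def observe(self, frame_emb: np.ndarray) -> None:
        """Ingest one frame from the stream; consolidate when STM is full."""
        self.stm.append(frame_emb)
        if len(self.stm) >= self.stm_capacity:
            self._consolidate()

    def _consolidate(self) -> None:
        """Segment the short-term buffer and move the segments to LTM."""
        embs = np.stack(self.stm)
        cuts = segment_boundaries(embs)
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            self.ltm.append(embs[lo:hi])
        self.ltm = self.ltm[-self.ltm_capacity:]  # FIFO eviction (assumed policy)
        self.stm.clear()

    def replay_batch(self, n_frames: int, rng: np.random.Generator):
        """Sample frames with their segment index as a pseudo-label,
        usable as positives/negatives in a contrastive objective."""
        seg_ids = rng.integers(len(self.ltm), size=n_frames)
        frames = [self.ltm[s][rng.integers(len(self.ltm[s]))] for s in seg_ids]
        return np.stack(frames), seg_ids

A short usage example: feeding a simulated stream and drawing a replay batch. The returned segment indices act as pseudo-labels, matching the abstract's description of applying contrastive learning objectives on top of storyboard frames.

rng = np.random.default_rng(0)
mem = TwoTierMemory()
for t in range(512):                       # simulated egocentric stream
    mem.observe(rng.normal(size=16))       # placeholder frame embeddings
frames, labels = mem.replay_batch(8, rng)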

Cite this Paper


BibTeX
@InProceedings{pmlr-v330-yang26a,
  title     = {Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos},
  author    = {Yang, Yanlai and Ren, Mengye},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages     = {182--203},
  year      = {2026},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume    = {330},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/yang26a/yang26a.pdf},
  url       = {https://proceedings.mlr.press/v330/yang26a.html},
  abstract  = {Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.}
}
Endnote
%0 Conference Paper
%T Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
%A Yanlai Yang
%A Mengye Ren
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-yang26a
%I PMLR
%P 182--203
%U https://proceedings.mlr.press/v330/yang26a.html
%V 330
%X Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.
APA
Yang, Y. & Ren, M. (2026). Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:182-203. Available from https://proceedings.mlr.press/v330/yang26a.html.
