Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

Yanlai Yang, Mengye Ren
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:182-203, 2026.

Abstract

Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.
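
The abstract only sketches the framework at a high level, so the following is a minimal Python sketch of how the two-tier memory hierarchy might be organized. It assumes a similarity-threshold rule for cutting temporal segments and FIFO eviction in the long-term memory; neither is specified in the abstract, and the names (TwoTierMemory, segment_boundaries) and the use of random NumPy vectors as stand-in frame embeddings are illustrative, not taken from the paper.

import numpy as np

def segment_boundaries(embs: np.ndarray, thresh: float = 0.8) -> list[int]:
    """Cut the stream wherever adjacent-frame cosine similarity drops
    below `thresh` (a stand-in criterion; the paper's actual
    segmentation rule is not given in the abstract)."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = (normed[:-1] * normed[1:]).sum(axis=1)
    return [0] + [i + 1 for i, s in enumerate(sims) if s < thresh] + [len(embs)]

class TwoTierMemory:
    """Two-tier hierarchy: recent raw frames accumulate in a short-term
    memory; once it fills, they are grouped into temporal segments
    ("storyboard" entries) and transferred to a long-term memory,
    from which past data is replayed."""

    def __init__(self, stm_capacity: int = 128, ltm_capacity: int = 32):
        self.stm: list[np.ndarray] = []   # recent raw frame embeddings
        self.ltm: list[np.ndarray] = []   # past segments (stacks of frames)
        self.stm_capacity = stm_capacity
        self.ltm_capacity = ltm_capacity

    def observe(self, frame_emb: np.ndarray) -> None:
        """Ingest one frame from the stream; consolidate when STM is full."""
        self.stm.append(frame_emb)
        if len(self.stm) >= self.stm_capacity:
            self._consolidate()

    def _consolidate(self) -> None:
        """Segment the short-term buffer and move the segments to LTM."""
        embs = np.stack(self.stm)
        cuts = segment_boundaries(embs)
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            self.ltm.append(embs[lo:hi])
        self.ltm = self.ltm[-self.ltm_capacity:]  # FIFO eviction (assumed policy)
        self.stm.clear()

    def replay_batch(self, n_frames: int, rng: np.random.Generator):
        """Sample frames with their segment index as a pseudo-label,
        usable as positives/negatives in a contrastive objective."""
        seg_ids = rng.integers(len(self.ltm), size=n_frames)
        frames = [self.ltm[s][rng.integers(len(self.ltm[s]))] for s in seg_ids]
        return np.stack(frames), seg_ids

A short usage example: feeding a simulated stream and drawing a replay batch. The returned segment indices act as pseudo-labels, matching the abstract's description of applying contrastive learning objectives on top of storyboard frames.

rng = np.random.default_rng(0)
mem = TwoTierMemory()
for t in range(512):                       # simulated egocentric stream
    mem.observe(rng.normal(size=16))       # placeholder frame embeddings
frames, labels = mem.replay_batch(8, rng)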

Cite this Paper


BibTeX
@InProceedings{pmlr-v330-yang26a,
  title     = {Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos},
  author    = {Yang, Yanlai and Ren, Mengye},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages     = {182--203},
  year      = {2026},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume    = {330},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/yang26a/yang26a.pdf},
  url       = {https://proceedings.mlr.press/v330/yang26a.html},
  abstract  = {Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.}
}
Endnote
%0 Conference Paper
%T Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
%A Yanlai Yang
%A Mengye Ren
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-yang26a
%I PMLR
%P 182--203
%U https://proceedings.mlr.press/v330/yang26a.html
%V 330
%X Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose “Memory Storyboard,” a novel continual self-supervised learning framework that groups recent past frames into temporal segments for a more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, where the storyboard temporal segments are produced and then transferred to a long-term memory. Experiments on two real-world egocentric video datasets show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.
APA
Yang, Y. & Ren, M. (2026). Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:182-203. Available from https://proceedings.mlr.press/v330/yang26a.html.
