Bio-inspired parameter reuse: Exploiting inter-frame representation similarity with recurrence for accelerating temporal visual processing

Zuowen Wang, Longbiao Cheng, Joachim Ott, Pehuen Moure, Shih-Chii Liu
Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, PMLR 243:209-222, 2024.

Abstract

Feedforward neural networks are the dominant approach in current computer vision research. They typically do not incorporate recurrence, which is a prominent feature of the brain's visual circuitry. Inspired by biological findings, we introduce $\textbf{RecSlowFast}$, a recurrent slow-fast framework aimed at showing how recurrence can be useful for temporal visual processing. We perform a variable number of recurrent steps of certain layers in a network receiving input video frames, where each recurrent step is equivalent to a feedforward layer with weight reuse. By harnessing the hidden states extracted from the previous input frame, we reduce the computation cost by executing fewer recurrent steps on temporally correlated consecutive frames, while keeping good task accuracy. The early termination of the recurrence can be dynamically determined through newly introduced criteria based on the distance between hidden states, without using any auxiliary scheduler network. RecSlowFast $\textbf{reuses a single set of parameters}$, unlike previous work which requires one computationally heavy network and one light network, to achieve the speed versus accuracy trade-off. Using a new $\textit{Temporal Pathfinder}$ dataset proposed in this work, we evaluate RecSlowFast on a task to continuously detect the longest evolving contour in a video. The slow-fast inference mechanism speeds up the average frames per second by 279% on this dataset with comparable task accuracy using a desktop GPU. We further demonstrate a similar trend on CamVid, a video semantic segmentation dataset.
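The core mechanism described above — iterating a weight-shared recurrent cell on each frame, warm-starting from the previous frame's hidden state, and stopping early once consecutive hidden states are close — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the cell, the distance threshold `eps`, and all names are hypothetical.

```python
import numpy as np

def recur_until_converged(cell, x, h_init, eps=1e-3, max_steps=20):
    """Apply a weight-shared recurrent cell to one frame, terminating
    early when the distance between consecutive hidden states falls
    below eps (a hypothetical stand-in for the paper's criterion)."""
    h = h_init
    for step in range(1, max_steps + 1):
        h_next = cell(x, h)
        if np.linalg.norm(h_next - h) < eps:
            return h_next, step
        h = h_next
    return h, max_steps

# Toy contractive cell: repeated application converges to a fixed point.
def toy_cell(x, h):
    return 0.5 * h + 0.5 * x

frames = [np.ones(4), np.ones(4) * 1.01]  # temporally correlated frames
h = np.zeros(4)                            # cold start on the first frame
steps_used = []
for f in frames:
    h, steps = recur_until_converged(toy_cell, f, h)
    steps_used.append(steps)
# Warm-starting the second frame from the first frame's converged state
# requires fewer recurrent steps, mirroring the slow-fast speed-up.
```

On the second, nearly identical frame the iteration starts close to its fixed point, so far fewer steps are needed — the same effect RecSlowFast exploits across consecutive video frames.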

Cite this Paper


BibTeX
@InProceedings{pmlr-v243-wang24a,
  title     = {Bio-inspired parameter reuse: Exploiting inter-frame representation similarity with recurrence for accelerating temporal visual processing},
  author    = {Wang, Zuowen and Cheng, Longbiao and Ott, Joachim and Moure, Pehuen and Liu, Shih-Chii},
  booktitle = {Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models},
  pages     = {209--222},
  year      = {2024},
  editor    = {Fumero, Marco and Rodolá, Emanuele and Domine, Clementine and Locatello, Francesco and Dziugaite, Karolina and Caron, Mathilde},
  volume    = {243},
  series    = {Proceedings of Machine Learning Research},
  month     = {15 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v243/wang24a/wang24a.pdf},
  url       = {https://proceedings.mlr.press/v243/wang24a.html},
  abstract  = {Feedforward neural networks are the dominant approach in current computer vision research. They typically do not incorporate recurrence, which is a prominent feature of biological vision brain circuitry. Inspired by biological findings, we introduce $\textbf{RecSlowFast}$, a recurrent slow-fast framework aimed at showing how recurrence can be useful for temporal visual processing. We perform a variable number of recurrent steps of certain layers in a network receiving input video frames, where each recurrent step is equivalent to a feedforward layer with weights reuse. By harnessing the hidden states extracted from the previous input frame, we reduce the computation cost by executing fewer recurrent steps on temporally correlated consecutive frames, while keeping good task accuracy. The early termination of the recurrence can be dynamically determined through newly introduced criteria based on the distance between hidden states and without using any auxiliary scheduler network. RecSlowFast $\textbf{reuses a single set of parameters}$, unlike previous work which requires one computationally heavy network and one light network, to achieve the speed versus accuracy trade-off. Using a new $\textit{Temporal Pathfinder}$ dataset proposed in this work, we evaluate RecSlowFast on a task to continuously detect the longest evolving contour in a video. The slow-fast inference mechanism speeds up the average frame per second by 279% on this dataset with comparable task accuracy using a desktop GPU. We further demonstrate a similar trend on CamVid, a video semantic segmentation dataset.}
}
Endnote
%0 Conference Paper
%T Bio-inspired parameter reuse: Exploiting inter-frame representation similarity with recurrence for accelerating temporal visual processing
%A Zuowen Wang
%A Longbiao Cheng
%A Joachim Ott
%A Pehuen Moure
%A Shih-Chii Liu
%B Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Emanuele Rodolá
%E Clementine Domine
%E Francesco Locatello
%E Karolina Dziugaite
%E Mathilde Caron
%F pmlr-v243-wang24a
%I PMLR
%P 209--222
%U https://proceedings.mlr.press/v243/wang24a.html
%V 243
%X Feedforward neural networks are the dominant approach in current computer vision research. They typically do not incorporate recurrence, which is a prominent feature of biological vision brain circuitry. Inspired by biological findings, we introduce $\textbf{RecSlowFast}$, a recurrent slow-fast framework aimed at showing how recurrence can be useful for temporal visual processing. We perform a variable number of recurrent steps of certain layers in a network receiving input video frames, where each recurrent step is equivalent to a feedforward layer with weights reuse. By harnessing the hidden states extracted from the previous input frame, we reduce the computation cost by executing fewer recurrent steps on temporally correlated consecutive frames, while keeping good task accuracy. The early termination of the recurrence can be dynamically determined through newly introduced criteria based on the distance between hidden states and without using any auxiliary scheduler network. RecSlowFast $\textbf{reuses a single set of parameters}$, unlike previous work which requires one computationally heavy network and one light network, to achieve the speed versus accuracy trade-off. Using a new $\textit{Temporal Pathfinder}$ dataset proposed in this work, we evaluate RecSlowFast on a task to continuously detect the longest evolving contour in a video. The slow-fast inference mechanism speeds up the average frame per second by 279% on this dataset with comparable task accuracy using a desktop GPU. We further demonstrate a similar trend on CamVid, a video semantic segmentation dataset.
APA
Wang, Z., Cheng, L., Ott, J., Moure, P. & Liu, S.-C. (2024). Bio-inspired parameter reuse: Exploiting inter-frame representation similarity with recurrence for accelerating temporal visual processing. Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 243:209-222. Available from https://proceedings.mlr.press/v243/wang24a.html.