ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering

Lufei Liu, Tor M. Aamodt
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:38049-38065, 2025.

Abstract

Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computations. Recent work has shown that caching intermediate features to be reused in subsequent inferences is an effective method to reduce latency in diffusion models. We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize trade-offs between quality and performance in rendering workloads. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show that we achieve 1.4$\times$ speedup on average with negligible quality loss in three real-time rendering tasks. Code available: https://ubc-aamodt-group.github.io/reframe-layer-caching/
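The caching idea the abstract describes can be illustrated with a small sketch: in an encoder-decoder network run once per frame, the deeper (more expensive) features are recomputed only every few frames and otherwise reused from a cache, while the cheap shallow layers are always refreshed. The following PyTorch snippet is a minimal sketch of that idea, not the authors' implementation; the network shape, module names, and the fixed refresh cadence are illustrative assumptions.

```python
# Minimal sketch of layer caching across frames (illustrative only, not ReFrame itself).
# Deep encoder/bottleneck features are cached and reused; shallow features stay fresh.
import torch
import torch.nn as nn

class CachedEncoderDecoder(nn.Module):
    def __init__(self, channels=32, refresh_every=2):
        super().__init__()
        self.enc1 = nn.Conv2d(3, channels, 3, padding=1)                    # shallow encoder
        self.enc2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)   # downsampling encoder
        self.bottleneck = nn.Conv2d(channels, channels, 3, padding=1)       # expensive part we cache
        self.dec = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.out = nn.Conv2d(2 * channels, 3, 3, padding=1)
        self.refresh_every = refresh_every   # recompute cached features every N frames (assumed policy)
        self._cache = None
        self._frame = 0

    def forward(self, x):
        skip = torch.relu(self.enc1(x))      # shallow features, recomputed every frame
        if self._cache is None or self._frame % self.refresh_every == 0:
            deep = torch.relu(self.enc2(skip))
            self._cache = torch.relu(self.bottleneck(deep))   # refresh the cached deep features
        up = torch.relu(self.dec(self._cache))                # decode possibly stale deep features
        self._frame += 1
        return self.out(torch.cat([up, skip], dim=1))         # fuse with fresh shallow features

model = CachedEncoderDecoder().eval()
with torch.no_grad():
    for _ in range(4):                       # simulate four consecutive, temporally coherent frames
        frame = torch.rand(1, 3, 64, 64)
        _ = model(frame)
```

The savings come from skipping the downsampled encoder and bottleneck on cached frames; temporal coherence between consecutive frames is what keeps the reused features close enough to the fresh ones for quality to hold.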

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25a,
  title     = {{R}e{F}rame: Layer Caching for Accelerated Inference in Real-Time Rendering},
  author    = {Liu, Lufei and Aamodt, Tor M.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {38049--38065},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25a/liu25a.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25a.html}
}
Endnote
%0 Conference Paper
%T ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
%A Lufei Liu
%A Tor M. Aamodt
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25a
%I PMLR
%P 38049--38065
%U https://proceedings.mlr.press/v267/liu25a.html
%V 267
APA
Liu, L. & Aamodt, T.M. (2025). ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:38049-38065. Available from https://proceedings.mlr.press/v267/liu25a.html.