RSTSIC: Reparameterized Swin Transformer Stereo Image Compression

Yuxuan Zhao, Junwei Zhou, Jiaxin Li, Cheng Tan, Jianwen Xiang
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:766-781, 2025.

Abstract

Stereo image compression (SIC) aims to enhance compression performance and efficiency by exploiting cross-view redundancy in overlapping fields between stereo images. However, current SIC methods face practical limitations in adequately exploiting inter-view correlations and contextual information due to occlusions, disparity variations, and computational overhead. To effectively extract contextual information and efficiently model cross-view dependencies in stereo images, we propose a novel distributed stereo image compression framework, Reparameterized Swin Transformer Stereo Image Compression (RSTSIC), which integrates a Reparameterized Swin Block (RSB) and Cross Feature Enhancement Modules (CFEMs) in the joint decoder. CFEMs progressively aggregate cross-view dependencies and enhance cross-feature interaction efficiency. RSB integrates window-based self-attention with convolutional operations to effectively leverage non-local contextual information, while maintaining inference efficiency through structural reparameterization. RSTSIC outperforms traditional codecs and deep stereo compression methods on both the Cityscapes and InStereo2K datasets, with at least a 58.57% reduction in model parameters and a 36.43% decrease in FLOPs compared to state-of-the-art compression models. Ablation studies confirm the necessity of CFEMs and RSB for efficient compression and perceptual fidelity. Our code is available at https://github.com/SnowBlind0/RSTSIC.
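
The abstract states that RSB keeps inference efficient through structural reparameterization but does not spell out the mechanics. As a rough illustration of the general idea only (not the authors' RSB, whose details are in the paper and repository), the sketch below merges a hypothetical three-branch 1-D block (a 3-tap convolution, a 1-tap convolution, and an identity shortcut) into a single 3-tap kernel that produces the same output. All function names and values are assumptions made for this example.

```python
# Illustrative sketch only: generic structural reparameterization on a 1-D signal.
# Every name below is hypothetical and not taken from the RSTSIC code base.

def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with an odd-length `kernel`, zero-padding to keep the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]

def multi_branch(signal, k3, k1):
    """Training-time form: 3-tap branch + 1-tap branch + identity shortcut."""
    out3 = conv1d_same(signal, k3)
    out1 = conv1d_same(signal, k1)
    return [a + b + x for a, b, x in zip(out3, out1, signal)]

def merge_branches(k3, k1):
    """Fold the 1-tap branch and the identity into the centre tap of the 3-tap kernel."""
    merged = list(k3)
    merged[1] += k1[0] + 1.0  # 1-tap weight plus the identity's implicit weight of 1
    return merged

signal = [1.0, 2.0, 3.0, 4.0]
k3 = [0.5, -1.0, 0.25]
k1 = [2.0]

train_out = multi_branch(signal, k3, k1)                  # three branches
infer_out = conv1d_same(signal, merge_branches(k3, k1))   # one merged kernel
max_diff = max(abs(a - b) for a, b in zip(train_out, infer_out))
```

Because convolution is linear, the parallel branches collapse into one kernel with no change in output, which is why the merged form can be cheaper at inference time. Real blocks typically also fold normalization layers into the kernel, which this sketch omits.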

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-zhao25a,
  title = {RSTSIC: Reparameterized Swin Transformer Stereo Image Compression},
  author = {Zhao, Yuxuan and Zhou, Junwei and Li, Jiaxin and Tan, Cheng and Xiang, Jianwen},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages = {766--781},
  year = {2025},
  editor = {Lee, Hung-yi and Liu, Tongliang},
  volume = {304},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/zhao25a/zhao25a.pdf},
  url = {https://proceedings.mlr.press/v304/zhao25a.html},
  abstract = {Stereo image compression (SIC) aims to enhance compression performance and efficiency by exploiting cross-view redundancy in overlapping fields between stereo images. However, current SIC methods face practical limitations in adequately exploiting inter-view correlations and contextual information due to occlusions, disparity variations, and computational overhead. To effectively extract contextual information and efficiently model cross-view dependencies in stereo images, we propose a novel distributed stereo image compression framework, Reparameterized Swin Transformer Stereo Image Compression (RSTSIC), which integrates a Reparameterized Swin Block (RSB) and Cross Feature Enhancement Modules (CFEMs) in the joint decoder. CFEMs progressively aggregate cross-view dependencies and enhance cross-feature interaction efficiency. RSB integrates window-based self-attention with convolutional operations to effectively leverage non-local contextual information, while maintaining inference efficiency through structural reparameterization. RSTSIC outperforms traditional codecs and deep stereo compression methods on both the Cityscapes and InStereo2K datasets, with at least a 58.57% reduction in model parameters and a 36.43% decrease in FLOPs compared to state-of-the-art compression models. Ablation studies confirm the necessity of CFEMs and RSB for efficient compression and perceptual fidelity. Our code is available at https://github.com/SnowBlind0/RSTSIC.}
}
Endnote
%0 Conference Paper
%T RSTSIC: Reparameterized Swin Transformer Stereo Image Compression
%A Yuxuan Zhao
%A Junwei Zhou
%A Jiaxin Li
%A Cheng Tan
%A Jianwen Xiang
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-zhao25a
%I PMLR
%P 766--781
%U https://proceedings.mlr.press/v304/zhao25a.html
%V 304
%X Stereo image compression (SIC) aims to enhance compression performance and efficiency by exploiting cross-view redundancy in overlapping fields between stereo images. However, current SIC methods face practical limitations in adequately exploiting inter-view correlations and contextual information due to occlusions, disparity variations, and computational overhead. To effectively extract contextual information and efficiently model cross-view dependencies in stereo images, we propose a novel distributed stereo image compression framework, Reparameterized Swin Transformer Stereo Image Compression (RSTSIC), which integrates a Reparameterized Swin Block (RSB) and Cross Feature Enhancement Modules (CFEMs) in the joint decoder. CFEMs progressively aggregate cross-view dependencies and enhance cross-feature interaction efficiency. RSB integrates window-based self-attention with convolutional operations to effectively leverage non-local contextual information, while maintaining inference efficiency through structural reparameterization. RSTSIC outperforms traditional codecs and deep stereo compression methods on both the Cityscapes and InStereo2K datasets, with at least a 58.57% reduction in model parameters and a 36.43% decrease in FLOPs compared to state-of-the-art compression models. Ablation studies confirm the necessity of CFEMs and RSB for efficient compression and perceptual fidelity. Our code is available at https://github.com/SnowBlind0/RSTSIC.
APA
Zhao, Y., Zhou, J., Li, J., Tan, C. & Xiang, J. (2025). RSTSIC: Reparameterized Swin Transformer Stereo Image Compression. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:766-781. Available from https://proceedings.mlr.press/v304/zhao25a.html.