Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues

Jiaxuan Chen, Yu Qi, Gang Pan
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4856-4866, 2023.

Abstract

Decoding seen images from brain activity has been an absorbing field. However, the images reconstructed in existing studies still suffer from low quality. This may be because our visual system is not like a camera that “remembers” every pixel. Instead, only part of the information is perceived through selective attention, and the brain “guesses” the rest to form what we think we see. Most existing approaches have ignored this brain completion mechanism. In this work, we propose to reconstruct seen images with both the visual perception and the brain completion process, and design a simple yet effective visual decoding framework to achieve this goal. Specifically, we first construct a shared discrete representation space for both brain signals and images. Then, a novel self-supervised token-to-token inpainting network is designed to implement visual content completion by building context and prior knowledge about the visual objects from the discrete latent space. Our approach improves the quality of visual reconstruction significantly and achieves state-of-the-art performance.
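
The two-stage framework described in the abstract lends itself to a compact sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' released implementation: a VQ-VAE-style codebook (here `VectorQuantizer`) stands in for the shared discrete representation space, and a masked-token transformer (`TokenInpainter`) stands in for the self-supervised token-to-token inpainting network. All class names, dimensions, and the 50% masking ratio are assumptions for illustration; the encoder that maps brain signals into the same token space is omitted.

```python
# A minimal sketch of the pipeline described above, under assumptions --
# NOT the paper's actual architecture. The brain-signal encoder that
# produces partial "perceived" tokens in the shared space is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Maps continuous features to ids in a learned codebook (the shared discrete space)."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:  # z: (B, N, dim)
        # Nearest-neighbour lookup: each feature vector becomes a discrete token id.
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        return torch.cdist(z, codes).argmin(dim=-1)  # (B, N) token ids


class TokenInpainter(nn.Module):
    """Transformer that recovers masked token ids from the visible context tokens."""

    def __init__(self, num_codes: int = 1024, dim: int = 256,
                 num_tokens: int = 256, depth: int = 6):
        super().__init__()
        self.mask_id = num_codes  # one extra id reserved for [MASK]
        self.embed = nn.Embedding(num_codes + 1, dim)
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Hide the tokens that were not "perceived", then complete them from context.
        x = tokens.masked_fill(mask, self.mask_id)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h)  # (B, N, num_codes) logits over the codebook


# Self-supervised training step: mask part of an image's token map and train
# the network to fill it back in, building a prior over visual content.
quantizer, inpainter = VectorQuantizer(), TokenInpainter()
tokens = quantizer(torch.randn(4, 256, 256))  # stand-in for encoded images
mask = torch.rand(4, 256) < 0.5               # hide half of the tokens
logits = inpainter(tokens, mask)
loss = F.cross_entropy(logits[mask], tokens[mask])
```

At inference time, under this sketch, the visible tokens would come from the brain-signal encoder rather than from an image, and the completed token map would be passed to the codebook's decoder to render the final reconstruction.
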

Cite this Paper


BibTeX

@InProceedings{pmlr-v202-chen23v,
  title     = {Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues},
  author    = {Chen, Jiaxuan and Qi, Yu and Pan, Gang},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {4856--4866},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chen23v/chen23v.pdf},
  url       = {https://proceedings.mlr.press/v202/chen23v.html}
}
Endnote

%0 Conference Paper
%T Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues
%A Jiaxuan Chen
%A Yu Qi
%A Gang Pan
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chen23v
%I PMLR
%P 4856--4866
%U https://proceedings.mlr.press/v202/chen23v.html
%V 202
APA
Chen, J., Qi, Y. & Pan, G. (2023). Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:4856-4866. Available from https://proceedings.mlr.press/v202/chen23v.html.
