Joint Learning for Visual Reconstruction from the Brain Activity: Hierarchical Representation of Image Perception with EEG-Vision Transformer

Ali Akbari, Kosar Sanjar Arani, Tony Yousefnezhad, Maryam Mirian, Emad Arasteh
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:204-218, 2024.

Abstract

Reconstructing visual stimuli from brain activity is a challenging problem, particularly when using EEG data, which is more affordable and accessible than fMRI but noisier and lower in spatial resolution. In this paper, we present Hierarchical-ViT, a novel framework designed to improve the quality and precision of EEG-based image reconstruction by integrating hierarchical visual feature extraction, vision transformer-based EEG (EEG-ViT) processing, and CLIP-based joint learning. Inspired by the hierarchical nature of the human visual system, our model progressively captures complex visual features, such as edges, textures, and shapes, through a multi-stage processing approach. These features are aligned with EEG signals processed by the EEG-ViT model, allowing for the creation of a shared latent space that enhances contrastive learning. A StyleGAN is then employed to generate high-resolution images from these aligned representations. We evaluated our method on two benchmark datasets, EEGCVPR40 and ThoughtViz, achieving superior results compared to existing approaches in terms of Inception Score (IS), Kernel Inception Distance (KID), and Fréchet Inception Distance (FID) for EEGCVPR40, and IS and KID for the ThoughtViz dataset. Through an ablation study, we underscored the feasibility of hierarchical feature extraction, while a multivariate analysis of variance (MANOVA) test confirmed the distinctiveness of the learned feature spaces. In conclusion, our results show the feasibility and uniqueness of using hierarchical filtering of perceived images combined with EEG-ViT-based features to improve brain decoding from EEG data.
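The central mechanism the abstract describes is CLIP-based joint learning: EEG-ViT embeddings and hierarchical image features are projected into a shared latent space and aligned contrastively. The sketch below illustrates that kind of symmetric contrastive (InfoNCE) objective in PyTorch; it is a minimal illustration under our own assumptions (the function name, embedding dimensions, and temperature value are hypothetical), not the authors' implementation.

```python
# Minimal sketch of a CLIP-style symmetric contrastive objective for
# aligning EEG and image embeddings in a shared latent space.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(eeg_emb: torch.Tensor,
                          img_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired EEG/image embeddings.

    eeg_emb, img_emb: (batch, dim) outputs of an EEG encoder (e.g. the
    EEG-ViT branch) and an image-feature branch, each projected to the
    same shared latent dimension.
    """
    # L2-normalize so the dot product is cosine similarity.
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are matched pairs.
    logits = eeg_emb @ img_emb.t() / temperature

    # Each row/column should classify its own matched pair.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions (EEG->image and image->EEG) and average.
    loss_e2i = F.cross_entropy(logits, targets)
    loss_i2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2i + loss_i2e)
```

In a pipeline like the one described, such a loss would be applied to batch-paired EEG/image embeddings during joint training; the aligned EEG embeddings could then condition a generator such as StyleGAN to produce the reconstructed images.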

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-akbari24a,
  title = {Joint Learning for Visual Reconstruction from the Brain Activity: Hierarchical Representation of Image Perception with {EEG}-Vision Transformer},
  author = {Akbari, Ali and Arani, Kosar Sanjar and Yousefnezhad, Tony and Mirian, Maryam and Arasteh, Emad},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages = {204--218},
  year = {2024},
  editor = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume = {285},
  series = {Proceedings of Machine Learning Research},
  month = {14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/akbari24a/akbari24a.pdf},
  url = {https://proceedings.mlr.press/v285/akbari24a.html},
  abstract = {Reconstructing visual stimuli from brain activity is a challenging problem, particularly when using EEG data, which is more affordable and accessible than fMRI but noisier and lower in spatial resolution. In this paper, we present Hierarchical-ViT, a novel framework designed to improve the quality and precision of EEG-based image reconstruction by integrating hierarchical visual feature extraction, vision transformer-based EEG (EEG-ViT) processing, and CLIP-based joint learning. Inspired by the hierarchical nature of the human visual system, our model progressively captures complex visual features, such as edges, textures, and shapes, through a multi-stage processing approach. These features are aligned with EEG signals processed by the EEG-ViT model, allowing for the creation of a shared latent space that enhances contrastive learning. A StyleGAN is then employed to generate high-resolution images from these aligned representations. We evaluated our method on two benchmark datasets, EEGCVPR40 and ThoughtViz, achieving superior results compared to existing approaches in terms of Inception Score (IS), Kernel Inception Distance (KID), and Fréchet Inception Distance (FID) for EEGCVPR40, and IS and KID for the ThoughtViz dataset. Through an ablation study, we underscored the feasibility of hierarchical feature extraction, while a multivariate analysis of variance (MANOVA) test confirmed the distinctiveness of the learned feature spaces. In conclusion, our results show the feasibility and uniqueness of using hierarchical filtering of perceived images combined with EEG-ViT-based features to improve brain decoding from EEG data.}
}
Endnote
%0 Conference Paper
%T Joint Learning for Visual Reconstruction from the Brain Activity: Hierarchical Representation of Image Perception with EEG-Vision Transformer
%A Ali Akbari
%A Kosar Sanjar Arani
%A Tony Yousefnezhad
%A Maryam Mirian
%A Emad Arasteh
%B Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Donato Crisostomi
%E Luca Moschella
%E Kimberly Stachenfeld
%F pmlr-v285-akbari24a
%I PMLR
%P 204--218
%U https://proceedings.mlr.press/v285/akbari24a.html
%V 285
%X Reconstructing visual stimuli from brain activity is a challenging problem, particularly when using EEG data, which is more affordable and accessible than fMRI but noisier and lower in spatial resolution. In this paper, we present Hierarchical-ViT, a novel framework designed to improve the quality and precision of EEG-based image reconstruction by integrating hierarchical visual feature extraction, vision transformer-based EEG (EEG-ViT) processing, and CLIP-based joint learning. Inspired by the hierarchical nature of the human visual system, our model progressively captures complex visual features, such as edges, textures, and shapes, through a multi-stage processing approach. These features are aligned with EEG signals processed by the EEG-ViT model, allowing for the creation of a shared latent space that enhances contrastive learning. A StyleGAN is then employed to generate high-resolution images from these aligned representations. We evaluated our method on two benchmark datasets, EEGCVPR40 and ThoughtViz, achieving superior results compared to existing approaches in terms of Inception Score (IS), Kernel Inception Distance (KID), and Fréchet Inception Distance (FID) for EEGCVPR40, and IS and KID for the ThoughtViz dataset. Through an ablation study, we underscored the feasibility of hierarchical feature extraction, while a multivariate analysis of variance (MANOVA) test confirmed the distinctiveness of the learned feature spaces. In conclusion, our results show the feasibility and uniqueness of using hierarchical filtering of perceived images combined with EEG-ViT-based features to improve brain decoding from EEG data.
APA
Akbari, A., Arani, K.S., Yousefnezhad, T., Mirian, M. & Arasteh, E. (2024). Joint Learning for Visual Reconstruction from the Brain Activity: Hierarchical Representation of Image Perception with EEG-Vision Transformer. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:204-218. Available from https://proceedings.mlr.press/v285/akbari24a.html.
