Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations

Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan K. Turaga, Theodore P. Pavlic
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:85-99, 2024.

Abstract

When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with the target labels. This is especially problematic when these irrelevant features are easier for the model to learn than the truly relevant ones. Many debiasing methods have been proposed to address this issue, but they often require predefined bias labels and add significant computational complexity by incorporating auxiliary models. Instead, we offer a perspective orthogonal to existing approaches, inspired by cognitive science and specifically by Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Additionally, DGW enhances the transparency of decision-making by using attention masks to visualize which input features the model focuses on during training and inference. We begin by proposing an instantiation of GWT for debiasing. We then outline the implementation of each component within DGW. Finally, we validate our method across various biased datasets, demonstrating its effectiveness in mitigating biases and improving model performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-hong24a,
  title = {Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations},
  author = {Hong, Jinyung and Jeon, Eun Som and Kim, Changhoon and Park, Keun Hee and Nath, Utkarsh and Yang, Yezhou and Turaga, Pavan K. and Pavlic, Theodore P.},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages = {85--99},
  year = {2024},
  editor = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume = {285},
  series = {Proceedings of Machine Learning Research},
  month = {14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/hong24a/hong24a.pdf},
  url = {https://proceedings.mlr.press/v285/hong24a.html},
  abstract = {When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with the target labels. This is especially problematic if these irrelevant features are easier for the model to learn than the truly relevant ones. Many existing approaches, called debiasing methods, have been proposed to address this issue, but they often require predefined bias labels and entail significantly increased computational complexity by incorporating extra auxiliary models. Instead, we provide an orthogonal perspective from the existing approaches, inspired by cognitive science, specifically Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Additionally, DGW enhances the transparency of decision-making processes by visualizing which features of the inputs the model focuses on during training and inference through attention masks. We begin by proposing an instantiation of GWT for the debiasing method. We then outline the implementation of each component within DGW. At the end, we validate our method across various biased datasets, proving its effectiveness in mitigating biases and improving model performance.}
}
Endnote
%0 Conference Paper
%T Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations
%A Jinyung Hong
%A Eun Som Jeon
%A Changhoon Kim
%A Keun Hee Park
%A Utkarsh Nath
%A Yezhou Yang
%A Pavan K. Turaga
%A Theodore P. Pavlic
%B Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Donato Crisostomi
%E Luca Moschella
%E Kimberly Stachenfeld
%F pmlr-v285-hong24a
%I PMLR
%P 85--99
%U https://proceedings.mlr.press/v285/hong24a.html
%V 285
%X When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with the target labels. This is especially problematic if these irrelevant features are easier for the model to learn than the truly relevant ones. Many existing approaches, called debiasing methods, have been proposed to address this issue, but they often require predefined bias labels and entail significantly increased computational complexity by incorporating extra auxiliary models. Instead, we provide an orthogonal perspective from the existing approaches, inspired by cognitive science, specifically Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Additionally, DGW enhances the transparency of decision-making processes by visualizing which features of the inputs the model focuses on during training and inference through attention masks. We begin by proposing an instantiation of GWT for the debiasing method. We then outline the implementation of each component within DGW. At the end, we validate our method across various biased datasets, proving its effectiveness in mitigating biases and improving model performance.
APA
Hong, J., Jeon, E.S., Kim, C., Park, K.H., Nath, U., Yang, Y., Turaga, P.K. & Pavlic, T.P. (2024). Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:85-99. Available from https://proceedings.mlr.press/v285/hong24a.html.
