Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training

Youjia Zhou, Yi Zhou, Jie Ding, Bei Wang
Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), PMLR 221:134-145, 2023.

Abstract

Deep models are known to be vulnerable to adversarial attacks on data, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions by modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model’s predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure (in particular, mapper graphs) of the neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We find that stronger attacks make the data samples less distinguishable in the neuron activation space, which leads to lower accuracy. Our tool also provides a natural way to identify vulnerable data samples that may be useful in improving model robustness.
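
To make the mapper-graph construction described above concrete, below is a minimal Python sketch, not the authors' implementation. It assumes activations is an (n_samples, n_neurons) array, uses the first principal component as the lens, covers the lens range with overlapping intervals, and clusters each interval's preimage with DBSCAN; the function name mapper_graph and the parameters n_intervals, overlap, eps, and min_samples are illustrative choices, and the paper's actual lens, cover, and clusterer may differ.

    # Minimal mapper-graph sketch over neuron activations (an illustration
    # under the assumptions stated above, not the authors' pipeline).
    from itertools import combinations

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA


    def mapper_graph(activations, n_intervals=10, overlap=0.25,
                     eps=0.5, min_samples=5):
        # Lens: project high-dimensional activations to one dimension.
        lens = PCA(n_components=1).fit_transform(activations).ravel()

        # Cover the lens range with overlapping intervals.
        lo, hi = lens.min(), lens.max()
        width = (hi - lo) / n_intervals
        nodes = []  # each node is a set of sample indices (one cluster)
        for i in range(n_intervals):
            a = lo + i * width - overlap * width
            b = lo + (i + 1) * width + overlap * width
            member = np.where((lens >= a) & (lens <= b))[0]
            if len(member) < min_samples:
                continue
            # Cluster the samples whose lens value falls in this interval.
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
                activations[member])
            for lab in set(labels) - {-1}:  # skip DBSCAN noise points
                nodes.append(set(member[labels == lab]))

        # Edge between two clusters iff they share at least one sample,
        # i.e., their corresponding clusters have a nonempty intersection.
        edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
                 if nodes[u] & nodes[v]]
        return nodes, edges

Running such a sketch on activations of clean versus adversarially perturbed samples would let one compare how clusters merge or split across the two graphs, in the spirit of the paper's observation that stronger attacks blur class structure in the activation space.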

Cite this Paper


BibTeX
@InProceedings{pmlr-v221-zhou23a,
  title = {Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training},
  author = {Zhou, Youjia and Zhou, Yi and Ding, Jie and Wang, Bei},
  booktitle = {Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML)},
  pages = {134--145},
  year = {2023},
  editor = {Doster, Timothy and Emerson, Tegan and Kvinge, Henry and Miolane, Nina and Papillon, Mathilde and Rieck, Bastian and Sanborn, Sophia},
  volume = {221},
  series = {Proceedings of Machine Learning Research},
  month = {28 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v221/zhou23a/zhou23a.pdf},
  url = {https://proceedings.mlr.press/v221/zhou23a.html},
  abstract = {Deep models are known to be vulnerable to adversarial attacks on data, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions by modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model’s predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure (in particular, mapper graphs) of the neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We find that stronger attacks make the data samples less distinguishable in the neuron activation space, which leads to lower accuracy. Our tool also provides a natural way to identify vulnerable data samples that may be useful in improving model robustness.}
}
Endnote
%0 Conference Paper
%T Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training
%A Youjia Zhou
%A Yi Zhou
%A Jie Ding
%A Bei Wang
%B Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML)
%C Proceedings of Machine Learning Research
%D 2023
%E Timothy Doster
%E Tegan Emerson
%E Henry Kvinge
%E Nina Miolane
%E Mathilde Papillon
%E Bastian Rieck
%E Sophia Sanborn
%F pmlr-v221-zhou23a
%I PMLR
%P 134--145
%U https://proceedings.mlr.press/v221/zhou23a.html
%V 221
%X Deep models are known to be vulnerable to adversarial attacks on data, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions by modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model’s predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure (in particular, mapper graphs) of the neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We find that stronger attacks make the data samples less distinguishable in the neuron activation space, which leads to lower accuracy. Our tool also provides a natural way to identify vulnerable data samples that may be useful in improving model robustness.
APA
Zhou, Y., Zhou, Y., Ding, J. & Wang, B. (2023). Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training. Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), in Proceedings of Machine Learning Research 221:134-145. Available from https://proceedings.mlr.press/v221/zhou23a.html.