Reverse-engineering deep ReLU networks

David Rolnick, Konrad Kording
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8178-8187, 2020.

Abstract

The output of a neural network depends on its architecture and weights in a highly nonlinear way, and it is often assumed that a network’s parameters cannot be recovered from its output. Here, we prove that, in fact, it is frequently possible to reconstruct the architecture, weights, and biases of a deep ReLU network by observing only its output. We leverage the fact that every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.
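
The fact the abstract leans on can be illustrated with a short numerical sketch. The code below is a minimal Python sketch (not the authors' implementation; the layer widths, random seed, probe segment, and bisection tolerance are illustrative choices): it probes a small random ReLU network along a segment through input space and locates a boundary between linear regions by bisecting on where the pattern of active ReLUs changes, i.e. where some neuron switches between its inactive and active states.

import numpy as np

rng = np.random.default_rng(0)

# Small random ReLU network: 2 -> 8 -> 8 -> 1 (widths are an arbitrary choice)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((8, 8)), rng.standard_normal(8)
W3, b3 = rng.standard_normal((1, 8)), rng.standard_normal(1)

def forward(x):
    # Return the scalar output and the on/off pattern of all hidden ReLUs at x.
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    y = W3 @ np.maximum(h2, 0) + b3
    return y.item(), np.concatenate([h1 > 0, h2 > 0])

# Probe the network along a random segment from p to q in input space.
p, q = rng.standard_normal(2), rng.standard_normal(2)
pattern = lambda t: forward((1 - t) * p + t * q)[1]

# Within one linear region the activation pattern is constant, so the output is
# linear there; a region boundary is a point where the pattern changes, i.e.
# some neuron switches between its inactive and active ReLU states.
lo, hi = 0.0, 1.0
assert not np.array_equal(pattern(lo), pattern(hi)), "no neuron flips on this segment"
while hi - lo > 1e-10:
    mid = 0.5 * (lo + hi)
    if np.array_equal(pattern(lo), pattern(mid)):
        lo = mid   # boundary lies in the upper half of the bracket
    else:
        hi = mid   # boundary lies in the lower half of the bracket

flipped = np.flatnonzero(pattern(lo) != pattern(hi))
print(f"region boundary near t = {0.5 * (lo + hi):.6f}")
print(f"hidden neuron(s) switching state across the bracket: {flipped}")

The paper's reconstruction builds on boundary points of exactly this kind, grouping them by the neuron responsible for each and thereby recovering the weights, biases, and architecture of the network up to isomorphism.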

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-rolnick20a,
  title     = {Reverse-engineering deep {R}e{LU} networks},
  author    = {Rolnick, David and Kording, Konrad},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8178--8187},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/rolnick20a/rolnick20a.pdf},
  url       = {https://proceedings.mlr.press/v119/rolnick20a.html}
}
Endnote
%0 Conference Paper
%T Reverse-engineering deep ReLU networks
%A David Rolnick
%A Konrad Kording
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-rolnick20a
%I PMLR
%P 8178--8187
%U https://proceedings.mlr.press/v119/rolnick20a.html
%V 119
APA
Rolnick, D. & Kording, K. (2020). Reverse-engineering deep ReLU networks. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8178-8187. Available from https://proceedings.mlr.press/v119/rolnick20a.html.
