Reverse-engineering deep ReLU networks
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8178-8187, 2020.
The output of a neural network depends on its architecture and weights in a highly nonlinear way, and it is often assumed that a network’s parameters cannot be recovered from its output. Here, we prove that, in fact, it is frequently possible to reconstruct the architecture, weights, and biases of a deep ReLU network by observing only its output. We leverage the fact that every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.