- title: 'Deep learning interpretation: Flip points and homotopy methods' abstract: 'Deep learning models are complicated mathematical functions, and their interpretation remains a challenging research question. We formulate and solve optimization problems to answer questions about the models and their outputs. Specifically, we develop methods to study the decision boundaries of classification models using {\em flip points}. A flip point is any point that lies on the boundary between two output classes: e.g., for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for “yes” and “no”. The flip point closest to a given input is of particular importance, and this point is the solution to a well-posed optimization problem. To compute the closest flip point, we develop a homotopy algorithm to overcome the issues of vanishing and exploding gradients and to find a feasible solution for our optimization problem. We show that computing closest flip points allows us to systematically investigate the model, identify decision boundaries, interpret and audit the model with respect to individual inputs and entire datasets, and identify vulnerabilities to adversarial attacks. We demonstrate that flip points can help identify mistakes made by a model, improve the model’s accuracy, and reveal the most influential features for classification.' volume: 107 URL: https://proceedings.mlr.press/v107/yousefzadeh20a.html PDF: http://proceedings.mlr.press/v107/yousefzadeh20a/yousefzadeh20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-yousefzadeh20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Roozbeh family: Yousefzadeh - given: Dianne P. family: O’Leary editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 1-26 id: yousefzadeh20a issued: date-parts: - 2020 - 8 - 16 firstpage: 1 lastpage: 26 published: 2020-08-16 00:00:00 +0000 - title: 'Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning' abstract: 'Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for \emph{typical-case} synthetic models from statistical physics, involving quantities known as \emph{Gardner capacity} and \emph{ground state energy}. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica-symmetry breaking models.'
volume: 107 URL: https://proceedings.mlr.press/v107/abbaras20a.html PDF: http://proceedings.mlr.press/v107/abbaras20a/abbaras20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-abbaras20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Alia family: Abbaras - given: Benjamin family: Aubin - given: Florent family: Krzakala - given: Lenka family: Zdeborová editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 27-54 id: abbaras20a issued: date-parts: - 2020 - 8 - 16 firstpage: 27 lastpage: 54 published: 2020-08-16 00:00:00 +0000 - title: 'Exact asymptotics for phase retrieval and compressed sensing with random generative priors' abstract: 'We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that in all cases analysed generative priors have a smaller statistical-to-algorithmic gap than sparse priors, giving theoretical support to previous experimental observations that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow one to perform compressive phase retrieval efficiently close to its information-theoretic limit, it is found that under the random generative prior compressed phase retrieval becomes tractable. ' volume: 107 URL: https://proceedings.mlr.press/v107/aubin20a.html PDF: http://proceedings.mlr.press/v107/aubin20a/aubin20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-aubin20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Benjamin family: Aubin - given: Bruno family: Loureiro - given: Antoine family: Baker - given: Florent family: Krzakala - given: Lenka family: Zdeborová editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 55-73 id: aubin20a issued: date-parts: - 2020 - 8 - 16 firstpage: 55 lastpage: 73 published: 2020-08-16 00:00:00 +0000 - title: 'SchrödingeRNN: Generative modeling of raw audio as a continuously observed quantum state' abstract: ' We introduce SchrödingeRNN, a quantum-inspired generative model for raw audio. Audio data is wave-like and is sampled from a continuous signal. Although generative modeling of raw audio has made great strides lately, relational inductive biases relevant to these two characteristics are mostly absent from models explored to date. Quantum Mechanics is a natural source of probabilistic models of wave behavior. Our model takes the form of a stochastic Schrödinger equation describing the continuous time measurement of a quantum system, and is equivalent to the continuous Matrix Product State (cMPS) representation of wavefunctions in one-dimensional many-body systems. This constitutes a deep autoregressive architecture in which the system’s state is a latent representation of the past observations. We test our model on synthetic data sets of stationary and non-stationary signals.
This is the first time cMPS are used in machine learning.' volume: 107 URL: https://proceedings.mlr.press/v107/mencia-uranga20a.html PDF: http://proceedings.mlr.press/v107/mencia-uranga20a/mencia-uranga20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-mencia-uranga20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Beñat family: Mencia Uranga - given: Austen family: Lamacraft editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 74-106 id: mencia-uranga20a issued: date-parts: - 2020 - 8 - 16 firstpage: 74 lastpage: 106 published: 2020-08-16 00:00:00 +0000 - title: 'On the stable recovery of deep structured linear networks under sparsity constraints' abstract: ' We consider a deep structured linear network under sparsity constraints. We study sharp conditions guaranteeing the stability of the optimal parameters defining the network. More precisely, we provide sharp conditions on the network architecture and the sample under which the error on the parameters defining the network scales linearly with the reconstruction error (i.e. the risk). Therefore, under these conditions, the weights obtained with a successful algorithm are well defined and only depend on the architecture of the network and the sample. The features in the latent spaces are stably defined. The stability property is required in order to interpret the features defined in the latent spaces. It can also lead to a guarantee on the statistical risk. This is what motivates this study. The analysis is based on the recently proposed Tensorial Lifting. The particularity of this paper is to consider a sparsity prior. This leads to a better stability constant. As an illustration, we detail the analysis and provide sharp stability guarantees for convolutional linear networks under a sparsity prior. In this analysis, we distinguish the role of the network architecture and the sample input. This highlights the requirements on the data in connection to parameter stability. ' volume: 107 URL: https://proceedings.mlr.press/v107/malgouyres20a.html PDF: http://proceedings.mlr.press/v107/malgouyres20a/malgouyres20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-malgouyres20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: François family: Malgouyres editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 107-127 id: malgouyres20a issued: date-parts: - 2020 - 8 - 16 firstpage: 107 lastpage: 127 published: 2020-08-16 00:00:00 +0000 - title: 'Neural network integral representations with the ReLU activation function' abstract: 'In this effort, we derive a formula for the integral representation of a shallow neural network with the ReLU activation function. We assume that the outer weights admit a finite $L_1$-norm with respect to Lebesgue measure on the sphere. For univariate target functions we further provide a closed-form formula for all possible representations. Additionally, in this case our formula allows one to explicitly solve for the least $L_1$-norm neural network representation for a given function.
' volume: 107 URL: https://proceedings.mlr.press/v107/petrosyan20a.html PDF: http://proceedings.mlr.press/v107/petrosyan20a/petrosyan20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-petrosyan20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Armenak family: Petrosyan - given: Anton family: Dereventsov - given: Clayton G. family: Webster editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 128-143 id: petrosyan20a issued: date-parts: - 2020 - 8 - 16 firstpage: 128 lastpage: 143 published: 2020-08-16 00:00:00 +0000 - title: 'A type of generalization error induced by initialization in deep neural networks' abstract: ' How initialization and loss function affect the learning of a deep neural network (DNN), specifically its generalization error, is an important problem in practice. In this work, by exploiting the linearity of DNN training dynamics in the NTK regime \citep{jacot2018neural,lee2019wide}, we provide an explicit and quantitative answer to this problem. Focusing on the regression problem, we prove that, in the NTK regime, for any loss in a general class of functions, the DNN finds the same \emph{global} minimum—the one that is nearest to the initial value in the parameter space, or equivalently, the one that is closest to the initial DNN output in the corresponding reproducing kernel Hilbert space. Using these optimization problems, we quantify the impact of the initial output and prove that a random non-zero one increases the generalization error. We further propose an antisymmetrical initialization (ASI) trick that eliminates this type of error and accelerates the training. To understand whether the above results hold in general, we also perform experiments for DNNs in the non-NTK regime, which demonstrate the effectiveness of our theoretical results and the ASI trick in a qualitative sense. Overall, our work serves as a baseline for the further investigation of the impact of initialization and loss function on the generalization of DNNs, which can potentially guide and improve the training of DNNs in practice. ' volume: 107 URL: https://proceedings.mlr.press/v107/zhang20a.html PDF: http://proceedings.mlr.press/v107/zhang20a/zhang20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-zhang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Yaoyu family: Zhang - given: Zhi-Qin John family: Xu - given: Tao family: Luo - given: Zheng family: Ma editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 144-164 id: zhang20a issued: date-parts: - 2020 - 8 - 16 firstpage: 144 lastpage: 164 published: 2020-08-16 00:00:00 +0000 - title: 'Non-Gaussian processes and neural networks at finite widths' abstract: 'Gaussian processes are ubiquitous in nature and engineering. A case in point is a class of neural networks in the infinite-width limit, whose priors correspond to Gaussian processes. Here we perturbatively extend this correspondence to finite-width neural networks, yielding non-Gaussian processes as priors. The methodology developed herein allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow.
We further develop a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors. ' volume: 107 URL: https://proceedings.mlr.press/v107/yaida20a.html PDF: http://proceedings.mlr.press/v107/yaida20a/yaida20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-yaida20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Sho family: Yaida editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 165-192 id: yaida20a issued: date-parts: - 2020 - 8 - 16 firstpage: 165 lastpage: 192 published: 2020-08-16 00:00:00 +0000 - title: 'SelectNet: Learning to Sample from the Wild for Imbalanced Data Training' abstract: ' Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle the difficulty in dealing with class-imbalanced training data such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data “in the wild” that most likely belong to the under-sampled classes in the training data, thus gradually mitigating the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision. ' volume: 107 URL: https://proceedings.mlr.press/v107/liu20a.html PDF: http://proceedings.mlr.press/v107/liu20a/liu20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-liu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Yunru family: Liu - given: Tingran family: Gao - given: Haizhao family: Yang editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 193-206 id: liu20a issued: date-parts: - 2020 - 8 - 16 firstpage: 193 lastpage: 206 published: 2020-08-16 00:00:00 +0000 - title: 'Calibrating Multivariate Lévy Processes with Neural Networks' abstract: ' Calibrating a Lévy process usually requires characterizing its jump distribution. Traditionally, this problem can be solved with nonparametric estimation using the empirical characteristic functions (ECF), assuming certain regularity, and results to date are mostly in 1D. For multivariate Lévy processes and less smooth Lévy densities, the problem becomes challenging as ECFs decay slowly and have large uncertainty because of limited observations. We solve this problem by approximating the Lévy density with a parametrized functional form; the characteristic function is then estimated using numerical integration. In our benchmarks, we used deep neural networks and found that they are robust and can capture sharp transitions in the Lévy density compared to piecewise linear functions and radial basis functions.
' volume: 107 URL: https://proceedings.mlr.press/v107/xu20a.html PDF: http://proceedings.mlr.press/v107/xu20a/xu20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-xu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Kailai family: Xu - given: Eric family: Darve editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 207-220 id: xu20a issued: date-parts: - 2020 - 8 - 16 firstpage: 207 lastpage: 220 published: 2020-08-16 00:00:00 +0000 - title: 'Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games' abstract: 'We propose a deep neural network-based algorithm to identify the Markovian Nash equilibrium of general large $N$-player stochastic differential games. Following the idea of fictitious play, we recast the $N$-player game into $N$ decoupled decision problems (one for each player) and solve them iteratively. The individual decision problem is characterized by a semilinear Hamilton-Jacobi-Bellman equation, which we solve using the recently developed deep BSDE method. The resulting algorithm can solve large $N$-player games for which conventional numerical methods would suffer from the curse of dimensionality. Multiple numerical examples involving identical or heterogeneous agents, with risk-neutral or risk-sensitive objectives, are tested to validate the accuracy of the proposed algorithm in large group games. Even for a fifty-player game with the presence of common noise, the proposed algorithm still finds the approximate Nash equilibrium accurately, which, to the best of our knowledge, is difficult to achieve by other numerical algorithms.' volume: 107 URL: https://proceedings.mlr.press/v107/han20a.html PDF: http://proceedings.mlr.press/v107/han20a/han20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-han20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Jiequn family: Han - given: Ruimeng family: Hu editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 221-245 id: han20a issued: date-parts: - 2020 - 8 - 16 firstpage: 221 lastpage: 245 published: 2020-08-16 00:00:00 +0000 - title: 'Borrowing From the Future: An Attempt to Address Double Sampling' abstract: 'For model-free reinforcement learning, one of the main challenges of stochastic Bellman residual minimization is the double sampling problem, i.e., while only a single sample for the next state is available in the model-free setting, two independent samples for the next state are required in order to perform unbiased stochastic gradient descent. We propose new algorithms for addressing this problem based on the idea of borrowing extra randomness from the future. When the transition kernel varies slowly with respect to the state, it is shown that the training trajectory of the new algorithms is close to that of unbiased stochastic gradient descent. Numerical results for policy evaluation in both tabular and neural network settings are provided to confirm the theoretical findings.'
volume: 107 URL: https://proceedings.mlr.press/v107/zhu20a.html PDF: http://proceedings.mlr.press/v107/zhu20a/zhu20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-zhu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Yuhua family: Zhu - given: Lexing family: Ying editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 246-268 id: zhu20a issued: date-parts: - 2020 - 8 - 16 firstpage: 246 lastpage: 268 published: 2020-08-16 00:00:00 +0000 - title: 'Deep Domain Decomposition Method: Elliptic Problems' abstract: 'This paper proposes a deep-learning-based domain decomposition method (DeepDDM), which leverages deep neural networks (DNN) to discretize the subproblems divided by domain decomposition methods (DDM) for solving partial differential equations (PDE). Using DNN to solve PDE is a physics-informed learning problem with an objective involving two terms, a domain term and a boundary term, which respectively make the desired solution satisfy the PDE and corresponding boundary conditions. DeepDDM will exchange the subproblem information across the interface in DDM by adjusting the boundary term for solving each subproblem by DNN. Benefiting from the simple implementation and mesh-free strategy of using DNN for PDE, DeepDDM will simplify the implementation of DDM and make DDM more flexible for complex PDE, e.g., those with complex interfaces in the computational domain. This paper first investigates the performance of using DeepDDM for elliptic problems, including a model problem and an interface problem. The numerical examples demonstrate that DeepDDM exhibits behaviors consistent with conventional DDM: the number of iterations by DeepDDM is independent of network architecture and decreases with increasing overlapping size. The performance of DeepDDM on elliptic problems will encourage us to further investigate its performance for other kinds of PDE and may provide new insights for improving the PDE solver by deep learning. ' volume: 107 URL: https://proceedings.mlr.press/v107/li20a.html PDF: http://proceedings.mlr.press/v107/li20a/li20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-li20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Wuyang family: Li - given: Xueshuang family: Xiang - given: Yingxiang family: Xu editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 269-286 id: li20a issued: date-parts: - 2020 - 8 - 16 firstpage: 269 lastpage: 286 published: 2020-08-16 00:00:00 +0000 - title: 'Landscape Complexity for the Empirical Risk of Generalized Linear Models' abstract: ' We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows one to analyze the critical points of high-dimensional non-Gaussian random functions. We obtain a rigorous explicit variational formula for the \emph{annealed complexity}, which is the logarithm of the average number of critical points at a fixed value of the empirical risk.
This result is simplified and extended using the non-rigorous Kac-Rice replicated method from theoretical physics. In this way we find an explicit variational formula for the \emph{quenched complexity}, which is generally different from its annealed counterpart, and allows one to obtain the number of critical points for typical instances up to exponential accuracy. ' volume: 107 URL: https://proceedings.mlr.press/v107/maillard20a.html PDF: http://proceedings.mlr.press/v107/maillard20a/maillard20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-maillard20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Antoine family: Maillard - given: Gérard family: Ben Arous - given: Giulio family: Biroli editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 287-327 id: maillard20a issued: date-parts: - 2020 - 8 - 16 firstpage: 287 lastpage: 327 published: 2020-08-16 00:00:00 +0000 - title: 'DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM' abstract: 'Machine learning (ML) models trained by differentially private stochastic gradient descent (DP-SGD) have much lower utility than the non-private ones. To mitigate this degradation, we propose a DP Laplacian smoothing SGD (DP-LSSGD) to train ML models with differential privacy (DP) guarantees. At the core of DP-LSSGD is the Laplacian smoothing, which smooths out the Gaussian noise used in the Gaussian mechanism. Under the same amount of noise used in the Gaussian mechanism, DP-LSSGD attains the same DP guarantee, but in practice, DP-LSSGD makes training both convex and nonconvex ML models more stable and enables the trained models to generalize better. The proposed algorithm is simple to implement, and the extra computational complexity and memory overhead compared with DP-SGD are negligible. DP-LSSGD is applicable to training a large variety of ML models, including DNNs. The code is available at \url{https://github.com/BaoWangMath/DP-LSSGD}.' volume: 107 URL: https://proceedings.mlr.press/v107/wang20a.html PDF: http://proceedings.mlr.press/v107/wang20a/wang20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-wang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Bao family: Wang - given: Quanquan family: Gu - given: March family: Boedihardjo - given: Lingxiao family: Wang - given: Farzin family: Barekat - given: Stanley J. family: Osher editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 328-351 id: wang20a issued: date-parts: - 2020 - 8 - 16 firstpage: 328 lastpage: 351 published: 2020-08-16 00:00:00 +0000 - title: 'NeuPDE: Neural Network Based Ordinary and Partial Differential Equations for Modeling Time-Dependent Data' abstract: ' We propose a neural network based approach for extracting models from dynamic data using ordinary and partial differential equations. In particular, given a time-series or spatio-temporal dataset, we seek to identify an accurate governing system which respects the intrinsic differential structure. The unknown governing model is parameterized by using both (shallow) multilayer perceptrons and nonlinear differential terms, in order to incorporate relevant correlations between spatio-temporal samples.
We demonstrate the approach on several examples where the data is sampled from various dynamical systems and give a comparison to recurrent networks and other data-discovery methods. In addition, we show that for SVHN, MNIST, Fashion MNIST, and CIFAR10/100, our approach lowers the parameter cost as compared to other deep neural networks.' volume: 107 URL: https://proceedings.mlr.press/v107/sun20a.html PDF: http://proceedings.mlr.press/v107/sun20a/sun20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-sun20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Yifan family: Sun - given: Linan family: Zhang - given: Hayden family: Schaeffer editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 352-372 id: sun20a issued: date-parts: - 2020 - 8 - 16 firstpage: 352 lastpage: 372 published: 2020-08-16 00:00:00 +0000 - title: 'The Slow Deterioration of the Generalization Error of the Random Feature Model' abstract: 'The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues for the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show, both theoretically and experimentally, that there is a dynamic self-correction mechanism at work: The larger the eventual generalization gap, the slower it develops; both effects are due to the small eigenvalues. This gives us ample time to stop the training process and obtain solutions with good generalization properties. ' volume: 107 URL: https://proceedings.mlr.press/v107/ma20a.html PDF: http://proceedings.mlr.press/v107/ma20a/ma20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-ma20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Chao family: Ma - given: Lei family: Wu - given: Weinan family: E editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 373-389 id: ma20a issued: date-parts: - 2020 - 8 - 16 firstpage: 373 lastpage: 389 published: 2020-08-16 00:00:00 +0000 - title: 'Large deviations for the perceptron model and consequences for active learning' abstract: ' Active learning is a branch of machine learning that deals with problems where unlabeled data is abundant yet obtaining labels is expensive. The learning algorithm has the possibility of querying a limited number of samples to obtain the corresponding labels, subsequently used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations for the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then provide optimal achievable performance boundaries for any active learning algorithm. We show that the optimal learning performance can be efficiently approached by simple message-passing active learning algorithms.
We also provide a comparison with the performance of some other popular active learning strategies. ' volume: 107 URL: https://proceedings.mlr.press/v107/cui20a.html PDF: http://proceedings.mlr.press/v107/cui20a/cui20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-cui20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Hugo family: Cui - given: Luca family: Saglietti - given: Lenka family: Zdeborova editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 390-430 id: cui20a issued: date-parts: - 2020 - 8 - 16 firstpage: 390 lastpage: 430 published: 2020-08-16 00:00:00 +0000 - title: 'Butterfly-Net2: Simplified Butterfly-Net and Fourier Transform Initialization' abstract: 'Structured CNNs designed using prior information about the problem can potentially improve efficiency over conventional CNNs in various tasks, such as solving PDEs and inverse problems in signal processing. This paper introduces BNet2, a simplified Butterfly-Net in line with conventional CNNs. Moreover, a Fourier transform initialization is proposed for both BNet2 and CNN with guaranteed approximation power to represent the Fourier transform operator. Experimentally, BNet2 and the Fourier transform initialization strategy are tested on various tasks, including approximating the Fourier transform operator, end-to-end solvers of linear and nonlinear PDEs, and denoising and deblurring of 1D signals. On all tasks, under the same initialization, BNet2 achieves similar accuracy to CNN but has fewer parameters. Fourier-transform-initialized BNet2 and CNN consistently improve the training and testing accuracy over the randomly initialized CNN. ' volume: 107 URL: https://proceedings.mlr.press/v107/xu20b.html PDF: http://proceedings.mlr.press/v107/xu20b/xu20b.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-xu20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Zhongshu family: Xu - given: Yingzhou family: Li - given: Xiuyuan family: Cheng editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 431-450 id: xu20b issued: date-parts: - 2020 - 8 - 16 firstpage: 431 lastpage: 450 published: 2020-08-16 00:00:00 +0000 - title: 'Deep learning Markov and Koopman models with physical constraints' abstract: 'The long-timescale behavior of complex dynamical systems can be described by linear Markov or Koopman models in a suitable latent space. Recent variational approaches allow the latent space representation and the linear dynamical model to be optimized via unsupervised machine learning methods. Incorporation of physical constraints such as time-reversibility or stochasticity into the dynamical model has been established for linear, but not for arbitrarily nonlinear (deep learning), representations of the latent space. Here we develop theory and methods for deep learning Markov and Koopman models that can bear such physical constraints. We prove that the model is a universal approximator for reversible Markov processes and that it can be optimized with either maximum likelihood or the variational approach of Markov processes (VAMP).
We demonstrate that the model performs equally well for equilibrium data and systematically better for biased data compared to existing approaches, thus providing a tool to study the long-timescale processes of dynamical systems. ' volume: 107 URL: https://proceedings.mlr.press/v107/mardt20a.html PDF: http://proceedings.mlr.press/v107/mardt20a/mardt20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-mardt20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Andreas family: Mardt - given: Luca family: Pasquali - given: Frank family: Noé - given: Hao family: Wu editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 451-475 id: mardt20a issued: date-parts: - 2020 - 8 - 16 firstpage: 451 lastpage: 475 published: 2020-08-16 00:00:00 +0000 - title: 'Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs' abstract: 'Recurrent neural networks (RNNs) are powerful dynamical models for data with complex temporal structure. However, training RNNs has traditionally proved challenging due to exploding or vanishing gradients. RNN models such as LSTMs and GRUs (and their variants) significantly mitigate these issues associated with training by introducing various types of {\it gating} units into the architecture. While these gates empirically improve performance, how the addition of gates influences the dynamics and trainability of GRUs and LSTMs is not well understood. Here, we take the perspective of studying randomly initialized LSTMs and GRUs as dynamical systems, and ask how the salient dynamical properties are shaped by the gates. We leverage tools from random matrix theory and mean-field theory to study the state-to-state Jacobians of GRUs and LSTMs. We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics. Moreover, the GRU update gate can poise the system at a marginally stable point. The reset gate in the GRU and the output and input gates in the LSTM control the spectral radius of the Jacobian, and the GRU reset gate also modulates the complexity of the landscape of fixed points. Furthermore, for the GRU we obtain a phase diagram describing the statistical properties of fixed points. We also provide a preliminary comparison of training performance to the various dynamical regimes realized by varying hyperparameters. Looking to the future, we have introduced a powerful set of techniques which can be adapted to a broad class of RNNs, to study the influence of various architectural choices on dynamics, and potentially motivate the principled discovery of novel architectures. ' volume: 107 URL: https://proceedings.mlr.press/v107/can20a.html PDF: http://proceedings.mlr.press/v107/can20a/can20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-can20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Tankut family: Can - given: Kamesh family: Krishnamurthy - given: David J.
family: Schwab editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 476-511 id: can20a issued: date-parts: - 2020 - 8 - 16 firstpage: 476 lastpage: 511 published: 2020-08-16 00:00:00 +0000 - title: 'Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint' abstract: 'Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations. ' volume: 107 URL: https://proceedings.mlr.press/v107/cyr20a.html PDF: http://proceedings.mlr.press/v107/cyr20a/cyr20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-cyr20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Eric C. family: Cyr - given: Mamikon A. family: Gulian - given: Ravi G. family: Patel - given: Mauro family: Perego - given: Nathaniel A. family: Trask editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 512-536 id: cyr20a issued: date-parts: - 2020 - 8 - 16 firstpage: 512 lastpage: 536 published: 2020-08-16 00:00:00 +0000 - title: 'New Potential-Based Bounds for the Geometric-Stopping Version of Prediction with Expert Advice' abstract: 'This work addresses the classic machine learning problem of online prediction with expert advice. A new potential-based framework for the fixed horizon version of this problem has been recently developed using verification arguments from optimal control theory. This paper extends this framework to the random (geometric) stopping version. To obtain explicit bounds, we construct potentials for the geometric version from potentials used for the fixed horizon version of the problem. This construction leads to new explicit lower and upper bounds associated with specific adversary and player strategies. While there are several known lower bounds in the fixed horizon setting, our lower bounds appear to be the first such results in the geometric stopping setting with an arbitrary number of experts. Our framework also leads in some cases to improved upper bounds. For two and three experts, our bounds are optimal to leading order. ' volume: 107 URL: https://proceedings.mlr.press/v107/kobzar20a.html PDF: http://proceedings.mlr.press/v107/kobzar20a/kobzar20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-kobzar20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Vladimir A. family: Kobzar - given: Robert V. 
family: Kohn - given: Zhilei family: Wang editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 537-554 id: kobzar20a issued: date-parts: - 2020 - 8 - 16 firstpage: 537 lastpage: 554 published: 2020-08-16 00:00:00 +0000 - title: 'Data-driven Compact Models for Circuit Design and Analysis' abstract: 'Compact semiconductor device models are essential for efficiently designing and analyzing large circuits. However, traditional compact model development requires a large amount of manual effort and can span many years. Moreover, inclusion of new physics (e.g., radiation effects) into an existing model is not trivial and may require redevelopment from scratch. Machine Learning (ML) techniques have the potential to automate and significantly speed up the development of compact models. In addition, ML provides a range of modeling options that can be used to develop hierarchies of compact models tailored to specific circuit design stages. In this paper, we explore three such options: (1) table-based interpolation, (2) Generalized Moving Least-Squares, and (3) feed-forward Deep Neural Networks, to develop compact models for a p-n junction diode. We evaluate the performance of these “data-driven” compact models by (1) comparing their voltage-current characteristics against laboratory data, and (2) building a bridge rectifier circuit using these devices, predicting the circuit’s behavior using SPICE-like circuit simulations, and then comparing these predictions against laboratory measurements of the same circuit. ' volume: 107 URL: https://proceedings.mlr.press/v107/aadithya20a.html PDF: http://proceedings.mlr.press/v107/aadithya20a/aadithya20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-aadithya20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: K. family: Aadithya - given: P. family: Kuberry - given: B. family: Paskaleva - given: P. family: Bochev - given: K. family: Leeson - given: A. family: Mar - given: T. family: Mei - given: E. family: Keiter editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 555-569 id: aadithya20a issued: date-parts: - 2020 - 8 - 16 firstpage: 555 lastpage: 569 published: 2020-08-16 00:00:00 +0000 - title: 'Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds' abstract: ' The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the filter structures used to the underlying geometry of the data.
' volume: 107 URL: https://proceedings.mlr.press/v107/perlmutter20a.html PDF: http://proceedings.mlr.press/v107/perlmutter20a/perlmutter20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-perlmutter20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Michael family: Perlmutter - given: Feng family: Gao - given: Guy family: Wolf - given: Matthew family: Hirn editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 570-604 id: perlmutter20a issued: date-parts: - 2020 - 8 - 16 firstpage: 570 lastpage: 604 published: 2020-08-16 00:00:00 +0000 - title: 'Policy Gradient based Quantum Approximate Optimization Algorithm' abstract: 'The quantum approximate optimization algorithm (QAOA), as a hybrid quantum/classical algorithm, has received much interest recently. QAOA can also be viewed as a variational ansatz for quantum control. However, its direct application to emergent quantum technology encounters additional physical constraints: (i) the states of the quantum system are not observable; (ii) obtaining the derivatives of the objective function can be computationally expensive or even inaccessible in experiments, and (iii) the values of the objective function may be sensitive to various sources of uncertainty, as is the case for noisy intermediate-scale quantum (NISQ) devices. Taking such constraints into account, we show that policy-gradient-based reinforcement learning (RL) algorithms are well suited for optimizing the variational parameters of QAOA in a noise-robust fashion, opening up the way for developing RL techniques for continuous quantum control. This is advantageous to help mitigate and monitor the potentially unknown sources of errors in modern quantum simulators. We analyze the performance of the algorithm for quantum state transfer problems in single- and multi-qubit systems, subject to various sources of noise such as error terms in the Hamiltonian, or quantum uncertainty in the measurement process. We show that, in noisy setups, it is capable of outperforming state-of-the-art existing optimization algorithms. ' volume: 107 URL: https://proceedings.mlr.press/v107/yao20a.html PDF: http://proceedings.mlr.press/v107/yao20a/yao20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-yao20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Jiahao family: Yao - given: Marin family: Bukov - given: Lin family: Lin editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 605-634 id: yao20a issued: date-parts: - 2020 - 8 - 16 firstpage: 605 lastpage: 634 published: 2020-08-16 00:00:00 +0000 - title: 'Quantum Ground States from Reinforcement Learning' abstract: ' Finding the ground state of a quantum mechanical system can be formulated as an optimal control problem. In this formulation, the drift of the optimally controlled process is chosen to match the distribution of paths in the Feynman–Kac (FK) representation of the solution of the imaginary time Schrödinger equation. This provides a variational principle that can be used for reinforcement learning of a neural representation of the drift. Our approach is a drop-in replacement for path integral Monte Carlo, learning an optimal importance sampler for the FK trajectories. 
We demonstrate the applicability of our approach to several problems of one-, two-, and many-particle physics.' volume: 107 URL: https://proceedings.mlr.press/v107/barr20a.html PDF: http://proceedings.mlr.press/v107/barr20a/barr20a.pdf edit: https://github.com/mlresearch//v107/edit/gh-pages/_posts/2020-08-16-barr20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The First Mathematical and Scientific Machine Learning Conference' publisher: 'PMLR' author: - given: Ariel family: Barr - given: Willem family: Gispen - given: Austen family: Lamacraft editor: - given: Jianfeng family: Lu - given: Rachel family: Ward page: 635-653 id: barr20a issued: date-parts: - 2020 - 8 - 16 firstpage: 635 lastpage: 653 published: 2020-08-16 00:00:00 +0000