Proceedings of Machine Learning Research

Proceedings of Machine Learning Research Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) on 28 July 2023 Published as Volume 221 by the Proceedings of Machine Learning Research on 27 September 2023. Volume Edited by: Timothy Doster Tegan Emerson Henry Kvinge Nina Miolane Mathilde Papillon Bastian Rieck Sophia Sanborn Series Editors: Neil D. Lawrence https://proceedings.mlr.press/v221/ Mon, 05 Feb 2024 14:04:40 +0000 Mon, 05 Feb 2024 14:04:40 +0000 Jekyll v3.9.3 Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training Deep models are known to be vulnerable to data adversarial attacks, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions through modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model’s predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure-in particular, mapper graphs-of neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We found that stronger attacks make the data samples more indistinguishable in the neuron activation space that leads to a lower accuracy. Our tool also provides a natural way to identify the vulnerable data samples that may be useful in improving model robustness. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/zhou23a.html https://proceedings.mlr.press/v221/zhou23a.html GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance indicating that GRIL can capture additional features enriching GNNs. We make the complete code for the proposed method available at [https://github.com/soham0209/mpml-graph](https://github.com/soham0209/mpml-graph). Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/xin23a.html https://proceedings.mlr.press/v221/xin23a.html Homological Neural Networks: A Sparse Architecture for Multivariate Complexity The rapid progress of Artificial Intelligence research came with the development of increasingly complex deep learning models, leading to growing challenges in terms of computational complexity, energy efficiency and interpretability. In this study, we apply advanced network-based information filtering techniques to design a novel deep neural network unit characterized by a sparse higher-order graphical architecture built over the homological structure of underlying data. We demonstrate its effectiveness in two application domains which are traditionally challenging for deep learning: tabular data and time series regression problems. Results demonstrate the advantages of this novel design which can tie or overcome the results of state-of-the-art machine learning and deep learning models using only a fraction of parameters. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/wang23a.html https://proceedings.mlr.press/v221/wang23a.html Unsupervised Embedding Quality Evaluation Unsupervised learning has recently significantly gained in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain. Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for self-supervised learning of deep neural networks. This works chooses to follow a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could be potentially used for evaluating quality of representations. We also introduce one novel method based on recent advances in understanding the high-dimensional geometric structure self-supervised learning. We conduct extensive experiments and study the properties of these metrics and ones introduced in the previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/tsitsulin23a.html https://proceedings.mlr.press/v221/tsitsulin23a.html ICML 2023 Topological Deep Learning Challenge: Design and Results This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions in its two month duration. This paper describes the design of the challenge and summarizes its main findings. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/papillon23a.html https://proceedings.mlr.press/v221/papillon23a.html Fisher-Rao and pullback Hilbert cone distances on the multivariate Gaussian manifold with applications to simplification and quantization of mixtures Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor medical imaging, structure tensor computer vision, radar signal processing, machine learning, etc. In order to process those data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities and paths joining normal distributions. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information is such a principled distance which however is not known in closed-form excepts on a few particular cases. We first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between normal distributions. Second, we introduce a distance based on a diffeomorphic embedding of the Gaussian manifold into a submanifold of the higher-dimensional symmetric positive-definite cone. We show that the projective Hilbert distance on the cone is a metric on the embedded Gaussian submanifold and pullback that distance with the straight line Hilbert cone geodesics to obtain a distance and paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only extreme eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/nielsen23b.html https://proceedings.mlr.press/v221/nielsen23b.html Non-linear Embeddings in Hilbert Simplex Geometry A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream analysis. Embedding discrete hierarchical structures in hyperbolic geometry has proven very successful since it was shown that any weighted tree can be embedded in that geometry with arbitrary low distortion. Various optimization methods for hyperbolic embeddings based on common models of hyperbolic geometry have been studied. In this paper, we consider Hilbert geometry for the standard simplex which is isometric to a vector space equipped with the variation polytope norm. We study the representation power of this Hilbert simplex geometry by embedding distance matrices of graphs. Our findings demonstrate that Hilbert simplex geometry is competitive to alternative geometries such as the Poincaré hyperbolic ball or the Euclidean geometry for embedding tasks while being fast and numerically robust. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/nielsen23a.html https://proceedings.mlr.press/v221/nielsen23a.html A margin-based multiclass generalization bound via geometric complexity There has been considerable effort to better understand the generalization capabilities of deep neural networks both as a means to unlock a theoretical understanding of their success as well as providing directions for further improvements. In this paper we investigate margin-based multiclass generalization bounds for neural networks which rely on a recent complexity measure, the geometric complexity, developed for neural networks and which measures the variability of the model function. We derive a new upper bound on the generalization error which scale with the margin-normalized geometric complexity of the network and which hold for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/munn23a.html https://proceedings.mlr.press/v221/munn23a.html Geometrically Regularized Wasserstein Dictionary Learning Wasserstein dictionary learning is an unsupervised approach to learning a collection of probability distributions that generate observed distributions as Wasserstein barycentric combinations. Existing methods solve an optimization problem that only seeks a dictionary and weights that minimize the reconstruction accuracy. However, there is no a priori reason to believe there are unique solutions in general to this problem. Moreover, the learned dictionary is, by design, optimized to represent the observed data set, and may not be useful for classification tasks or generative modeling. Just as regularization plays a key role in linear dictionary learning, we propose a geometric regularizer for Wasserstein space that promotes representations of a data distributions using nearby dictionary elements. We show that this regularizer leads to barycentric weights that concentrate on dictionary atoms local to each data distribution. When data are generated as Wasserstein barycenters of fixed distributions, this regularizer facilitates the recovery of the generating distributions in cases that are ill-posed for unregularized Wasserstein dictionary learning. Through experimentation on synthetic and real data, we show that our geometrically regularized approach yields more interpretable dictionaries in Wasserstein space which perform better in downstream applications. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/mueller23a.html https://proceedings.mlr.press/v221/mueller23a.html On genuine invariance learning without weight-tying In this paper, we investigate properties and limitations of invariance learned by neural networks from the data compared to the genuine invariance achieved through invariant weight-tying. To do so, we adopt a group theoretical perspective and analyze invariance learning in neural networks without weight-tying constraints. We demonstrate that even when a network learns to correctly classify samples on a group orbit, the underlying decision-making in such a model does not attain genuine invariance. Instead, learned invariance is strongly conditioned on the input data, rendering it unreliable if the input distribution shifts. We next demonstrate how to guide invariance learning toward genuine invariance by regularizing the invariance of a model at the training. To this end, we propose several metrics to quantify learned invariance: (i) predictive distribution invariance, (ii) logit invariance, and (iii) saliency invariance similarity. We show that the invariance learned with the invariance error regularization closely reassembles the genuine invariance of weight-tying models and reliably holds even under a severe input distribution shift. Closer analysis of the learned invariance also reveals the spectral decay phenomenon, when a network chooses to achieve the invariance to a specific transformation group by reducing the sensitivity to any input perturbation. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/moskalev23a.html https://proceedings.mlr.press/v221/moskalev23a.html Linear Regression on Manifold Structured Data: the Impact of Extrinsic Geometry on Solutions In this paper, we study linear regression applied to data structured on a manifold. We assume that the data manifold is smooth and is embedded in a Euclidean space, and our objective is to reveal the impact of the data manifold’s extrinsic geometry on the regression. Specifically, we analyze the impact of the manifold’s curvatures (or higher order nonlinearity in the parameterization when the curvatures are locally zero) on the uniqueness of the regression solution. Our findings suggest that the corresponding linear regression does not have a unique solution when the manifold is flat. Otherwise, the manifold’s curvature (or higher order nonlinearity in the embedding) may contribute significantly, particularly in the solution associated with the normal directions of the manifold. Our findings thus reveal the role of data manifold geometry in ensuring the stability of regression models for out-of-distribution inferences. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/liu23b.html https://proceedings.mlr.press/v221/liu23b.html ReLU Neural Networks, Polyhedral Decompositions, and Persistent Homology A ReLU neural network leads to a finite polyhedral decomposition of input space and a corresponding finite dual graph. We show that while this dual graph is a coarse quantization of input space, it is sufficiently robust that it can be combined with persistent homology to detect homological signals of manifolds in the input space from samples. This property holds for a wide range of networks trained for a wide range of purposes that have nothing to do with this topological application. We found this feature to be surprising and interesting; we hope it will also be useful. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/liu23a.html https://proceedings.mlr.press/v221/liu23a.html Learned Gridification for Efficient Point Cloud Processing Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. For example, in a convolution, the convolutional kernel must be recomputed for every point in a point cloud to consider the distances to all other points in its neighbourhood. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, specially for large inputs and large neighborhoods. In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. To this end, we propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a earnable de-gridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/linden23a.html https://proceedings.mlr.press/v221/linden23a.html On Explicit Curvature Regularization in Deep Generative Models We propose a family of curvature-based regularization terms for deep generative model learning. Explicit coordinate-invariant formulas for both intrinsic and extrinsic curvature measures are derived for the case of arbitrary data manifolds embedded in higher-dimensional Euclidean space. Because computing the curvature is a highly computation-intensive process involving the evaluation of second-order derivatives, efficient formulas are derived for approximately evaluating intrinsic and extrinsic curvatures. Comparative studies are conducted that compare the relative efficacy of intrinsic versus extrinsic curvature-based regularization measures, as well as performance comparisons against existing autoencoder training methods. Experiments involving noisy motion capture data confirm that curvature-based methods outperform existing autoencoder regularization methods, with intrinsic curvature measures slightly more effective than extrinsic curvature measures. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/lee23a.html https://proceedings.mlr.press/v221/lee23a.html Deep Networks as Paths on the Manifold of Neural Representations Deep neural networks implement a sequence of layer-by-layer operations that are each relatively easy to understand, but the resulting overall computation is generally difficult to understand. An intuitive hypothesis is that the role of each layer is to reformat information to reduce the "distance" to the desired outputs. With this spatial analogy, the layer-wise computation implemented by a deep neural network can be viewed as a path along a high-dimensional manifold of neural representations. With this framework, each hidden layer transforms its inputs by taking a step of a particular size and direction along the manifold, ideally moving towards the desired network outputs. We formalize this intuitive idea by leveraging recent advances in _metric_ representational similarity. We extend existing representational distance methods by defining and characterizing the _manifold_ that neural representations live on, allowing us to calculate quantities like the shortest path or tangent direction separating representations between hidden layers of a network or across different networks. We then demonstrate these tools by visualizing and comparing the paths taken by a collection of trained neural networks with a variety of architectures, finding systematic relationships between model depth and width, and properties of their paths. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/lange23a.html https://proceedings.mlr.press/v221/lange23a.html One-Shot Neural Network Pruning via Spectral Graph Sparsification Neural network pruning has gained significant attention for its potential to reduce computational resources required for training and inference. A large body of research has shown that networks can be pruned both after training and at initialisation, while maintaining competitive accuracy compared to dense networks. However, current methods rely on iteratively pruning or repairing the network to avoid over-pruning and layer collapse. Recent work has found that by treating neural networks as a sequence of bipartite graphs, pruning can be studied through the lens of spectral graph theory. Therefore, in this work, we propose a novel pruning approach using spectral sparsification, which aims to preserve meaningful properties of a dense graph with a sparse subgraph, by preserving the spectrum of the dense graph’s adjacency matrix. We empirically validate and investigate our method, and show that one-shot pruning using spectral sparsification preserves performance at higher levels of sparsity compared to its one-shot counterparts. Additionally, we theoretically analyse our method with respect to local and global connectivity. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/laenen23a.html https://proceedings.mlr.press/v221/laenen23a.html MASIL: Towards Maximum Separable Class Representation for Few Shot Class Incremental Learning Few Shot Class Incremental Learning (FSCIL) with few examples per class for each incremental session is the realistic setting of continual learning since obtaining large number of annotated samples is not feasible and cost effective. We present the framework MASIL as a step towards learning the maximal separable classifier. It addresses the common problem i.e forgetting of old classes and over-fitting to novel classes by learning the classifier weights to be maximally separable between classes forming a simplex Equiangular Tight Frame. We propose the idea of concept factorization explaining the collapsed features for base session classes in terms of concept basis and use these to induce classifier simplex for few shot classes. We further adds fine tuning to reduce any error occurred during factorization and train the classifier jointly on base and novel classes without retaining any base class samples in memory. Experimental results on miniImageNet, CIFAR-100 and CUB-200 demonstrate that MASIL outperforms all the benchmarks. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/khandelwal23a.html https://proceedings.mlr.press/v221/khandelwal23a.html Bridging Equational Properties and Patterns on Graphs: an AI-Based Approach AI-assisted solutions have recently proven successful when applied to Mathematics and have opened new possibilities for exploring unsolved problems that have eluded traditional approaches for years or even centuries. Following this direction, this paper presents an innovative approach aiming at establishing correlations between equational properties of algebraic structures that can be represented through graphs and specific sub-portions of their topological representation. The methodology incorporates the utilization of graph neural architectures to validate theorems or conjectures, complemented by Explainability (XAI) metrics that lend support to these statements. In particular, we examine the distributive and modular properties of algebraic lattices, whose characterization is well-known in universal algebra, hence using these properties as an experimental test bench. The findings of this study demonstrate the effectiveness of the proposed approach in identifying and retrieving established subpatterns that characterize the equational properties under investigation. Moreover, the approach exhibits the capability to generate novel and noteworthy candidates as theorem suggesters, thereby offering valuable prospects for further exploration by mathematicians. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/keskin23a.html https://proceedings.mlr.press/v221/keskin23a.html Episodic Memory Theory of Recurrent Neural Networks: Insights into Long-Term Information Storage and Manipulation Recurrent neural networks (RNNs) have emerged as powerful models capable of storing and manipulating external information over long periods in various domains. Yet, the mechanisms that underly this behavior remain a mystery due to the black-box nature of these models. This paper addresses this question by proposing an episodic memory theory of RNN dynamics, enabling a more comprehensive understanding of the RNN weights as memories and inter-memory interactions. This approach sheds light on the inner workings of RNNs and connects to existing research on memory representation and organization. The theory extends the current linearization approaches by providing alternative interpretations of the eigenspectrum and its connection to the long-term storage and manipulation of information. We discuss how the segregation, representation, and composition of the variable binding problem—a fundamental question in cognitive science and artificial intelligence—can be mechanistically interpreted within the theory. Using an elementary task - repeat copy, we demonstrate the validity of the theory in experimental settings. Our work represents a step towards opening the black box of RNNs, offering new insights into their functionality and bridging the gap between recurrent neural networks and memory models. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/karuvally23a.html https://proceedings.mlr.press/v221/karuvally23a.html Explaining Graph Neural Networks Using Interpretable Local Surrogates We propose an interpretable local surrogate (ILS) method for understanding the predictions of black-box graph models. Explainability methods are commonly employed to gain insights into black-box models and, given the widespread adoption of GNNs in diverse applications, understanding the underlying reasoning behind their decision-making processes becomes crucial. Our ILS method approximates the behavior of a black-box graph model by fitting a simple surrogate model in the local neighborhood of a given input example. Leveraging the interpretability of the surrogate, ILS is able to identify the most relevant nodes contributing to a specific prediction. To efficiently identify these nodes, we utilize group sparse linear models as local surrogates. Through empirical evaluations on explainability benchmarks, our method consistently outperforms state-of-the-art graph explainability methods. This demonstrates the effectiveness of our approach in providing enhanced interpretability for GNN predictions. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/heidari23a.html https://proceedings.mlr.press/v221/heidari23a.html Product Manifold Learning with Independent Coordinate Selection In many dimensionality reduction tasks, we wish to identify the constituent components that explain our observations. For manifold learning, this can be formalized as factoring a Riemannian product manifold. Recovering this factorization, however, may suffer from certain difficulties in practice, especially when data is sparse or noisy, or when one factor is distorted by the other. To address these limitations, we propose identifying non-redundant coordinates on the product manifold before applying product manifold learning to identify which coordinates correspond to different factor manifolds. We demonstrate our approach on both synthetic and real-world data. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/he23a.html https://proceedings.mlr.press/v221/he23a.html Learning To See Topological Properties In 4D Using Convolutional Neural Networks Topology describes the essential structure of a space, and in 4D, a larger variety of topologically distinct manifolds can be embedded versus 2D or 3D. The present study investigates an end-to-end visual approach, which couples data generation software and convolutional neural networks (CNNs) to estimate the topology of 4D data. A synthetic 4D training data set is generated with the use of several manifolds, and then labelled with their associated Betti numbers by using techniques from algebraic topology. Several approaches to implementing a 4D convolution layer are compared. Experiments demonstrate that already a basic CNN can be trained to provide estimates for the Betti numbers associated with the number of one-, two-, and three-dimensional holes in the data. Some of the intricacies of topological data analysis in the 4D setting are also put on view, including aspects of persistent homology. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/hannouch23a.html https://proceedings.mlr.press/v221/hannouch23a.html Learning Lie Group Symmetry Transformations with Neural Networks The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterising unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterise the transformation group and the distribution of the parameter values, even when they aren’t small or the transformation group isn’t one of the traditional ones. The results showcase the effectiveness of the approach in both these settings. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/gabel23a.html https://proceedings.mlr.press/v221/gabel23a.html Can strong structural encoding reduce the importance of Message Passing? The most prevalent class of neural networks operating on graphs are message passing neural networks (MPNNs), in which the representation of a node is updated iteratively by aggregating information in the 1-hop neighborhood. Since this paradigm for computing node embeddings may prevent the model from learning coarse topological structures, the initial features are often augmented with structural information of the graph, typically in the form of Laplacian eigenvectors or Random Walk transition probabilities. In this work, we explore the contribution of message passing when strong structural encodings are provided. We introduce a novel way of modeling the interaction between feature and structural information based on their tensor product rather than the standard concatenation. The choice of interaction is compared in common scenarios and in settings where the capacity of the message-passing layer is severely reduced and ultimately the message-passing phase is removed altogether. Our results indicate that using tensor-based encodings is always at least on par with the concatenation-based encoding and that it makes the model much more robust when the message passing layers are removed, on some tasks incurring almost no drop in performance. This suggests that the importance of message passing is limited when the model can construct strong structural encodings. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/eijkelboom23a.html https://proceedings.mlr.press/v221/eijkelboom23a.html Preface The 2nd ICML Workshop on Topology, Algebra, and Geometry in Machine Learning is an exercise in bringing together researchers working in both of the above threads to exchange ideas, present recent work, and form new collaborations. This proceedings collection captures some of the rich flow of ideas that happened in the workshop. It is also a testament to the breadth of ways that mathematics is currently being applied to modern ML, from the topology of models and datasets to equivariance of models to group actions to applications of hyperbolic geometry to learning problems. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/doster23a.html https://proceedings.mlr.press/v221/doster23a.html Topologically Attributed Graphs for Shape Discrimination In this paper we introduce a novel family of attributed graphs for the purpose of shape discrimination. Our graphs typically arise from variations on the Mapper graph construction, which is an an approximation of the Reeb graph for point cloud data. Our attributions enrich these constructions with (persistent) homology in ways that are provably stable, thereby recording extra topological information that is typically lost in these graph constructions. We provide experiments which illustrate the use of these invariants for shape representation and classification. In particular, we obtain competitive shape classification results when using our topologically attributed graphs as inputs to a simple graph neural network classifier. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/curry23a.html https://proceedings.mlr.press/v221/curry23a.html The Weisfeiler-Lehman Distance: Reinterpretation and Connection with GNNs In this paper, we present a novel interpretation of the Weisfeiler-Lehman (WL) distance introduced by \cite{chen2022weisfeilerlehman} using concepts from stochastic processes. The WL distance aims compares graphs with node features, has the same discriminative power as the classic Weisfeiler-Lehman graph isomorphism test and has deep connections to the Gromov-Wasserstein distance. Our interpretation connects the WL distance to the literature on distances for stochastic processes, which also makes the interpretation of the distance more accessible and intuitive. We further explore the connections between the WL distance and certain Message Passing Neural Networks, and discuss the implications of the WL distance for understanding the Lipschitz property and the universal approximation results for these networks. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/chen23a.html https://proceedings.mlr.press/v221/chen23a.html Equivariant Self-supervised Deep Pose Estimation for Cryo EM Reconstructing the 3D volume of a molecule from its differently oriented 2D projections is the central problem of cryo-EM, one of the main techniques for macro-molecule imaging. Because the orientations are unknown, the estimation of the images’ poses is essential to solve this inverse problem. Typical methods either rely on *synchronization*, which leverages the estimated relative poses of the images to constrain their absolute ones, or *jointly estimate* the poses and the 3D density of the molecule in an iterative fashion. Unfortunately, synchronization methods don’t account for the complete images’ generative process and, therefore, achieve lower noise robustness. In the second case, the iterative joint optimization suffers from convergence issues and a higher computational cost, due to the 3D reconstruction steps. In this work, we directly estimate individual poses with equivariant deep graph networks trained using a self-supervised loss, which enforces agreement in Fourier domain of images pairs along the *common lines* defined by their poses. In particular, the *equivariant* design turns out essential for the proper convergence. As a result, our method can leverage the synchronization constraints - encoded by the synchronization graph structure - to improve convergence as well as the images generative process - via the common lines loss -, with no need to perform intermediate reconstructions. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/cesa23a.html https://proceedings.mlr.press/v221/cesa23a.html Topological Feature Selection In this paper, we introduce a novel unsupervised, graph-based filter feature selection technique which exploits the power of topologically constrained network representations. We model dependency structures among features using a family of chordal graphs (i.e. the Triangulated Maximally Filtered Graph), and we maximise the likelihood of features’ relevance by studying their relative position inside the network. Such an approach presents three aspects that are particularly satisfactory compared to its alternatives: (i) it is highly tunable and easily adaptable to the nature of input data; (ii) it is fully explainable, maintaining, at the same time, a remarkable level of simplicity; (iii) it is computationally cheap. We test our algorithm on $16$ benchmark datasets from different application domains showing that it outperforms or matches the current state-of-the-art under heterogeneous evaluation conditions. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/briola23a.html https://proceedings.mlr.press/v221/briola23a.html Hyperbolic Sliced-Wasserstein via Geodesic and Horospherical Projections Hyperbolic space embeddings have been shown beneficial for many learning tasks where data have an underlying hierarchical structure. Consequently, many machine learning tools were extended to such spaces, but only few discrepancies to compare probability distributions defined over those spaces exist. Among the possible candidates, optimal transport distances are well defined on such Riemannian manifolds and enjoy strong theoretical properties, but suffer from high computational cost. On Euclidean spaces, sliced-Wasserstein distances, which leverage a closed-form solution of the Wasserstein distance in one dimension, are more computationally efficient, but are not readily available on hyperbolic spaces. In this work, we propose to derive novel hyperbolic sliced-Wasserstein discrepancies. These constructions use projections on the underlying geodesics either along horospheres or geodesics. We study and compare them on different tasks where hyperbolic representations are relevant, such as sampling or image classification. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/bonet23a.html https://proceedings.mlr.press/v221/bonet23a.html Breaking the Structure of Multilayer Perceptrons with Complex Topologies Recent advances in neural network (NN) architectures have demonstrated that complex topologies possess the potential to surpass the performance of conventional feedforward networks. Nonetheless, previous studies investigating the relationship between network topology and model performance have yielded inconsistent results, complicating their applicability in contexts beyond those scrutinized. In this study, we examine the utility of directed acyclic graphs (DAGs) for modeling intricate relationships among neurons within NNs. We introduce a novel algorithm for the efficient training of DAG-based networks and assess their performance relative to multilayer perceptrons (MLPs). Through experimentation on synthetic datasets featuring varying levels of difficulty and noise, we observe that complex networks founded on pertinent graphs outperform MLPs in terms of accuracy, particularly within high-difficulty scenarios. Additionally, we explore the theoretical underpinnings of these observations and explore the potential trade-offs associated with employing complex networks. Our research offers valuable insights into the capabilities and constraints of complex NN architectures, thus contributing to the ongoing pursuit of designing more potent and efficient deep learning models. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/boccato23a.html https://proceedings.mlr.press/v221/boccato23a.html An ML approach to resolution of singularities The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/berczi23a.html https://proceedings.mlr.press/v221/berczi23a.html An Exact Kernel Equivalence for Finite Classification Models We explore the equivalence between neural networks and kernel methods by deriving the first exact representation of any finite-size parametric classification model trained with gradient descent as a kernel machine. We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK and other non-exact path kernel formulations. We experimentally demonstrate that the kernel can be computed for realistic networks up to machine precision. We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks, particularly the way in which they generalize. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/bell23a.html https://proceedings.mlr.press/v221/bell23a.html A Geometric Insight into Equivariant Message Passing Neural Networks on Riemannian Manifolds This work proposes a geometric insight into equivariant message passing on Riemannian manifolds. As previously proposed, numerical features on Riemannian manifolds are represented as coordinate-independent feature fields on the manifold. To any coordinate-independent feature field on a manifold comes attached an equivariant embedding of the principal bundle to the space of numerical features. We argue that the metric this embedding induces on the numerical feature space should optimally preserve the principal bundle’s original metric. This optimality criterion leads to the minimization of a twisted form of the Polyakov action with respect to the graph of this embedding, yielding an equivariant diffusion process on the associated vector bundle. We obtain a message passing scheme on the manifold by discretizing the diffusion equation flow for a fixed time step. We propose a higher- order equivariant diffusion process equivalent to diffusion on the cartesian product of the base manifold. The discretization of the higher-order diffusion process on a graph yields a new general class of equivariant GNN, generalizing the ACE and MACE formalism to data on Riemannian manifolds. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/batatia23a.html https://proceedings.mlr.press/v221/batatia23a.html Metric Space Magnitude and Generalisation in Neural Networks Deep learning models have seen significant successes in numerous applications, but their inner workings remain elusive. The purpose of this work is to quantify the learning process of deep neural networks through the lens of a novel topological invariant called magnitude. Magnitude is an isometry invariant; its properties are an active area of research as it encodes many known invariants of a metric space. We use magnitude to study the internal representations of neural networks and propose a new method for determining their generalisation capabilities. Moreover, we theoretically connect magnitude dimension and the generalisation error, and demonstrate experimentally that the proposed framework can be a good indicator of the latter. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/andreeva23a.html https://proceedings.mlr.press/v221/andreeva23a.html Sumformer: Universal Approximation for Efficient Transformers Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length pose significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/alberti23a.html https://proceedings.mlr.press/v221/alberti23a.html Global and Relative Topological Features from Homological Invariants of Subsampled Datasets Homology-based invariants can be used to characterize the geometry of datasets and thereby gain some understanding of the processes generating those datasets. In this work we investigate how the geometry of a dataset changes when it is subsampled in various ways. In our framework the dataset serves as a reference object; we then consider different points in the ambient space and endow them with a geometry defined in relation to the reference object, for instance by subsampling the dataset proportionally to the distance between its elements and the point under consideration. We illustrate how this process can be used to extract rich geometrical information, allowing for example to classify points coming from different data distributions. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/agerberg23a.html https://proceedings.mlr.press/v221/agerberg23a.html FAM: Relative Flatness Aware Minimization Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidthuber, and was followed by more recent successful sharpness-aware optimization techniques. Their widespread adoption in practice, though, is dubious because of the lack of theoretically grounded connection between flatness and generalization, in particular in light of the reparameterization curse—certain reparameterizations of a neural network change most flatness measures but do not change generalization. Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization and solves the reparameterization curse. In this paper, we derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions. It requires computing the Hessian only of a single layer of the network, which makes it applicable to large neural networks, and with it avoids an expensive mapping of the loss surface in the vicinity of the model. In an extensive empirical evaluation we show that this relative flatness aware minimization (FAM) improves generalization in a multitude of applications and models, both in finetuning and standard training. We make the code available at github. Wed, 27 Sep 2023 00:00:00 +0000 https://proceedings.mlr.press/v221/adilova23a.html https://proceedings.mlr.press/v221/adilova23a.html