[2605.06395] Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves


Kartik Tandon, Julian Gould, Tanishq Bhatia, Francesca Dominici, Alejandro Ribeiro, Claudio Battiloro
University of Pennsylvania, Sakana AI, Northeastern University, Harvard University
Equal contribution. Corresponding authors: ktandon@sas.upenn.edu, cbattiloro@hsph.harvard.edu

Abstract

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed HilbNets. We make HilbNets and, more generally, the convolution operation implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result generalizes to the infinite-dimensional bundle setting the convergence result of Belkin & Niyogi [14] for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.

1 Introduction

Over the past few years, advances in deep learning have delivered state-of-the-art performance across many areas, driven by increasingly expressive architectures and corresponding gains in both theory and practice. A major contributor to this success, though not the only one, has been the rise of Convolutional Neural Networks (CNNs) [63]. CNNs have shown outstanding results in settings ranging from image recognition [60] to speech processing [1]. At their core, CNNs rely on filters leveraging the regular (often metric) organization of common signal types, such as spatial grids. In contrast, many modern datasets live on irregular, non-Euclidean domains, including social networks for detection and recommendation [2] or point clouds for shape segmentation [107], to name only a few. Such structured data can be represented by richer mathematical objects, among which networks and manifolds are prominent. Motivated by this, the intuition behind CNNs has been generalized to graph convolutional neural networks (GCNs) [86, 40, 59] and extended to many other settings, e.g. simplicial complexes [10, 17, 6], cell complexes [43, 16, 77], order lattices [84], and manifolds [102, 34, 87, 11, 27]. Nevertheless, existing works do not address convolutional filtering of infinite-dimensional signals on manifolds, despite such data being ubiquitous in practice, from time series and spatiotemporal fields arising in sensing, robotics, and climate science to distributional and measure-valued representations common in modern learning systems [54].

To address this gap, we adopt a bundle viewpoint. Informally, a bundle $\mathcal{E}$ over a base manifold $\mathcal{M}$ is a consistent assignment to each point $x\in\mathcal{M}$ of a space $\mathcal{E}_{x}$ , called a fiber. A section is a map that picks an element $S(x)\in\mathcal{E}_{x}$ at every point. Signals supported on manifolds can thus be seen as sections, e.g., scalar manifold signals correspond to $\mathcal{E}_{x}\simeq\mathbb{R}$ [102], and tangent bundle signals correspond to $\mathcal{E}_{x}\simeq T_{x}\mathcal{M}$ [11]. In this work, we develop a convolutional learning framework operating over Hilbert bundles, i.e., bundles whose fibers are (possibly) infinite-dimensional Hilbert spaces. To make this framework implementable, we draw the first rigorous connection between Hilbert bundles and Hilbert Cellular Sheaves, generalized graph structures whose nodes and edges carry infinite-dimensional signals along with consistency rules.

Figure 1: Overview of the HilbNets framework. A HilbNet is a convolutional neural network processing infinite-dimensional signals supported on $\mathcal{M}$ (e.g., time series or distributions over curved domains). The convolutional operator is the connection Laplacian $\Delta_{\nabla}$ . To make HilbNets implementable, we first sample $n$ points $\mathcal{X}_{n}$ from $\mathcal{M}$ to obtain a Hilbert Cellular Sheaf $\mathcal{F}_{n}$ with associated Hilbert Sheaf Laplacian $\Delta_{\mathcal{F}_{n}}$ . We then take $d$ samples of the signals to get vectors $\mathbf{s}_{n,d}\in\mathbb{R}^{nd}$ living on a network sheaf $\mathcal{F}_{n,d}$ , i.e., a generalized matrix-weighted graph, with associated network Sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}}\in\mathbb{R}^{nd\times nd}$ . Discretized HilbNets are then Sheaf Neural Networks that take as inputs $\mathbf{S}_{n,d}$ and $\Delta_{\mathcal{F}_{n,d}}$ . We prove that Discretized HilbNets converge to the underlying HilbNet as the numbers of manifold and signal samples go to infinity.

Related Works. The connection between continuous domains, such as manifolds and bundles, and discrete structures, such as graphs and cellular sheaves, first emerged in pioneering investigations on the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional Riemannian manifolds [38]. The manifold hypothesis underpins several modern spectral graph methods, e.g., nonlinear dimensionality reduction, clustering, interpretability, and learning algorithms that exploit latent geometric structures. The renowned work of Belkin and Niyogi [14] proved that, given a finite point cloud sampled from the underlying manifold, it is possible to build a weighted undirected graph whose Laplacian converges in probability to the Laplace-Beltrami operator of the underlying manifold as the number of samples goes to infinity.
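For intuition, the Belkin and Niyogi statement can be checked numerically in the simplest case. The sketch below is our own illustration, not code from [14]: it samples the unit circle uniformly ( $m=1$ , $\mathrm{vol}=2\pi$ ), uses the heat-kernel normalization $1/(t(4\pi t)^{m/2})$ of the scalar result, and takes $f(\theta)=\cos\theta$ , for which the (positive) Laplace-Beltrami operator gives $\Delta f(0)=1$ . The estimate should approach $\Delta f(0)/\mathrm{vol}=1/(2\pi)$ for small $t$ and large $n$ . The helper names are hypothetical.

```python
import numpy as np

def geodesic_dist_circle(a, b):
    """Geodesic (arc-length) distance between angles on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def graph_laplacian_at(x, samples, f, t):
    """Normalized heat-kernel graph Laplacian of f evaluated at the point x.

    Uses the normalization 1 / (t * (4*pi*t)^(m/2)) with m = 1.
    """
    w = np.exp(-geodesic_dist_circle(x, samples) ** 2 / (4 * t))
    avg = np.mean(w * (f(x) - f(samples)))
    return avg / (t * np.sqrt(4 * np.pi * t))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 2 * np.pi, size=50_000)  # iid uniform sample of S^1
estimate = graph_laplacian_at(0.0, samples, np.cos, t=0.01)
target = 1.0 / (2 * np.pi)  # Delta f(0) / vol(S^1), with Delta f = -f''
print(f"estimate={estimate:.4f}, target={target:.4f}")
```

Shrinking $t$ reduces the bias but increases the variance, so $n$ must grow accordingly, which mirrors the joint limit in [14].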

The work in [14], and related results, e.g., [92, 93], have been used, directly or indirectly, to design learning systems over manifolds and networks [103, 67, 20, 74, 11, 56]. Despite the diversity of such systems, these models all assume finite-dimensional fibers and therefore do not directly address learning with infinite-dimensional manifold signals. The main technical reason behind this gap is the lack of an extension of the convergence result in [14] to bundles with infinite-dimensional fibers.

A line of related works of interest comes from cellular sheaf theory. Cellular sheaves are combinatorial instances of sheaves introduced in [90] and later rediscovered in [29]. In [18, 50, 7, 39, 37, 78], neural networks operating on finite-dimensional cellular sheaves over graphs, referred to as network sheaves, are presented, generalizing graph neural networks by, intuitively, replacing scalar edge weights with learned or structured matrix weights. Recently, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on network sheaves built from manifold samples. For an extended treatment of related work, see Appendix A.

Contribution. In this work, we first define a convolution operation over a Hilbert bundle through its associated connection Laplacian. This convolution extends Laplacian-based convolutions on tangent bundles [11], manifolds [103], and graphs [91, 41], as well as standard time convolutions. Using the Borel functional calculus, we then define Hilbert bundle convolutional filters for infinite-dimensional manifold signals. These filters are general and expressive, and can be instantiated through suitable spectral responses. We then introduce HilbNets, deep convolutional architectures whose layers stack Hilbert bundle filters and pointwise nonlinearities. HilbNets are continuous models and are therefore not directly implementable. To address this, we provide a principled discretization of the manifold domain by sampling points and showing that the induced structure is a Hilbert cellular sheaf over an undirected graph. The corresponding sheaf Laplacian combines scalar edge weights, obtained from the sampled base manifold, with parallel transport maps associated with the bundle geometry or learned from data. We prove that this sheaf Laplacian converges in probability to the connection Laplacian as the sampling density increases, yielding the first extension of the classical convergence result of [14] to the infinite-dimensional bundle setting. We then discretize the signals themselves to obtain an implementable architecture, show that discretized HilbNets are novel instances of network sheaf neural networks, and prove that they converge to the corresponding continuous architectures as both the manifold and signal sampling densities increase. Moreover, we show that discretized HilbNets are transferable across different samplings of the same underlying bundle, providing resolution consistency guarantees for learning. 
Finally, we validate HilbNets on a synthetic transport recovery task and on real-world traffic forecasting tasks, comparing them against baselines with different inductive biases in order to isolate the benefits of the bundle formulation. The potential impact of this work extends well beyond the definition of the HilbNet architecture. See Appendix B for a detailed discussion of broader impact and future directions, and Fig. 1 for an overview.

2 Preliminaries

Signals on Manifolds. Given a manifold $\mathcal{M}$ , a vector-valued signal is a square-integrable function $S\in L^{2}(\mathcal{M},\mathbb{R}^{n})$ . Certain vector-valued signals on $\mathcal{M}$ may possess the richer structure of a vector field, i.e., they are sections of the tangent bundle $T\mathcal{M}$ of $\mathcal{M}$ and thus elements of $L^{2}(\mathcal{M},T\mathcal{M})$ . More generally, we may consider signals that are $L^{2}$ -sections of an arbitrary bundle $\mathcal{E}$ . A bundle is called trivial when, for a generic fiber $\mathcal{V}$ , it is isomorphic to the product $\mathcal{E}=\mathcal{M}\times\mathcal{V}$ . In this setting, $L^{2}(\mathcal{M},\mathbb{R}^{n})$ may be understood as the space of sections of the trivial bundle $\mathcal{E}=\mathcal{M}\times\mathbb{R}^{n}$ .

Consider now the case where the signal is ‘infinite-dimensional’, for instance, representing a time series recorded at each point $x\in\mathcal{M}$ . While this is usually considered as a function $S:\mathcal{M}\times\mathbb{R}\to\mathbb{R}^{n}$ , it may instead be more richly understood as a section of a Hilbert bundle, i.e., a bundle whose fibers are Hilbert spaces. As we will see, Hilbert bundles provide a principled and versatile approach to incorporating structural properties of infinite-dimensional data.

Example 1.

In physics, Hilbert bundles often arise naturally when considering global geometric properties of quantum mechanical systems [4].

Example 2.

In information geometry, the key objects of study are manifolds $\mathcal{M}$ given by the underlying parameters of some family of data distributions. This manifold is then equipped with a Riemannian structure by either the Otto-Wasserstein or Fisher-Rao metric, the latter of which locally recovers KL divergence. The proper analogue of the tangent bundle in this setting is a Hilbert bundle [72].

Convolution, Heat Equation, and Connection Laplacian. Geometric signal processing and deep learning [66, 21] traditionally aim to develop convolutional filters and neural networks that respect the underlying geometry of the signals of interest. The relevant convolutional operator can usually be realized as a connection Laplacian $\Delta_{\nabla}:L^{2}(\mathcal{M},\mathcal{E})\to L^{2}(\mathcal{M},\mathcal{E})$ built from a connection $\nabla$ . For instance, for the tangent bundle over the circle $T\mathbb{S}^{1}$ , the eigenfunctions of $\Delta_{\nabla}$ , with $\nabla$ the Levi-Civita connection, recover the usual Fourier basis. Similarly, the eigenfunctions of $\Delta_{\nabla}$ for the tangent bundle of the sphere $T\mathbb{S}^{2}$ recover vector spherical harmonics. Thus, the eigenbasis of the connection Laplacian plays the role of a generalized Fourier basis, and convolutions act in this spectral domain by reweighting the corresponding coefficients. In the spatial domain, convolution can be seen as performing a geometry-aware ‘local averaging’ of a signal over fibers. Formally, the connection Laplacian is the generator of the heat equation in $L^{2}(\mathcal{M},\mathcal{E})$ ,

\frac{\partial\mathbf{U}(x,t)}{\partial t}=-\Delta_{\nabla}\mathbf{U}(x,t),

(1)

where $\mathbf{U}(x,t)$ is the distribution of heat at $\mathcal{E}_{x}$ for $x\in\mathcal{M}$ at time $t\in\mathbb{R}_{+}$ . A subtlety of note is that for non-Euclidean spaces, there is typically no canonical identification between fibers $\mathcal{E}_{x}$ and $\mathcal{E}_{y}$ . Intuitively, the connection $\nabla$ precisely encodes a globally coherent notion of transport between fibers. That is, it induces parallel transport maps $P_{\gamma}:\mathcal{E}_{\gamma(0)}\to\mathcal{E}_{\gamma(1)}$ along each path $\gamma:[0,1]\to\mathcal{M}$ , allowing us to compare elements across fibers along this path. More formally, the connection is used to define a first-order ODE whose solution is given by parallel transport (see Appendix G.1.3). The connection Laplacian is then the self-adjoint operator $\Delta_{\nabla}:=\nabla^{*}\nabla$ , which can now be understood as a ‘local weighted average’ over fibers, with ‘weights’ given by our choice of parallel transport. A more rigorous introduction to the relevant mathematical background is provided in Appendix G.
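To illustrate how the connection enters $\Delta_{\nabla}=\nabla^{*}\nabla$ , here is a minimal discrete sketch under an assumed toy model: an $n$ -cycle graph standing in for the circle, fibers $\mathbb{R}^{2}$ , and a rotation by a fixed angle $\phi$ as the transport along each edge. The discrete covariant derivative $\delta$ compares each value with the transport of its neighbor, $\Delta=\delta^{\top}\delta$ is symmetric positive semidefinite, and its kernel consists of parallel sections. Whether such sections exist depends on the holonomy $R(n\phi)$ , the total rotation accumulated around the loop.

```python
import numpy as np

def rotation(phi):
    """2x2 rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def connection_laplacian_cycle(n, phi):
    """Delta = delta^T delta on an n-cycle with R^2 fibers and rotation transport.

    For the edge i -> i+1 (mod n), (delta S)_e = S_{i+1} - R(phi) S_i compares
    S_{i+1} with the parallel transport of S_i.
    """
    delta = np.zeros((2 * n, 2 * n))
    for i in range(n):
        j = (i + 1) % n
        rows = slice(2 * i, 2 * i + 2)
        delta[rows, 2 * j:2 * j + 2] += np.eye(2)
        delta[rows, 2 * i:2 * i + 2] -= rotation(phi)
    return delta.T @ delta

def kernel_dim(lap, tol=1e-8):
    """Number of (numerically) zero eigenvalues, i.e. parallel sections."""
    return int(np.sum(np.linalg.eigvalsh(lap) < tol))

n = 12
lap_trivial = connection_laplacian_cycle(n, 2 * np.pi / n)  # holonomy R(2*pi) = I
lap_twisted = connection_laplacian_cycle(n, np.pi / n)      # holonomy R(pi) = -I
print(kernel_dim(lap_trivial), kernel_dim(lap_twisted))     # 2 0
```

With trivial holonomy every vector in a single fiber extends to a global parallel section, giving a two-dimensional kernel; with holonomy $-I$ no nonzero section survives a full loop.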

Example 3.

As seen in Fig. 2, the choice of connection can be used to emphasize aspects of the geometry that may be relevant for a particular task. For instance, most PDE-based approaches to color-image regularization can be realized as heat equations for a suitable choice of connection [8].

3 Hilbert Bundle Filters and Neural Networks

In this section, we develop a convolutional learning framework for infinite-dimensional data, such as time series or probability distributions, indexed by a manifold $\mathcal{M}$ . The core objects are Hilbert bundles.

Hilbert Bundles. Given a closed Riemannian manifold $\mathcal{M}$ , a Hilbert bundle $\mathcal{E}$ over $\mathcal{M}$ is a bundle whose (potentially infinite-dimensional) fibers are separable Hilbert spaces over $\mathbb{R}$ . The assumption of real, rather than complex, Hilbert spaces is not essential to our analysis and is made only for ease of exposition. As mentioned in Section 2, a Hilbert bundle signal is then an $L^{2}$ -section $S:\mathcal{M}\to\mathcal{E}$ . Integration of sections in this setting should be understood in the Bochner sense, a generalized notion of integration for functions whose values lie in a Hilbert space rather than in $\mathbb{R}^{n}$ ; in finite dimensions, it reduces to the standard Lebesgue integral. Given fibers $\mathcal{H}_{x}$ and $\mathcal{H}_{y}$ of the Hilbert bundle $\mathcal{E}$ , we consider unitary parallel transport operators $P_{x\to y}:\mathcal{H}_{x}\to\mathcal{H}_{y}$ . As before, a globally compatible collection of such transport operators determines a connection $\nabla$ , with the subtlety that derivatives of sections must now be understood in the Fréchet sense. Intuitively, Fréchet differentiability is the infinite-dimensional analogue of ordinary differentiability: it asks that a map admit a best linear approximation under small perturbations, where the linear approximation now acts between Hilbert spaces. We therefore refer to this construction as a Fréchet connection, which recovers the usual notion of connection and covariant derivative when restricted to finite-dimensional bundles. As before, we obtain a self-adjoint operator $\Delta_{\nabla}:=\nabla^{*}\nabla$ on $L^{2}(\mathcal{M},\mathcal{E})$ . Unlike the finite-dimensional case, however, this operator need not have compact resolvent and thus need not possess a discrete spectrum. As such, care must be taken when adapting classical arguments that involve spectral properties of the Laplacian to the Hilbert-bundle setting. 
Formal definitions of Hilbert bundles and Fréchet connections are provided in Appendix G. For a triple $(\mathcal{M},\mathcal{E},\nabla)$ , where $\nabla$ is a choice of Fréchet connection on the Hilbert bundle $\mathcal{E}$ over the manifold $\mathcal{M}$ , we now wish to construct a general notion of a ‘filtering’ operation using the connection Laplacian $\Delta_{\nabla}$ . In finite-dimensional or compact settings, filters are often defined by applying a function directly to the eigenvalues of the Laplacian. In our setting, the appropriate analogue of eigenvalue-by-eigenvalue filtering is furnished by the Borel functional calculus, which allows us to apply a filter to a self-adjoint operator by instead integrating over its spectral measure. See Appendix G.4 for details.

Definition 1 (Hilbert bundle convolutional filter).

A convolutional filter is specified by a bounded compactly supported Borel function $g\in L_{c}^{\infty}(\mathbb{R})$ . The filtering of a signal $S\in L^{2}(\mathcal{M},\mathcal{E})$ is then its convolution $\star_{\Delta_{\nabla}}$ with $g$ defined as $g\star_{\Delta_{\nabla}}S:=g(\Delta_{\nabla})S,\textrm{ where }g(\Delta_{\nabla}):L^{2}(\mathcal{M},\mathcal{E})\to L^{2}(\mathcal{M},\mathcal{E})$ is the bounded linear operator obtained by applying $g$ to $\Delta_{\nabla}$ through the Borel functional calculus.

In this sense, $g$ is the learnable frequency response, as in spectral graph neural filters [41], except we now use the spectral measure of the connection Laplacian acting on Hilbert bundle-valued signals.
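In finite dimensions, the Borel functional calculus reduces to applying $g$ to the eigenvalues of a symmetric matrix, $g(\Delta)=U\,g(\Lambda)\,U^{\top}$ . The sketch below uses a cycle-graph Laplacian as a stand-in for $\Delta_{\nabla}$ (an illustrative assumption). It shows that only the values of $g$ on the spectrum matter: a compactly supported $g$ that equals $\lambda^{2}$ on an interval containing the spectrum reproduces $\Delta^{2}$ , and an indicator low-pass $g$ yields an orthogonal projector.

```python
import numpy as np

def apply_spectral_filter(lap, g):
    """Return g(lap) = U diag(g(lambda)) U^T for a symmetric matrix lap."""
    lam, U = np.linalg.eigh(lap)
    return U @ np.diag(g(lam)) @ U.T

def g_square(lam):
    """Bounded, compactly supported response equal to lambda^2 on [0, 4]."""
    inside = (lam >= -1e-12) & (lam <= 4 + 1e-12)
    return np.where(inside, lam**2, 0.0)

def g_lowpass(lam):
    """Indicator low-pass response keeping frequencies in [0, 1]."""
    return ((lam >= -1e-12) & (lam <= 1.0)).astype(float)

n = 10
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)  # cycle adjacency
lap = 2 * np.eye(n) - A  # spectrum lies in [0, 4]

F_sq = apply_spectral_filter(lap, g_square)
P_low = apply_spectral_filter(lap, g_lowpass)
print(np.allclose(F_sq, lap @ lap), np.allclose(P_low @ P_low, P_low))  # True True
```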

Definition 2 (Hilbert bundle convolutional neural network).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle. A Hilbert bundle convolutional neural network, or HilbNet, is specified by a filter bank $\mathcal{W}=\{g^{\ell}_{u,q}\}_{\ell,u,q}$ with $g^{\ell}_{u,q}\in L_{c}^{\infty}(\mathbb{R})$ , and a Lipschitz continuous nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$ . Given input signals $S_{1},...,S_{F_{0}}\in L^{2}(\mathcal{M},\mathcal{E})$ , the $L$ -layer network output is obtained by the recursion

S^{\ell+1}_{u}=\sigma\left(\sum_{q=1}^{F_{\ell}}g^{\ell}_{u,q}(\Delta_{\nabla})S^{\ell}_{q}\right),\qquad\ell=0,\dots,L-1,\qquad S^{0}_{q}=S_{q},

(2)

where $\sigma$ is applied pointwise in each fiber.

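We concisely denote a HilbNet with $\Omega(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma)$ . Similarly to the finite-dimensional case, a nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$ with $\sigma(0)=0$ extends to an operator on $L^{2}$ sections by picking an orthonormal basis of each fiber and applying $\sigma$ to each coordinate with respect to the chosen basis. Since $\sigma$ is Lipschitz with some constant $C$ and $\sigma(0)=0$ , we have $|\sigma(a)|\leq C|a|$ , so this coordinatewise extension satisfies $\|\sigma(v)\|\leq C\|v\|$ in every fiber. Combined with the boundedness of each $g^{\ell}_{u,q}(\Delta_{\nabla})$ , it follows by induction that, for each $\ell\in\{0,\ldots,L\}$ , the layer signal $S^{\ell}$ remains an $L^{2}$ section.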

4 Discretized HilbNets via Cellular Sheaves

HilbNets are continuous architectures that cannot be implemented directly in practice. Moreover, we typically do not have access to the true bundle and connection structure, but only to a point cloud or graph sampled from the underlying manifold $\mathcal{M}$ , together with samples of the signal at each point or node. In this section, we first analyze the Hilbert cellular sheaf induced by spatial, i.e., manifold-level, sampling. We then further discretize the fibers, i.e., the signal domain itself, obtaining a finite-rank network sheaf. This two-stage sampling is the basis of our consistency theory presented in Section 5. It allows us to prove that the fully discrete (thus, implementable) Laplacian and HilbNet converge to their infinite-dimensional counterparts, and hence that learning is consistent across scales.

Manifold Sampling. A more general viewpoint on the theory of bundles is given by the language of sheaves, mathematical structures originally introduced by Jean Leray while he was a prisoner of war [65]. The functoriality of sheaves makes them particularly well suited to the type of principled discretization of geometric structures that we are interested in. In particular, we consider a Hilbert-space-valued version of cellular sheaves on graphs, as introduced in [44]. Intuitively, they can be understood as generalized graph structures with signals valued in Hilbert spaces along with edge-wise coupling rules. For a more thorough introduction to cellular sheaves, see Appendix G.5.

In this work, our primary interest is in Hilbert sheaves that represent discretizations of the structure of a Hilbert bundle over a manifold. In particular, we desire a spatial discretization from which we can recover an appropriate discrete analogue of the Hilbert bundle’s connection Laplacian. Formally, given an iid random sample $\mathcal{X}_{n}\subset\mathcal{M}$ from the uniform distribution (see Def. 23), we have the following.

Definition 3 (Hilbert Cellular Sheaf from a Hilbert Bundle).

For a given Hilbert bundle $(\mathcal{M},\mathcal{E},\nabla)$ with sampled points $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}\subset\mathcal{M}$ , fix a geodesic $\gamma_{ij}$ between $x_{i}$ and $x_{j}$ , for all $i<j$ . Further, let $m_{\gamma_{ij}}$ denote the midpoint of this geodesic. Consider the graph $G_{n}=(\mathcal{X}_{n},E)$ with an undirected edge $e_{ij}$ between $x_{i}$ and $x_{j}$ , for each $i<j$ . The associated Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$ on $G_{n}$ with bandwidth parameter $t$ is given by the following assignments:

• The Hilbert space $\mathcal{F}_{n}^{t}(x_{i}):=\mathcal{E}_{x_{i}}$ for each $x_{i}\in\mathcal{X}_{n}$ , referred to as the node stalk over $x_{i}\in\mathcal{X}_{n}$ .
• The Hilbert space $\mathcal{F}_{n}^{t}(e_{ij}):=\mathcal{E}_{m_{\gamma_{ij}}}$ for each $e_{ij}\in E$ , referred to as the edge stalk over $e_{ij}\in E$ .
• For each edge $e_{ij}\in E$ with bounding vertices $x_{i},x_{j}$ , a pair of bounded linear restriction maps

\begin{aligned}(\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{i})\to\mathcal{F}_{n}^{t}(e_{ij}),\\ (\mathcal{F}_{n}^{t})_{x_{j}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{j})\to\mathcal{F}_{n}^{t}(e_{ij}),\end{aligned}

(3)

where $k_{ij}^{t}=e^{-d_{\mathcal{M}}(x_{i},x_{j})^{2}/4t}$ , with $d_{\mathcal{M}}$ the geodesic distance on $\mathcal{M}$ , and $P_{x_{i}\to m_{\gamma_{ij}}}$ denotes the unitary parallel transport map on $\mathcal{E}$ between $x_{i}$ and $m_{\gamma_{ij}}$ .
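As a concrete instance of Eq. (3), the sketch below builds the pair of restriction maps for one edge of an assumed toy bundle: fibers $\mathbb{R}^{2}$ over the unit circle, with a connection under which moving by signed arc length $s$ rotates fibers by $-\alpha s$ . It checks two facts that underlie Eq. (8): $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}(\mathcal{F}_{n}^{t})_{x_{i}\leq e}=k_{ij}^{t}I$ because transport is unitary, and $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}(\mathcal{F}_{n}^{t})_{x_{j}\leq e}$ equals $k_{ij}^{t}$ times the transport from $x_{j}$ to $x_{i}$ . The function names and the parameter $\alpha$ are hypothetical.

```python
import numpy as np

def rotation(phi):
    """2x2 rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def signed_displacement(a, b):
    """Signed geodesic displacement from angle a to angle b on S^1, in [-pi, pi)."""
    return (b - a + np.pi) % (2 * np.pi) - np.pi

def restriction_maps(theta_i, theta_j, t, alpha):
    """Restriction maps of Eq. (3) for a toy R^2 bundle over S^1.

    Assumed transport: moving by a signed arc length s rotates fibers by -alpha*s.
    Both maps land in the fiber over the geodesic midpoint m_ij.
    """
    s = signed_displacement(theta_i, theta_j)
    k = np.exp(-s**2 / (4 * t))                  # k_ij^t with d_M = |s|
    F_i = np.sqrt(k) * rotation(-alpha * s / 2)  # sqrt(k) P_{x_i -> m_ij}
    F_j = np.sqrt(k) * rotation(+alpha * s / 2)  # sqrt(k) P_{x_j -> m_ij}
    return F_i, F_j, k

F_i, F_j, k = restriction_maps(0.3, 1.1, t=0.1, alpha=0.7)
print(np.allclose(F_i.T @ F_i, k * np.eye(2)))                    # unitary transport
print(np.allclose(F_i.T @ F_j, k * rotation(0.7 * (1.1 - 0.3))))  # k * P_{x_j -> x_i}
```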

For the sake of exposition, the choice of sample and corresponding geodesic paths will often be suppressed, so our parallel transports will be denoted as $P_{x_{i}\to m_{ij}}$ . Also, note that for $n<m$ and $\mathcal{X}_{n}\subset\mathcal{X}_{m}$ , we assume each additional point is again sampled iid from the uniform distribution on $\mathcal{M}$ . For the categorically-minded reader, we remark that our sheaf is constructed such that refining our sample then leads to a subfunctor $\mathcal{F}^{t}_{n}\subset\mathcal{F}^{t}_{m}$ . For the Hilbert sheaf $\mathcal{F}_{n}^{t}$ on the graph $G_{n}=(\mathcal{X}_{n},E)$ , a signal is an element of the Hilbert space

C^{0}(\mathcal{F}_{n}^{t};G_{n}):=\bigoplus_{x_{i}\in\mathcal{X}_{n}}\mathcal{F}_{n}^{t}(x_{i}).

(4)

Example 4.

If $\mathcal{F}_{n}^{t}$ encodes univariate spatiotemporal data, each node stalk can be chosen as $\mathcal{F}_{n}^{t}(x_{i})=L^{2}(\mathbb{R})$ . Then $C^{0}(\mathcal{F}_{n}^{t};G_{n})=\bigoplus_{x_{i}\in\mathcal{X}_{n}}L^{2}(\mathbb{R}),$ so a signal assigns a full time series to every node, recovering the usual notion of a node-time graph signal.

Example 5.

In one dimension, a probability distribution $\mu\in\mathcal{P}_{2}(\mathbb{R})$ can be represented by its quantile function $Q_{\mu}\in L^{2}([0,1])$ , and the 2-Wasserstein distance becomes the $L^{2}$ distance between quantile functions [99]. Thus, by choosing node stalks $\mathcal{F}_{n}^{t}(x_{i})=L^{2}([0,1]),$ a signal assigns a full probability distribution to each graph node, recovering the distributional graph-signal setting of [57, 112].
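Example 5 can be checked numerically. In this sketch, the two distributions are assumed Gaussian because their 2-Wasserstein distance has the closed form $W_{2}^{2}=(\mu_{1}-\mu_{2})^{2}+(\sigma_{1}-\sigma_{2})^{2}$ . The $L^{2}([0,1])$ distance between their quantile functions, approximated with a midpoint rule, should match it. Only the standard library's `statistics.NormalDist` and NumPy are used.

```python
import numpy as np
from statistics import NormalDist

def quantile_function(dist, grid):
    """Evaluate the quantile function Q_mu on a grid of levels in (0, 1)."""
    return np.array([dist.inv_cdf(u) for u in grid])

N = 10_000
grid = (np.arange(N) + 0.5) / N  # midpoint rule on [0, 1]
mu1, s1, mu2, s2 = 0.0, 1.0, 1.5, 0.5
q1 = quantile_function(NormalDist(mu1, s1), grid)
q2 = quantile_function(NormalDist(mu2, s2), grid)

w2_quantile = np.sqrt(np.mean((q1 - q2) ** 2))          # L^2([0,1]) distance
w2_closed = np.sqrt((mu1 - mu2) ** 2 + (s1 - s2) ** 2)  # Gaussian closed form
print(w2_quantile, w2_closed)
```

The small remaining gap comes from the midpoint rule missing the extreme tails of the quantile functions.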

Finally, analogous to the construction of the connection Laplacian $\Delta_{\nabla}$ , we may construct the Hilbert sheaf Laplacian. Further details, such as self-adjointness, are discussed in Appendix G.5.

Definition 4 (Hilbert Sheaf Laplacian).

Let $\mathcal{F}_{n}^{t}$ be the Hilbert sheaf on the graph $G_{n}=(\mathcal{X}_{n},E)$ induced by Def. 3. Fix an orientation for each edge $e\in E$ . The Hilbert sheaf Laplacian is the bounded linear operator

\Delta_{\mathcal{F}_{n}^{t}}:C^{0}(\mathcal{F}_{n}^{t};G_{n})\to C^{0}(\mathcal{F}_{n}^{t};G_{n})

(5)

defined, for a signal $S\in C^{0}(\mathcal{F}_{n}^{t};G_{n})$ and at a node $x_{i}\in\mathcal{X}_{n}$ , by

(\Delta_{\mathcal{F}_{n}^{t}}S)_{x_{i}}=\sum_{\begin{subarray}{c}e\in E:\\ e=\{x_{i},x_{j}\}\end{subarray}}(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}\left((\mathcal{F}_{n}^{t})_{x_{i}\leq e}S_{x_{i}}-(\mathcal{F}_{n}^{t})_{x_{j}\leq e}S_{x_{j}}\right),

(6)

where $x_{j}$ denotes the other endpoint of $e$ , and $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}$ is the adjoint of the restriction map $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}$ .
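Eq. (6) translates directly into a loop over edges. The following sketch uses finite-dimensional stand-ins for the stalks and random scaled orthogonal matrices as restriction maps (assumptions for illustration). It verifies numerically that the operator is self-adjoint and that $\langle S,\Delta_{\mathcal{F}_{n}^{t}}S\rangle=\sum_{e}\|(\mathcal{F}_{n}^{t})_{x_{i}\leq e}S_{x_{i}}-(\mathcal{F}_{n}^{t})_{x_{j}\leq e}S_{x_{j}}\|^{2}$ , i.e., the quadratic form sums the edgewise disagreements.

```python
import numpy as np

def sheaf_laplacian_apply(S, edges, restr):
    """Apply Eq. (6) node by node.

    S: dict node -> vector in its stalk; edges: list of (i, j);
    restr[(i, j)] = (F_i, F_j), the restriction maps into the edge stalk.
    """
    out = {v: np.zeros_like(x) for v, x in S.items()}
    for (i, j) in edges:
        F_i, F_j = restr[(i, j)]
        diff = F_i @ S[i] - F_j @ S[j]  # disagreement inside the edge stalk
        out[i] += F_i.T @ diff          # adjoint of a real matrix is its transpose
        out[j] -= F_j.T @ diff          # same edge, seen from the other endpoint
    return out

def random_orthogonal(rng, d):
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def inner(A, B):
    """Inner product on the direct sum of node stalks, Eq. (4)."""
    return sum(A[v] @ B[v] for v in A)

rng = np.random.default_rng(1)
d, n = 3, 4
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
restr = {}
for e in edges:
    k = rng.uniform(0.2, 1.0)  # stand-in for the kernel weight k_ij^t
    restr[e] = (np.sqrt(k) * random_orthogonal(rng, d),
                np.sqrt(k) * random_orthogonal(rng, d))

S = {v: rng.standard_normal(d) for v in range(n)}
T = {v: rng.standard_normal(d) for v in range(n)}
LS = sheaf_laplacian_apply(S, edges, restr)
LT = sheaf_laplacian_apply(T, edges, restr)
energy = sum(np.sum((restr[(i, j)][0] @ S[i] - restr[(i, j)][1] @ S[j]) ** 2)
             for (i, j) in edges)
print(np.isclose(inner(LS, T), inner(S, LT)), np.isclose(inner(S, LS), energy))
```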

Intuitively, $\Delta_{\mathcal{F}_{n}^{t}}$ measures how much a signal fails to be locally consistent across edges: before comparing $S_{x_{i}}$ and $S_{x_{j}}$ , both values are mapped into the common edge stalk $\mathcal{F}_{n}^{t}(e)$ by the restriction maps. Thus, it is a broad generalization of a graph Laplacian, with restriction maps replacing scalar edge weights. The Hilbert sheaf Laplacian $\Delta_{\mathcal{F}_{n}^{t}}$ is a self-adjoint bounded linear operator. Once the manifold is sampled and the induced sheaf Laplacian is computed, space-discretized HilbNets, which are still not implementable due to the infinite-dimensional signals, are simply given by Def. 2 with the connection Laplacian of $\mathcal{E}$ replaced by the sheaf Laplacian of $\mathcal{F}_{n}^{t}$ , i.e., by $\Omega(\mathcal{F}_{n}^{t},\Delta_{\mathcal{F}_{n}^{t}},\mathcal{W},\sigma)$ .

Signal Sampling. Hilbert cellular sheaves are the structures that arise when we sample our base manifold but faithfully record the potentially infinite-dimensional signal in each fiber. In practice, we typically only have access to a sampled or compressed version of the signal as well. For instance, when considering a time series $S\in L^{2}([0,2\pi])$ over a fixed time window, we may use the orthogonal Fourier basis $\{e^{ik\theta}\}_{k\in\mathbb{Z}}$ , whose closed span is $L^{2}([0,2\pi])$ , and then record a compressed representation with respect to this basis, i.e., $[\langle S,e^{-id\theta}\rangle,\dots,\langle S,e^{id\theta}\rangle]$ for some $d$ . More generally, fiber-wise orthogonal projections with respect to any chosen basis provide a principled approach to discretizing Hilbert bundle signals.
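The Fourier compression above can be sketched as follows, assuming a smooth $2\pi$ -periodic signal sampled on a uniform grid. Coefficients use the normalized convention $c_{k}=\frac{1}{2\pi}\int_{0}^{2\pi}S(\theta)e^{-ik\theta}\,d\theta$ for $|k|\leq d$ , approximated by a Riemann sum, and the reconstruction error of the truncated expansion shrinks as $d$ grows. The test signal $e^{\cos\theta}$ is an illustrative choice.

```python
import numpy as np

def fourier_coefficients(samples, d):
    """Approximate c_k = (1/2pi) * integral of S(theta) e^{-ik theta}, for |k| <= d.

    samples: values of S on a uniform grid of [0, 2*pi) (Riemann-sum quadrature).
    """
    N = samples.size
    theta = 2 * np.pi * np.arange(N) / N
    ks = np.arange(-d, d + 1)
    coeffs = (samples[None, :] * np.exp(-1j * ks[:, None] * theta[None, :])).mean(axis=1)
    return ks, coeffs

def reconstruct(ks, coeffs, N):
    """Evaluate the truncated expansion sum_k c_k e^{ik theta} on the same grid."""
    theta = 2 * np.pi * np.arange(N) / N
    return (coeffs[:, None] * np.exp(1j * ks[:, None] * theta[None, :])).sum(axis=0).real

N = 512
theta = 2 * np.pi * np.arange(N) / N
S = np.exp(np.cos(theta))  # a smooth periodic "time series"
errors = []
for d in (1, 2, 4, 8):
    ks, c = fourier_coefficients(S, d)
    errors.append(np.sqrt(np.mean((S - reconstruct(ks, c, N)) ** 2)))
print(errors)
```

By Parseval, each error equals the norm of the discarded coefficients, which is why it can only decrease as $d$ increases.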

Proposition 1.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle, with strictly infinite-dimensional generic Hilbert-space fiber $\mathcal{H}$ . Fix an orthogonal basis $\mathcal{B}=\{e_{1},e_{2},...\}$ of $\mathcal{H}$ and let $\mathcal{H}_{d}:=\mathrm{span}(e_{1},e_{2},...,e_{d})$ . Then there exists a smooth map of bundles

\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}

(7)

where $\mathcal{E}_{d}$ is a $d$ -dimensional vector bundle with generic fiber $\mathcal{H}_{d}$ and at each $x\in\mathcal{M}$ , $\left.\Pi_{d}\right|_{\mathcal{E}_{x}}:\mathcal{E}_{x}\to\mathcal{E}_{d,x}$ recovers the usual orthogonal projection map. See Appendix H.4 for details.
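Fiberwise, $\Pi_{d}$ is an ordinary orthogonal projection, and stacking its coefficients over sampled points gives the vectors $\mathbf{s}_{n,d}\in\mathbb{R}^{nd}$ used in the discretized architecture. The sketch truncates the fiber to a $D$ -dimensional stand-in with a random orthonormal basis (assumptions for illustration), so it only mimics the infinite-dimensional setting.

```python
import numpy as np

def project_and_stack(section_values, basis, d):
    """Fiberwise orthogonal projection onto span(e_1..e_d), stacked into R^{n d}.

    section_values: (n, D) array, S(x_i) in coordinates of a D-dim stand-in fiber.
    basis: (D, D) array whose orthonormal columns play the role of e_1, e_2, ...
    """
    coeffs = section_values @ basis[:, :d]  # (n, d): <S(x_i), e_k> for k <= d
    return coeffs.reshape(-1)               # s_{n,d}, node-major ordering

rng = np.random.default_rng(2)
n, D, d = 5, 64, 8
basis, _ = np.linalg.qr(rng.standard_normal((D, D)))
S = rng.standard_normal((n, D))
s_nd = project_and_stack(S, basis, d)
P = basis[:, :d] @ basis[:, :d].T  # the projector Pi_d on one fiber
print(s_nd.shape, np.allclose(P @ P, P), np.linalg.norm(s_nd) <= np.linalg.norm(S))
```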

Applying Proposition 1 to $(\mathcal{M},\mathcal{E},\nabla)$ , we obtain $(\mathcal{M},\mathcal{E}_{d},\nabla_{d})$ , to which we may apply the spatial discretization of Def. 3 to construct the cellular sheaf $\mathcal{F}_{n,d}^{t}$ with $d$ -dimensional stalks. We refer to $\mathcal{F}_{n,d}^{t}$ as a network sheaf. The signals on this sheaf are then sampled Hilbert bundle signals, i.e., we can discretize $S\in L^{2}(\mathcal{M},\mathcal{E})$ as an $nd$ -dimensional vector $\mathbf{s}_{n,d}:=\big(\Pi_{d}S(x_{i})\big)_{i=1}^{n}\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n})\cong\mathbb{R}^{nd}$ , stacking the $d$ -dimensional orthogonal projections over the $n$ sampled locations, with respect to the chosen basis $\mathcal{B}$ . In this case, the restriction maps can be written as matrices, so the sheaf Laplacian becomes a block matrix $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ whose $(i,j)$ -block maps the discretized stalk over $x_{j}$ to the discretized stalk over $x_{i}$ and is given by

(\Delta_{\mathcal{F}_{n,d}^{t}})_{ij}=\begin{cases}\displaystyle\sum_{r:\,e_{ir}\in E}k_{ir}^{t}I_{d},&i=j,\\[11.99998pt] \displaystyle-P_{x_{j}\to x_{i}}^{(d,e_{ij})},&i\neq j\text{ and }e_{ij}\in E,\\[5.0pt] 0,&\text{otherwise,}\end{cases}

(8)

where $P_{x_{j}\to x_{i}}^{(d,e_{ij})}:=k_{ij}^{t}\left(P_{x_{i}\to m_{ij}}^{(d)}\right)^{*}P_{x_{j}\to m_{ij}}^{(d)}$ , and $P_{x_{j}\to m_{ij}}^{(d)}$ denotes the restriction of the parallel transport map from $\mathcal{E}_{x_{j}}$ to $\mathcal{E}_{m_{ij}}$ to the corresponding $d$ -dimensional subbundles in the image of $\Pi_{d}$ . We may thus use this Laplacian to build an implementable sheaf convolutional architecture as follows.
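The block structure of Eq. (8) can be assembled directly. This sketch assumes a toy bundle with fibers $\mathbb{R}^{2}$ ( $d=2$ ) over the unit circle, where moving by signed arc length $s$ rotates fibers by $-\alpha s$ , so that $P_{x_{j}\to x_{i}}^{(d,e_{ij})}=k_{ij}^{t}R(\alpha s_{ij})$ on the complete graph. It checks that the result is symmetric and positive semidefinite, as expected for a sheaf Laplacian. The function name and $\alpha$ are hypothetical.

```python
import numpy as np

def rotation(phi):
    """2x2 rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def block_sheaf_laplacian(thetas, t, alpha):
    """Assemble Eq. (8) for a toy R^2 bundle over S^1 on the complete graph.

    Assumed transport: a signed arc length s rotates fibers by -alpha*s, so the
    off-diagonal block (i, j) is -k_ij R(alpha * s_ij), with s_ij the signed
    displacement from x_i to x_j.
    """
    n = len(thetas)
    L = np.zeros((2 * n, 2 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = (thetas[j] - thetas[i] + np.pi) % (2 * np.pi) - np.pi
            k = np.exp(-s**2 / (4 * t))
            L[2*i:2*i+2, 2*i:2*i+2] += k * np.eye(2)            # diagonal block
            L[2*i:2*i+2, 2*j:2*j+2] = -k * rotation(alpha * s)  # off-diagonal block
    return L

rng = np.random.default_rng(6)
thetas = rng.uniform(0.0, 2 * np.pi, size=7)
L = block_sheaf_laplacian(thetas, t=0.5, alpha=0.8)
print(np.allclose(L, L.T), np.linalg.eigvalsh(L).min() >= -1e-10)
```

Symmetry follows because $s_{ji}=-s_{ij}$ and $R(-\phi)=R(\phi)^{\top}$ .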

Definition 5 ( $(n,d)$ -Hilbert bundle convolutional neural network).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle, with generic Hilbert-space fiber $\mathcal{H}$ and corresponding basis $\mathcal{B}$ . An $(n,d)$ -Hilbert bundle convolutional neural network, or $(n,d)$ -HilbNet, is specified by a filter bank $\mathcal{W}=\{g^{\ell}_{u,q}\}_{\ell,u,q}$ with $g^{\ell}_{u,q}\in L_{c}^{\infty}(\mathbb{R})$ , a Lipschitz continuous nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$ , the choice of a $d$ -dimensional subbasis of $\mathcal{B}$ , and a sample $\mathcal{X}_{n}\subset\mathcal{M}$ . Given input sampled signals $\mathbf{s}_{n,d,1},...,\mathbf{s}_{n,d,F_{0}}\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n})$ , the $L$ -layer network output is obtained by the recursion

\mathbf{s}_{n,d,u}^{\ell+1}=\sigma\left(\sum_{q=1}^{F_{\ell}}g^{\ell}_{u,q}(\Delta_{\mathcal{F}_{n,d}^{t}})\mathbf{s}_{n,d,q}^{\ell}\right),\qquad\ell=0,\dots,L-1,\qquad\mathbf{s}_{n,d,q}^{0}=\mathbf{s}_{n,d,q}.

(9)

Discretized HilbNets are fully implementable and can be compactly written using Def. 2 with the connection Laplacian of $\mathcal{E}$ replaced by the sheaf Laplacian of $\mathcal{F}_{n,d}^{t}$ , i.e., by $\Omega(\mathcal{F}_{n,d}^{t},\Delta_{\mathcal{F}_{n,d}^{t}},\mathcal{W},\sigma)$ .
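A minimal sketch of the forward pass in Eq. (9), assuming the filters are given as Python callables on eigenvalues and evaluated through an eigendecomposition of the symmetric sheaf Laplacian. Only the values of each $g^{\ell}_{u,q}$ on the spectrum matter, so responses such as $e^{-\tau\lambda}$ can be used in the example even though they are not compactly supported on $\mathbb{R}$ . The function and parameter names are hypothetical, and a dense eigendecomposition is only practical for small $nd$ .

```python
import numpy as np

def hilbnet_forward(lap, signals, filter_bank, sigma=np.tanh):
    """Forward pass of Eq. (9) with spectral filters g(lap) computed by eigh.

    lap: (N, N) symmetric sheaf Laplacian, N = n * d.
    signals: list of F_0 vectors of length N.
    filter_bank: filter_bank[l][u][q] is a callable g acting on eigenvalues.
    sigma: pointwise activation, applied coordinatewise in the chosen basis.
    """
    lam, U = np.linalg.eigh(lap)
    x = [np.asarray(s, dtype=float) for s in signals]
    for layer in filter_bank:
        x_hat = [U.T @ s for s in x]  # spectral coefficients of each input feature
        x = [sigma(sum(U @ (g(lam) * x_hat[q]) for q, g in enumerate(row)))
             for row in layer]
    return x

def heat(tau):
    """Low-pass response lambda -> exp(-tau * lambda)."""
    return lambda lam: np.exp(-tau * lam)

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
lap = B @ B.T  # stand-in PSD "sheaf Laplacian" with N = 6
bank = [
    [[heat(0.1), heat(0.5)] for _ in range(3)],  # layer 0: F_0 = 2 -> F_1 = 3
    [[heat(1.0), heat(0.2), heat(0.3)]],         # layer 1: F_1 = 3 -> F_2 = 1
]
out = hilbnet_forward(lap, [rng.standard_normal(6), rng.standard_normal(6)], bank)
print(len(out), out[0].shape)  # 1 (6,)
```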

Example 6.

If the filter bank $\mathcal{W}$ consists of polynomials with $K$ coefficients (i.e., of degree at most $K-1$ ), the $(n,d)$ -HilbNet can be written as a novel variant of sheaf neural networks [50, 11] given by

\displaystyle\mathbf{S}_{n,d}^{\ell+1}=\sigma\left(\sum_{k=0}^{K-1}(\Delta_{\mathcal{F}_{n,d}^{t}})^{k}\mathbf{S}^{\ell}_{n,d}\mathbf{W}_{\ell,k}\right)\in\mathbb{R}^{nd\times F_{\ell+1}}.

(10)

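where the matrices $\mathbf{S}^{\ell}_{n,d}\in\mathbb{R}^{nd\times F_{\ell}}$ and $\{\mathbf{W}_{\ell,k}\}_{k}$ , with $\mathbf{W}_{\ell,k}\in\mathbb{R}^{F_{\ell}\times F_{\ell+1}}$ , collect the sampled signals and the learnable filter weights at each layer, respectively. Equivalently, the filters in Eq. (9) are $g^{\ell}_{u,q}(\lambda)=\sum_{k=0}^{K-1}[\mathbf{W}_{\ell,k}]_{q,u}\lambda^{k}$ . Although polynomials are not compactly supported, the spectrum of the finite matrix $\Delta_{\mathcal{F}_{n,d}^{t}}$ is bounded, so restricting each polynomial to a compact interval containing this spectrum leaves $g^{\ell}_{u,q}(\Delta_{\mathcal{F}_{n,d}^{t}})$ unchanged and places the filter in $L_{c}^{\infty}(\mathbb{R})$ .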
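Eq. (10) avoids eigendecompositions altogether. The sketch below implements one such layer with repeated matrix products instead of explicit matrix powers. Shapes follow the text ( $\mathbf{S}^{\ell}_{n,d}\in\mathbb{R}^{nd\times F_{\ell}}$ , $\mathbf{W}_{\ell,k}\in\mathbb{R}^{F_{\ell}\times F_{\ell+1}}$ ), and the random Laplacian in the usage example is only a positive semidefinite stand-in.

```python
import numpy as np

def poly_sheaf_layer(lap, S, weights, sigma=np.tanh):
    """One layer of Eq. (10): sigma(sum_k lap^k S W_k).

    lap: (N, N) sheaf Laplacian with N = n*d; S: (N, F_in) stacked signals;
    weights: list of K arrays of shape (F_in, F_out) (K taps, degree K - 1).
    """
    out = np.zeros((S.shape[0], weights[0].shape[1]))
    Z = S  # holds lap^k S, updated iteratively (no explicit matrix powers)
    for W in weights:
        out += Z @ W
        Z = lap @ Z
    return sigma(out)

rng = np.random.default_rng(4)
N, F_in, F_out, K = 8, 3, 5, 3
B = rng.standard_normal((N, N))
lap = B @ B.T / N
S = rng.standard_normal((N, F_in))
Ws = [rng.standard_normal((F_in, F_out)) for _ in range(K)]
Y = poly_sheaf_layer(lap, S, Ws)
Y_ref = np.tanh(sum(np.linalg.matrix_power(lap, k) @ S @ Ws[k] for k in range(K)))
print(Y.shape, np.allclose(Y, Y_ref))  # (8, 5) True
```

Each layer costs $K-1$ sparse-times-dense products when $\Delta_{\mathcal{F}_{n,d}^{t}}$ is stored sparsely, which is the usual reason to prefer polynomial filters at scale.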

5 Theoretical Convergence Guarantees

Our main result can be understood as a far-reaching generalization of the convergence result of Belkin and Niyogi [14]. Consider a random sample $\mathcal{X}_{n}\subset\mathcal{M}$ and the corresponding geometric graph $G_{n}=(\mathcal{X}_{n},E)$. It is established in [14] that, as the sampling density increases, the weighted graph Laplacian converges in probability to the Laplace-Beltrami operator of the manifold. Analogously, we show that the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}_{n}}$ over $G_{n}$ converges to $\Delta_{\nabla}$, recovering the result of [14] as the special case $\mathcal{E}=\mathcal{M}\times\mathbb{R}$. Our proof, presented in Appendix H, follows the strategy of [14], with the non-trivial modifications needed to accommodate the simultaneous generalization from graphs to cellular sheaves and to infinite-dimensional Hilbert spaces. To state our results, we require the following intermediate operator.

Definition 6.

(Point-Cloud Extension of the Sheaf Laplacian) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle and consider a sample $\mathcal{X}_{n}\subset\mathcal{M}$. Then the corresponding Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}$ extends to the point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$, an operator on $L^{2}(\mathcal{M},\mathcal{E})$ defined by

$$(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)=\frac{1}{n}\sum_{j=1}^{n}e^{-d_{\mathcal{M}}(x,x_{j})^{2}/(4t)}\big(S(x)-P_{x_{j}\to x}S(x_{j})\big). \tag{11}$$

This extension lets us treat the sheaf-level and bundle-level Laplacians as operators on the same space. In this setting, we have the following convergence result.
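The normalization $t(4\pi t)^{m/2}$ in Theorem 1 below can be motivated by a short heuristic computation, sketched here under our own simplifying assumptions (i.i.d. samples from the normalized volume measure, curvature corrections of relative order $O(t)$ dropped); it is not the argument of Appendix H. Replacing the empirical average in (11) by its expectation and expanding in normal coordinates $y=\exp_{x}(v)$ gives:

```latex
% Heuristic only: x fixed, samples i.i.d. from dvol/vol(M), curvature terms of relative order O(t) dropped.
\mathbb{E}\big[(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)\big]
  = \frac{1}{\operatorname{vol}(\mathcal{M})}\int_{\mathcal{M}} e^{-d_{\mathcal{M}}(x,y)^{2}/(4t)}\big(S(x)-P_{y\to x}S(y)\big)\,d\mathrm{vol}(y)

% Normal coordinates y = \exp_x(v); Taylor expansion along the radial geodesic:
P_{\exp_x(v)\to x}\,S(\exp_x(v)) = S(x) + \nabla_{v}S(x) + \tfrac{1}{2}\nabla^{2}_{v,v}S(x) + O(|v|^{3})

% Gaussian moments (the odd term \nabla_v S integrates to zero by symmetry):
\int_{\mathbb{R}^{m}} e^{-|v|^{2}/(4t)}\,v_{a}v_{b}\,dv = 2t\,(4\pi t)^{m/2}\,\delta_{ab}

% Hence, with the nonnegative convention \Delta_\nabla = -\operatorname{tr}\nabla^{2} = \nabla^{*}\nabla:
\mathbb{E}\big[(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)\big]
  \approx -\frac{t\,(4\pi t)^{m/2}}{\operatorname{vol}(\mathcal{M})}\sum_{a=1}^{m}\nabla^{2}_{e_{a},e_{a}}S(x)
  = \frac{t\,(4\pi t)^{m/2}}{\operatorname{vol}(\mathcal{M})}\,\Delta_{\nabla}S(x)
```

Dividing by $t(4\pi t)^{m/2}$ leaves $\Delta_{\nabla}S(x)/\operatorname{vol}(\mathcal{M})$, matching the right-hand side of (A); roughly speaking, the bandwidth schedule $t_{n}$ is what keeps the Monte Carlo fluctuation around this expectation under control.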

Theorem 1.

(Convergence of the Hilbert Sheaf Laplacian) Let $\mathcal{M}$ be an $m$-dimensional closed Riemannian manifold, and let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle with associated connection Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$, consider a random sample $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}\subset\mathcal{M}$, and let $\mathcal{F}^{t}_{n}$ be the induced Hilbert cellular sheaf with bandwidth $t$. Then, for any $x\in\mathcal{M}$,

$$\lim_{n\to\infty}\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\quad\text{in probability,} \tag{A}$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ . Further, if $S\in C^{4}(\mathcal{M},\mathcal{E})$ , we have

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$-norm}. \tag{B}$$
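Theorem 1 can be checked numerically in a toy case. The sketch below assumes $\mathcal{M}=S^{1}$ (so $m=1$ and $\operatorname{vol}(\mathcal{M})=2\pi$), uniform i.i.d. samples, a trivial rank-2 real bundle written in complex form ($\mathbb{R}^{2}\cong\mathbb{C}$), and the hypothetical connection $\nabla=\partial_{\theta}+i\omega$, whose transport along an arc of signed length $r$ is multiplication by $e^{-i\omega r}$. For the section $S(\theta)=e^{i\theta}$ one computes $\Delta_{\nabla}S=-(\partial_{\theta}+i\omega)^{2}S=(1+\omega)^{2}S$, using the nonnegative convention $\Delta_{\nabla}=\nabla^{*}\nabla$. The script evaluates (11) at a single point with the bandwidth schedule $t_{n}=n^{-1/(m+2+\alpha)}$ and compares the rescaled value with the right-hand side of (A); all names and parameter values are illustrative choices.

```python
import numpy as np

OMEGA = 0.5  # hypothetical connection strength: nabla = d/dtheta + i*OMEGA

def wrap(a):
    """Signed angular displacement mapped to (-pi, pi]."""
    return np.angle(np.exp(1j * a))

def transport(src, tgt):
    """Parallel transport P_{src -> tgt} along the shorter arc: multiply by exp(-i*OMEGA*(tgt - src))."""
    return np.exp(-1j * OMEGA * wrap(tgt - src))

def section(theta):
    """Test section S(theta) = exp(i*theta); it satisfies Delta_nabla S = (1 + OMEGA)**2 * S."""
    return np.exp(1j * theta)

def point_cloud_laplacian(x, samples, t):
    """Eq. (11) evaluated at a single base point x."""
    dist = wrap(samples - x)  # signed geodesic distance on the unit circle
    weights = np.exp(-dist**2 / (4 * t))
    return np.mean(weights * (section(x) - transport(samples, x) * section(samples)))

rng = np.random.default_rng(0)
m, alpha, n = 1, 1.0, 1_000_000
t = n ** (-1.0 / (m + 2 + alpha))  # bandwidth schedule t_n from Theorem 1
samples = rng.uniform(0.0, 2 * np.pi, size=n)

x = 0.3
estimate = point_cloud_laplacian(x, samples, t) / (t * (4 * np.pi * t) ** (m / 2))
target = (1 + OMEGA) ** 2 * section(x) / (2 * np.pi)  # Delta_nabla S(x) / vol(S^1)
rel_err = abs(estimate - target) / abs(target)
print(f"t_n = {t:.4f}, relative error = {rel_err:.3f}")
```

With $n=10^{6}$ the relative error should be of the order of a few percent, dominated by the $O(t_{n})$ bias of the Gaussian kernel rather than by sampling noise; it shrinks only slowly with $n$, as the schedule $t_{n}\propto n^{-1/4}$ suggests.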

Our framework can be seen as simultaneously generalizing the convergence results for the weighted graph Laplacian of [14] and for the graph connection Laplacian of [93], allowing arbitrary bundles, with potentially infinite-dimensional fibers, equipped with an arbitrary connection. Such convergence results have previously served as the basis for transferability and robustness results in geometric deep learning [101, 104, 67], and have justified the development of numerous Laplacian-based manifold learning techniques [13, 28, 108, 97, 85]. We can likewise generalize these results to implementable sheaf Laplacians by also discretizing in the signal domain.

Finite-Rank Convergence. Consider a signal $S$ sampled to $\mathbf{s}_{n,d}$ as in Proposition 1. The following guarantee formalizes the intuition that the fully discretized sheaf Laplacian converges to the true connection Laplacian as both the sample of the underlying manifold and the discretization of the signal are refined.
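On the signal side, the finite-rank discretization can be illustrated with a minimal sketch, assuming that the sampling of Proposition 1 evaluates $S$ at the points of $\mathcal{X}_{n}$ and keeps the coordinates of each fiber value along a $d$-dimensional subbasis. Here the fiber is taken to be $L^{2}([0,1])$ with the orthonormal cosine basis, and the section $S(x)(u)=e^{u\cos x}$ over the circle is a hypothetical test case; neither choice comes from the paper. The script builds the stacked vector $\mathbf{s}_{n,d}\in\mathbb{R}^{nd}$ and reports the fiber-wise truncation error $\|S(x_i)-\Pi_{d}S(x_i)\|^{2}$, the signal-side error that shrinks as $d$ grows.

```python
import numpy as np

def cosine_basis(d, u):
    """First d orthonormal cosine functions on [0, 1] evaluated at u: shape (d, len(u))."""
    k = np.arange(d)[:, None]
    phi = np.sqrt(2.0) * np.cos(np.pi * k * u[None, :])
    phi[0] = 1.0  # constant function for k = 0
    return phi

def sample_section(xs, d, grid=4096):
    """Hypothetical sampling: evaluate S(x_i)(u) = exp(u cos x_i) at each base point,
    keep its first d basis coordinates, and stack them into a vector in R^{nd}."""
    u = (np.arange(grid) + 0.5) / grid  # midpoint nodes: the cosine vectors are exactly orthonormal here
    values = np.exp(np.outer(np.cos(xs), u))  # shape (n, grid)
    coeffs = values @ cosine_basis(d, u).T / grid  # shape (n, d)
    # Discrete Parseval identity: ||S(x_i) - Pi_d S(x_i)||^2 = ||S(x_i)||^2 - sum of kept coeffs^2.
    tail = np.mean(values**2, axis=1) - np.sum(coeffs**2, axis=1)
    return coeffs.reshape(-1), tail

xs = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)  # n = 8 base points on the circle
for d in (2, 4, 8, 16):
    s, tail = sample_section(xs, d)
    print(d, s.shape, f"max truncation error^2 = {tail.max():.2e}")
```

In an $(n,d)$-HilbNet, the vector `s` would play the role of one column of $\mathbf{S}^{0}_{n,d}$ in (10).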

Theorem 2 (Finite-Rank Approximation).

Consider the setting of Theorem 1 with a section $S\in C^{4}(\mathcal{M},\mathcal{E})$, where $\mathcal{E}$ is a strictly infinite-dimensional Hilbert bundle. Then there exists a sequence of finite-rank approximating sheaves $\mathcal{F}^{t_{n}}_{n,d_{n}}$ such that

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}\mathbf{s}_{n,d_{n}}-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$-norm}, \tag{12}$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ .

While this result, as discussed in Appendix B, may pave the way for new developments in Laplacian-based manifold learning, here we focus on its consequences for $(n,d)$-HilbNets.

Corollary 2.1 (Convergence in Architecture).

Under the hypotheses of Theorem 2, let $\{d_{n}\}_{n}$ be the sequence given there. Fix a filter bank $\mathcal{W}$ and a fiber-wise nonlinearity $\sigma$ that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms. Then the output of the discrete $(n,d_{n})$-HilbNet converges to the output of the continuous HilbNet architecture in the sense that

$$\Omega\big(\mathcal{F}^{t_{n}}_{n,d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}},\mathcal{W},\sigma\big)\to\Omega\big(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma\big)\quad\text{in mean squared error,} \tag{13}$$

as the sample size $n$ and the fiber rank $d_{n}$ grow to infinity.

Corollary 2.2.

(Transferability) Let $\{\mathcal{X}_{n}\}_{n=1}^{\infty}$ and $\{\mathcal{Y}_{n}\}_{n=1}^{\infty}$ be independent sequences of random samples of $\mathcal{M}$, and let $\{d_{n}\}_{n=1}^{\infty}$ be a sequence for which the conclusion of Theorem 2 holds under both samplings. Then, for any fiber-wise nonlinearity $\sigma$ that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms and any filter bank $\mathcal{W}$,

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X},\mathcal{Y}}\left[\left\|\Omega(\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}},\mathcal{W},\sigma)-\Omega(\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}}},\mathcal{W},\sigma)\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$ norm.} \tag{14}$$

Furthermore, one can derive a sample-independent quantitative bound on the $L^{2}$ disagreement between the two outputs in (14); see Appendix H for details.

These results show that $(n,d)$-HilbNets are a principled discretization of continuous HilbNets. They can also be read as robustness results, since they establish that the architecture is consistent across scales and stable under resampling of the base manifold. Notably, given the generality of HilbNets, most existing geometric convolutional architectures can be viewed as instances of HilbNets for a particular choice of bundle and connection. Our results can therefore also be seen as extending the transferability guarantees of [101, 104, 67] to a larger class of architectures and data modalities. See Appendix C for further discussion.

$n$	Free $O(d)$: empirical	Free $O(d)$: theory	Circulant: empirical	Circulant: theory	Frozen identity (GCN): empirical	Frozen identity (GCN): theory
$16$	$(1.42{\pm}0.21){\times}10^{-7}$	$0$	$(1.79{\pm}0.28){\times}10^{-2}$	$1.84{\times}10^{-2}$	$(2.31{\pm}0.27){\times}10^{-2}$	$2.30{\times}10^{-2}$
$32$	$(1.23{\pm}0.23){\times}10^{-7}$	$0$	$(1.16{\pm}0.14){\times}10^{-2}$	$1.18{\times}10^{-2}$	$(1.46{\pm}0.18){\times}10^{-2}$	$1.46{\times}10^{-2}$
$64$	$(1.82{\pm}0.77){\times}10^{-7}$	$0$	$(1.02{\pm}0.14){\times}10^{-2}$	$1.03{\times}10^{-2}$	$(1.29{\pm}0.17){\times}10^{-2}$	$1.29{\times}10^{-2}$
$128$	$(1.63{\pm}0.19){\times}10^{-7}$	$0$	$(8.85{\pm}0.71){\times}10^{-3}$	$8.93{\times}10^{-3}$	$(1.11{\pm}0.09){\times}10^{-2}$	$1.11{\times}10^{-2}$
$256$	$(1.59{\pm}0.28){\times}10^{-7}$	$0$	$(7.74{\pm}0.13){\times}10^{-3}$	$7.79{\times}10^{-3}$	$(9.67{\pm}0.14){\times}10^{-3}$	$9.67{\times}10^{-3}$

Table 1: Synthetic transport recovery. Empirical: best edge-MSE achieved by each variant. Theory: analytical squared Frobenius projection distance of $P^{LC}_{x_{i}\to m_{ij}}$ onto the variant's hypothesis class.

6 Experimental Results

A key practical advantage of $(n,d)$-HilbNets over existing approaches for processing graph signals is their use of parallel transport, which in practice can be either known or learned. For instance, transport operators allow us to incorporate principled signal-level geometric priors alongside the spatial priors of existing spatiotemporal GCNs. This aligns with the thesis of geometric deep learning that incorporating geometric priors in a principled way improves performance, particularly in low-data or small-model regimes. Strategies for hand-crafting or learning parallel transport operators are discussed in Appendix D. Here, we first validate our setup on a synthetic dataset obtained by discretizing a known Hilbert bundle from information geometry, and then evaluate performance on real-world spatiotemporal traffic forecasting benchmarks. In all experiments, we use the polynomial $(n,d)$-HilbNet from (10) with learned transport maps.

METR-LA
Model	Params	Horizon 3 MAE	Horizon 3 RMSE	Horizon 3 MAPE (%)	Horizon 6 MAE	Horizon 6 RMSE	Horizon 6 MAPE (%)	Horizon 12 MAE	Horizon 12 RMSE	Horizon 12 MAPE (%)
FC-LSTM [69]	${\sim}150\text{K}$	$3.44$	$6.30$	$9.6$	$3.77$	$7.23$	$10.9$	$4.37$	$8.69$	$13.2$
STAEformer [71]	${\sim}4.7\text{M}$	$2.65$	$5.11$	$6.85$	$2.97$	$6.00$	$8.13$	$3.34$	$7.02$	$9.70$
MLP fiber baseline	$5{,}212$	$3.131{\pm}0.004$	$6.074{\pm}0.007$	$8.271{\pm}0.021$	$3.775{\pm}0.005$	$7.496{\pm}0.013$	$10.626{\pm}0.046$	$4.690{\pm}0.011$	$9.184{\pm}0.014$	$14.341{\pm}0.097$
Spatiotemporal graph baseline	$8{,}908$	$3.453{\pm}0.080$	$6.709{\pm}0.220$	$9.241{\pm}0.279$	$4.160{\pm}0.117$	$8.158{\pm}0.184$	$11.916{\pm}0.491$	$5.277{\pm}0.093$	$10.102{\pm}0.128$	$16.006{\pm}0.662$
HilbNet, frozen identity (GCN)	$5{,}756$	$3.092{\pm}0.007$	$5.920{\pm}0.013$	$8.218{\pm}0.065$	$3.713{\pm}0.010$	$7.312{\pm}0.020$	$10.520{\pm}0.061$	$4.608{\pm}0.034$	$8.991{\pm}0.043$	$14.166{\pm}0.240$
HilbNet, circulant	$11{,}656$	$2.939{\pm}0.021$	$5.630{\pm}0.061$	$7.908{\pm}0.067$	$3.409{\pm}0.032$	$6.765{\pm}0.092$	$9.844{\pm}0.125$	$4.059{\pm}0.049$	$8.149{\pm}0.114$	$12.471{\pm}0.195$
HilbNet, free $O(T)$	$119{,}036$	$\mathbf{2.923{\pm}0.013}$	$\mathbf{5.586{\pm}0.048}$	$\mathbf{7.808{\pm}0.083}$	$\mathbf{3.372{\pm}0.023}$	$\mathbf{6.732{\pm}0.066}$	$\mathbf{9.507{\pm}0.096}$	$\mathbf{3.938{\pm}0.030}$	$\mathbf{8.042{\pm}0.101}$	$\mathbf{11.642{\pm}0.136}$

PEMS-BAY
Model	Params	Horizon 3 MAE	Horizon 3 RMSE	Horizon 3 MAPE (%)	Horizon 6 MAE	Horizon 6 RMSE	Horizon 6 MAPE (%)	Horizon 12 MAE	Horizon 12 RMSE	Horizon 12 MAPE (%)
FC-LSTM [69]	${\sim}150\text{K}$	$2.05$	$4.19$	$4.8$	$2.20$	$4.55$	$5.2$	$2.37$	$4.96$	$5.7$
STAEformer [71]	${\sim}4.7\text{M}$	$1.31$	$2.78$	$2.76$	$1.62$	$3.68$	$3.62$	$1.88$	$4.34$	$4.41$
MLP fiber baseline	$5{,}212$	$1.459{\pm}0.003$	$3.145{\pm}0.017$	$3.026{\pm}0.020$	$1.942{\pm}0.004$	$4.378{\pm}0.017$	$4.385{\pm}0.045$	$2.513{\pm}0.004$	$5.658{\pm}0.015$	$6.209{\pm}0.038$
Spatiotemporal graph baseline	$8{,}908$	$\mathbf{1.400{\pm}0.002}$	$2.980{\pm}0.016$	$\mathbf{2.924{\pm}0.003}$	$1.850{\pm}0.002$	$4.162{\pm}0.009$	$4.222{\pm}0.009$	$2.388{\pm}0.004$	$5.395{\pm}0.023$	$5.924{\pm}0.032$
HilbNet, frozen identity (GCN)	$5{,}756$	$1.439{\pm}0.003$	$3.077{\pm}0.022$	$2.985{\pm}0.007$	$1.901{\pm}0.006$	$4.264{\pm}0.021$	$4.319{\pm}0.015$	$2.446{\pm}0.007$	$5.495{\pm}0.019$	$6.020{\pm}0.032$
HilbNet, circulant	$15{,}366$	$1.413{\pm}0.002$	$2.971{\pm}0.018$	$2.982{\pm}0.015$	$1.806{\pm}0.003$	$\mathbf{3.950{\pm}0.015}$	$4.162{\pm}0.037$	$2.211{\pm}0.014$	$\mathbf{4.866{\pm}0.048}$	$5.386{\pm}0.047$
HilbNet, free $O(T)$	$190{,}268$	$1.417{\pm}0.002$	$\mathbf{2.969{\pm}0.015}$	$3.058{\pm}0.013$	$\mathbf{1.793{\pm}0.005}$	$3.958{\pm}0.026$	$\mathbf{4.127{\pm}0.037}$	$\mathbf{2.181{\pm}0.003}$	$4.873{\pm}0.019$	$\mathbf{5.214{\pm}0.045}$

Table 2: METR-LA (top) and PEMS-BAY (bottom) traffic forecasting results. In each table, the first two rows are external baselines as reported in the cited papers, and the remaining rows are our experiments (mean $\pm$ standard deviation over five seeds). MAPE is in percent; lower is better for all metrics. Bold marks the best result among our experiments.

Synthetic Experiments. We first consider a task in which, for a known Hilbert bundle and connection, we train a discretized HilbNet to predict the true parallel transport operators. Following [72], the base manifold is $\mathcal{M}=\mathrm{Sym}^{++}(p)$ equipped with the Otto-Wasserstein metric, with each $\Sigma\in\mathcal{M}$ parameterizing a density $\mathcal{N}(0,\Sigma)$. The ambient fiber $\mathcal{H}_{\Sigma}=L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ is genuinely infinite-dimensional; the computational fiber is the Otto-velocity image of covariance perturbations, a sub-bundle whose fibers are already finite-dimensional, with $d=p(p+1)/2$, and on which the Levi-Civita transports $P^{LC}_{x_{i}\to m_{ij}}$ admit a closed form. We sample $n$ points, build a $k$NN graph $G_{n}$ ($k=8$) under the $W_{2}$ distance, and assemble the network sheaf $\mathcal{F}_{n,d}^{t}$ and its Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ from Def. 3. We then train three transport parametrizations from Appendix D, namely free $O(d)$ (Householder), circulant, and frozen identity (a standard GCN [59]), to recover the Levi-Civita transports in Cholesky-rescaled coordinates, averaging over 3 seeds. As Table 1 shows, the free class recovers $P^{LC}_{x_{i}\to m_{ij}}$ to numerical precision (${\approx}1.6\cdot 10^{-7}$), while each restricted class converges to its analytical Frobenius-projection plateau, to within $3\%$ at every $n$ and within $1\%$ for $n\geq 64$. This confirms that the transport hypothesis class constrains the per-edge restriction maps of $\Delta_{\mathcal{F}_{n,d}^{t}}$ in a quantitatively predictable way. Further experiments and details are given in Appendix F.1.
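The "Theory" entries of Table 1 are Frobenius projection distances, which have a simple closed form for linear hypothesis classes. The sketch below assumes that the circulant class is the linear space of circulant $d\times d$ matrices (if the class is further restricted, e.g., to orthogonal circulants, this projection only yields a lower bound), that the frozen-identity class is the single matrix $I$, and that distances are unnormalized squared Frobenius norms on one edge. The paper's per-edge averaging and Cholesky-rescaled coordinates are not reproduced, and a random orthogonal matrix stands in for $P^{LC}_{x_i\to m_{ij}}$; the function names are hypothetical.

```python
import numpy as np

def project_circulant(A):
    """Frobenius-orthogonal projection of a square matrix onto the circulant matrices:
    each wrapped diagonal is replaced by its mean."""
    d = A.shape[0]
    idx = np.arange(d)
    c = np.array([A[(idx + k) % d, idx].mean() for k in range(d)])  # first column
    return c[(idx[:, None] - idx[None, :]) % d]  # C[i, j] = c[(i - j) mod d]

def projection_distances(P):
    """Squared Frobenius distance from an orthogonal target P to each hypothesis class."""
    d = P.shape[0]
    return {
        "free O(d)": 0.0,  # P is itself orthogonal, so it lies in the class
        "circulant": float(np.sum((P - project_circulant(P)) ** 2)),
        "frozen identity": float(np.sum((P - np.eye(d)) ** 2)),
    }

rng = np.random.default_rng(0)
d = 6  # e.g. p = 3 gives d = p(p+1)/2 = 6
P, _ = np.linalg.qr(rng.standard_normal((d, d)))  # stand-in for a true transport
print(projection_distances(P))
```

Since the identity is itself circulant, the circulant plateau can never exceed the frozen-identity plateau, consistent with the ordering of the two "Theory" columns in Table 1.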

Traffic Forecasting. We evaluate HilbNets on real-world spatiotemporal traffic-speed forecasting, where each node of a road-network graph carries a time-series fiber with $d=T$ observed time steps. This is a natural instance of the Hilbert-bundle framework: the base graph encodes spatial proximity among sensors, while the edgewise transports $P_{j\to i}^{e_{ij}}$ from Appendix D control how temporal fibers are aligned before filtering. We test on two standard benchmarks, METR-LA [69] and PEMS-BAY [69], predicting future speeds at horizons $3$, $6$, and $12$. We compare the same three HilbNet transport classes (frozen identity, i.e., a GCN; circulant; and free $O(T)$) against a fiber-only MLP and a spatiotemporal graph baseline that stacks GCN layers with one-dimensional convolutional layers along the temporal dimension; all models share the same polynomial sheaf filter order, readout, and forecasting loss. Full experimental details are given in Appendix F.2. Table 2 reports MAE, RMSE, and MAPE (mean $\pm$ std over five seeds). On both datasets, learning non-trivial transports improves over the frozen identity at every horizon and metric, with a single exception (MAPE at horizon 3 on PEMS-BAY for free $O(T)$), indicating that the sheaf structure helps beyond the usual graph structure. The free $O(T)$ class achieves the best accuracy on most metrics, but the circulant variant is competitive while using roughly one tenth of the parameters (e.g., $11{,}656$ vs. $119{,}036$ on METR-LA), supporting the use of structured, physically motivated transport priors for spatiotemporal data. Overall, these results suggest that geometric inductive biases can yield better accuracy, or comparable accuracy with substantially fewer parameters.
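For time-series fibers ($d=T$), the three transport classes compared above can be sketched as follows. This is one plausible parametrization under our own assumptions, not necessarily the one in Appendix D: the free class is a product of Householder reflections (orthogonal by construction, with $T^{2}$ parameters in this sketch), the circulant class is a circular convolution along time specified by its first column ($T$ parameters) and applied with the FFT, and the frozen identity simply uses $P=I$. The function names and the value of `T` are hypothetical.

```python
import numpy as np

def householder_orthogonal(V):
    """Product of Householder reflections I - 2 v v^T / |v|^2, one per row of V (free-class sketch)."""
    T = V.shape[1]
    Q = np.eye(T)
    for v in V:
        Q = Q - 2.0 * np.outer(Q @ v, v) / (v @ v)  # Q <- Q (I - 2 v v^T / |v|^2)
    return Q

def circulant_apply(c, x):
    """Apply the circulant matrix with first column c to x along the last axis,
    i.e. a circular convolution over time, in O(T log T) via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, axis=-1), axis=-1))

rng = np.random.default_rng(0)
T = 12  # hypothetical number of observed time steps per fiber
x = rng.standard_normal(T)  # one node's temporal fiber

Q = householder_orthogonal(rng.standard_normal((T, T)))  # free O(T) transport
shift = np.zeros(T)
shift[1] = 1.0  # circulant transport with a one-hot first column
c = rng.standard_normal(T)  # generic circulant transport
C = np.stack([np.roll(c, k) for k in range(T)], axis=1)  # explicit circulant matrix, C[i, k] = c[(i - k) mod T]
print(np.allclose(Q.T @ Q, np.eye(T)),
      np.allclose(circulant_apply(shift, x), np.roll(x, 1)),
      np.allclose(C @ x, circulant_apply(c, x)))
# expected: True True True
```

One possible reading of the circulant prior is visible in the one-hot case: the class contains cyclic time delays, so an edge can learn to align a sensor's series with a lagged copy of its neighbor's.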

7 Conclusions

We introduced a novel convolutional learning framework for infinite-dimensional signals over a manifold using Hilbert bundles, a setting that simultaneously unifies and generalizes existing approaches. It accommodates arbitrary connection Laplacians and a more general class of filters via the Borel functional calculus, and hence applies to potentially infinite-dimensional signals. We defined HilbNets as stacks of Hilbert bundle filters and pointwise nonlinearities. We then introduced a practically implementable $(n,d)$-HilbNet via the theory of Hilbert cellular sheaves and proved that this discretized architecture converges to the continuous architecture in the limit. Notably, our convergence-in-architecture result is derived from a novel extension of the Laplacian convergence result of [14] to the setting of Hilbert sheaves and Hilbert bundles, which we believe will be of independent interest to the broader machine learning community. Lastly, we demonstrated the benefits of integrating domain-specific geometric priors through experiments with discretized HilbNets on synthetic and real-world data. Overall, we envision the impact of our contributions as two-fold: the HilbNet framework enables the principled development of domain-specific architectures through appropriate choices of connection, filter bank, and manifold- and signal-alignment measures, while our Hilbert Laplacian convergence theorem lays the theoretical groundwork for Laplacian-based manifold-theoretic techniques for infinite-dimensional signals, with potential applications ranging from mechanistic interpretability to self-supervised learning. A more detailed discussion of broader impact and limitations is presented in Appendix B.

References

[1] O. Abdel-Hamid et al. (2012) Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In 2012 International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] M. Aggarwal and M. N. Murty (2020) Machine learning in social networks: embedding nodes, edges, communities, and graphs. Springer Nature.
[3] A. Ambrosetti and G. Prodi (1995) A primer of nonlinear analysis. Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, UK.
[4] S. Axelrod, S. della Pietra, and E. Witten (1991) Geometric quantization of Chern–Simons gauge theory. Journal of Differential Geometry 33 (3), pp. 787–902.
[5] J. Bamberger, F. Barbero, X. Dong, and M. M. Bronstein (2025) Bundle neural network for message diffusion on graphs. In The Thirteenth International Conference on Learning Representations.
[6] S. Barbarossa and S. Sardellitti (2020) Topological signal processing over simplicial complexes. IEEE Trans. on Signal Processing 68, pp. 2992–3007.
[7] F. Barbero et al. (2022) Sheaf neural networks with connection Laplacians. arXiv.
[8] T. Batard (2011) Heat equations on vector bundles—application to color image regularization. Journal of Mathematical Imaging and Vision 41 (1-2), pp. 59–85.
[9] C. Battiloro et al. (2023) Tangent bundle filters and neural networks: from manifolds to cellular sheaves and back. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5.
[10] C. Battiloro, L. Testa, L. Giusti, S. Sardellitti, P. Di Lorenzo, and S. Barbarossa (2024) Generalized simplicial attention neural networks. IEEE Transactions on Signal and Information Processing over Networks 10, pp. 833–850.
[11] C. Battiloro, Z. Wang, H. Riess, P. Di Lorenzo, and A. Ribeiro (2024) Tangent bundle convolutional learning: from manifolds to cellular sheaves and back. IEEE Transactions on Signal Processing.
[12] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision 61 (2), pp. 139–157.
[13] M. Belkin and P. Niyogi (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems 14.
[14] M. Belkin and P. Niyogi (2008) Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences 74 (8), pp. 1289–1308 (Learning Theory 2005).
[15] N. Berline, E. Getzler, and M. Vergne (1992) Heat kernels and Dirac operators. Springer Berlin, Heidelberg.
[16] C. Bodnar et al. (2021) Weisfeiler and Lehman go cellular: CW networks. In Advances in Neural Information Processing Systems, Vol. 34, pp. 2625–2640.
[17] C. Bodnar et al. (2021) Weisfeiler and Lehman go topological: message passing simplicial networks. In ICLR 2021 Workshop on Geometrical and Topological Representation Learning.
[18] C. Bodnar et al. (2022) Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs. arXiv.
[19] B. Bonev, T. Kurth, C. Hundt, J. Pathak, M. Baust, K. Kashinath, and A. Anandkumar (2023) Spherical Fourier neural operators: learning stable dynamics on the sphere. In International Conference on Machine Learning, pp. 2806–2823.
[20] V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth (2020) Matérn Gaussian processes on Riemannian manifolds. In Advances in Neural Information Processing Systems, Vol. 33. arXiv:2006.10160.
[21] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković (2021) Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.
[22] M. M. Bronstein et al. (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
[23] J. Brüning and M. Lesch (1992) Hilbert complexes. Journal of Functional Analysis 108, pp. 88–132.
[24] A. Caponera and D. Marinucci (2021) Asymptotics for spherical functional autoregressions. The Annals of Statistics 49 (1), pp. 346–369.
[25] É. Cartan (1967) Differential calculus. Princeton University Press, Princeton, NJ.
[26] G. Chen, X. Liu, Q. Meng, L. Chen, C. Liu, and Y. Li (2024) Learning neural operators on Riemannian manifolds. National Science Open 3 (6), pp. 20240001.
[27] T. S. Cohen, M. Geiger, and M. Weiler (2019) A general theory of equivariant CNNs on homogeneous spaces. Advances in Neural Information Processing Systems 32.
[28] R. R. Coifman and S. Lafon (2006) Diffusion maps. Applied and Computational Harmonic Analysis 21 (1), pp. 5–30.
[29] J. M. Curry (2014) Sheaves, cosheaves and applications. University of Pennsylvania.
[30] G. D’Acunto and C. Battiloro (2025) The relativity of causal knowledge. In The 41st Conference on Uncertainty in Artificial Intelligence.
[31] X. Dai and H. Müller (2018) Principal component analysis for functional data on Riemannian manifolds and spheres. The Annals of Statistics 46 (6B), pp. 3309–3338.
[32] R. Dangovski, L. Jing, C. Loh, S. Han, A. Srivastava, B. Cheung, P. Agrawal, and M. Soljačić (2021) Equivariant contrastive learning. arXiv preprint arXiv:2111.00899.
[33] V. De Bortoli, E. Mathieu, M. Hutchinson, J. Thornton, Y. W. Teh, and A. Doucet (2022) Riemannian score-based generative modelling. Advances in Neural Information Processing Systems 35, pp. 2406–2422.
[34] P. De Haan et al. (2020) Gauge equivariant mesh CNNs: anisotropic convolutions on geometric graphs. arXiv preprint arXiv:2003.05425.
[35] L. Di Nino, G. D’Acunto, S. Barbarossa, and P. Di Lorenzo (2025) Learning the structure of connection graphs. arXiv preprint arXiv:2510.11245.
[36] M. P. do Carmo (1992) Riemannian geometry. Mathematics: Theory & Applications, Birkhäuser, Boston.
[37] I. Duta, G. Cassarà, F. Silvestri, and P. Liò (2023) Sheaf hypergraph networks. Advances in Neural Information Processing Systems 36, pp. 12087–12099.
[38] C. Fefferman, S. Mitter, and H. Narayanan (2016) Testing the manifold hypothesis. Journal of the American Mathematical Society 29 (4), pp. 983–1049.
[39] S. Fiorini, H. Aktas, I. Duta, S. Coniglio, P. Morerio, A. Del Bue, and P. Liò (2025) Sheaves reloaded: a directional awakening. arXiv preprint arXiv:2506.02842.
[40] F. Gama et al. (2018) Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing 67 (4), pp. 1034–1049.
[41] F. Gama et al. (2020) Graphs, convolutions, and neural networks: from graph filters to graph neural networks. IEEE Signal Processing Magazine 37, pp. 128–138.
[42] R. Ghrist and H. Riess (2022) Cellular sheaves of lattices and the Tarski Laplacian. Homology, Homotopy and Applications 24 (1), pp. 325–345.
[43] L. Giusti, C. Battiloro, et al. (2023) Cell attention networks. In 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
[44] J. J. Gould (2025) Cellular sheaves of Hilbert spaces. Ph.D. Thesis, University of Pennsylvania.
[45] F. Grassi, A. Loukas, N. Perraudin, and B. Ricaud (2017) A time-vertex signal processing framework: scalable processing and meaningful representations for time-series on graphs. IEEE Transactions on Signal Processing 66 (3), pp. 817–829.
[46] A. Grigor’yan (2009) Heat kernel and analysis on manifolds. AMS/IP Studies in Advanced Mathematics, Vol. 47, American Mathematical Society, Providence, RI.
[47] E. Grimaldi, M. E. Pandolfo, G. D’Acunto, S. Barbarossa, and P. Di Lorenzo (2025) Learning network sheaves for AI-native semantic communication. arXiv preprint arXiv:2512.03248.
[48] A. Grothendieck (1955) A general theory of fibre spaces with structure sheaf. University of Kansas, Department of Mathematics.
[49] T. Hanks, H. Riess, S. Cohen, T. Gross, M. Hale, and J. Fairbanks (2025) Distributed multi-agent coordination over cellular sheaves. arXiv preprint arXiv:2504.02049.
[50] J. Hansen and T. Gebhart (2020) Sheaf neural networks. arXiv.
[51] J. Hansen and R. Ghrist (2019) Learning sheaf Laplacians from smooth signals. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5446–5450.
[52] J. Hansen and R. Ghrist (2019) Toward a spectral theory of cellular sheaves. Journal of Applied and Computational Topology 3 (4), pp. 315–358.
[53] J. Hansen and R. Ghrist (2021) Opinion dynamics on discourse sheaves. SIAM Journal on Applied Mathematics 81 (5), pp. 2033–2060. https://doi.org/10.1137/20M1341088
[54] E. J. Hu, M. Jain, E. Elmoznino, Y. Kaddar, G. Lajoie, Y. Bengio, and N. Malkin (2024) Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations.
[55] H. Huang, Y. LeCun, and R. Balestriero (2026) Semantic tube prediction: beating LLM data efficiency with JEPA. arXiv:2602.22617.
[56] M. Hutchinson, A. Terenin, V. Borovitskiy, S. Takao, Y. Teh, and M. Deisenroth (2021) Vector-valued Gaussian processes on Riemannian manifolds via gauge independent projected kernels. Advances in Neural Information Processing Systems 34, pp. 17160–17169.
[57] F. Ji, Y. Zhao, S. H. Lee, K. Zhao, W. P. Tay, and J. Yang (2025) Graph distributional signals for regularization in graph neural networks. IEEE Transactions on Signal and Information Processing over Networks 11, pp. 670–682.
[58] A. Jiao, Q. Yan, J. Harlim, and L. Lu (2024) Solving forward and inverse PDE problems on unknown manifolds via physics-informed neural operators. arXiv preprint arXiv:2407.05477.
[59] T. N. Kipf and M. Welling (2017) Semi-Supervised Classification with Graph Convolutional Networks. In Proc. of the 5th International Conference on Learning Representations (ICLR).
[60] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Vol. 25.
[61] N. H. Kuiper (1965) The homotopy type of the unitary group of hilbert space. Topology 3 (1), pp. 19–30. External Links: Document Cited by: §H.4.
[62] S. Lang (1995) Differential and riemannian manifolds. 3 edition, Graduate Texts in Mathematics, Vol. 160, Springer, New York, NY. External Links: ISBN 978-0-387-94338-1, Document, ISSN 0072-5285 Cited by: §G.1.4.
[63] Y. LeCun et al. (1998) Gradient-based learning applied to document recognition. Proc. of the IEEE 86 (11), pp. 2278–2324. Cited by: §1.
[64] M. Ledoux and M. Talagrand (1991) Probability in banach spaces: isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Vol. 23, Springer-Verlag, Berlin. Cited by: §H.1.
[65] J. Leray (1946) L’anneau d’homologie d’une représentation. Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences 222, pp. 1366–1368 (French). Cited by: Appendix A, §4.
[66] G. Leus, A. G. Marques, J. M. Moura, A. Ortega, and D. I. Shuman (2023) Graph signal processing: history, development, impact, and outlook. IEEE Signal Processing Magazine 40 (4), pp. 49–60. Cited by: §2.
[67] R. Levie, W. Huang, et al. (2021) Transferability of spectral graph convolutional neural networks. Journal of Machine Learning Research 22 (272), pp. 1–59. Cited by: Appendix A, §1, §5, §5.
[68] D. Li, S. Arya, and R. Ghrist (2025) Learning from frustration: torsor cnns on graphs. In Proceedings of the Workshop on Symmetry and Geometry in Neural Representations at NeurIPS 2025, Note: Workshop paper External Links: Link Cited by: §C.1.2, Appendix D.
[69] Y. Li, R. Yu, C. Shahabi, and Y. Liu (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In International Conference on Learning Representations (ICLR), External Links: Link Cited by: §F.2.1, §F.2.1, §F.2.3, Table 2, Table 2, §6.
[70] E. Lila, J. A. Aston, and L. M. Sangalli (2016) Smooth principal component analysis over two-dimensional manifolds with an application to neuroimaging. Cited by: Appendix A.
[71] H. Liu, Z. Dong, R. Jiang, J. Deng, J. Deng, Q. Chen, and X. Song (2023) Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), External Links: Document Cited by: §F.2.3, Table 2, Table 2.
[72] L. Malagò, L. Montrucchio, and G. Pistone (2018) Wasserstein riemannian geometry of gaussian densities. Information Geometry 1 (2), pp. 137–179. External Links: ISSN 2511-249X, Document, Link Cited by: §6, Example 2.
[73] I. Marisca, J. Bamberger, C. Alippi, and M. M. Bronstein (2026) Over-squashing in spatiotemporal graph neural networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §C.1.3.
[74] P. Mostowsky, V. Dutordoir, I. Azangulov, N. Jaquier, M. J. Hutchinson, A. Ravuri, L. Rozo, A. Terenin, and V. Borovitskiy (2024) The GeometricKernels package: heat and matérn kernels for geometric learning on manifolds, meshes, and graphs. arXiv:2407.08086. External Links: Link Cited by: Appendix A, §1.
[75] C. Müller and C. Wockel (2009) Equivalences of smooth and continuous principal bundles with infinite-dimensional structure group. Advances in Geometry 9 (4), pp. 605–626. External Links: ISSN 1615-715X, Link, Document Cited by: item 3, §H.4.
[76] L. I. Nicolaescu (2007) Lectures on the geometry of manifolds. 2 edition, World Scientific, Singapore. External Links: ISBN 9789812708533 Cited by: §G.2.
[77] M. Papillon, G. Bernardez, C. Battiloro, and N. Miolane (2025) TopoTune: a framework for generalized combinatorial complex neural networks. External Links: Link Cited by: §1.
[78] Y. Peng, J. Dong, Y. Zeng, H. Li, C. Ju, H. Feng, D. Taha, A. Wienhard, and K. Xia (2026) Sheaf neural networks on spd manifolds: second-order geometric representation learning. arXiv preprint arXiv:2604.20308. Cited by: Appendix A, §1.
[79] P. Petersen (2006) Riemannian geometry. 2 edition, Graduate Texts in Mathematics, Vol. 171, Springer, New York. External Links: ISBN 978-0-387-29403-2, Document Cited by: §G.2.
[80] I. F. Pinelis (1991) Inequalities for distributions of sums of independent random vectors and their application to estimating a density. Theory of Probability & Its Applications 35 (3), pp. 605–607. External Links: Document Cited by: §H.1, §H.2.
[81] J. Porras-Valenzuela, Z. Wang, and A. Ribeiro (2026) Size transferability of graph transformers with convolutional positional encodings. External Links: 2602.15239, Link Cited by: Appendix B.
[82] M. Reed and B. Simon (1972) Functional analysis. Methods of Modern Mathematical Physics, Vol. 1, Academic Press, New York. External Links: ISBN 978-0-12-585001-8, Link Cited by: §G.4, §H.6.
[83] H. Riess and R. Ghrist (2022) Diffusion of information on networked lattices by gossip. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 5946–5952. Cited by: Appendix A.
[84] H. M. Riess and J. Hansen (2020) Multidimensional persistence module classification via lattice-theoretic convolutions. In NeurIPS Workshop: TDA & Beyond, Cited by: §1.
[85] R. M. Rustamov (2007) Laplace-beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Symposium on Geometry Processing (SGP), A. Belyaev and M. Garland (Eds.), pp. 225–233. External Links: Document, ISBN 978-3-905673-46-3, ISSN 1727-8384 Cited by: §5.
[86] F. Scarselli et al. (2008) The graph neural network model. IEEE Trans. on neural networks 20 (1), pp. 61–80. Cited by: §1.
[87] S. C. Schonsheck et al. (2018) Parallel transport convolution: a new tool for convolutional neural networks on manifolds. arXiv preprint arXiv:1805.07857. Cited by: §1.
[88] J. Serre (1955) Faisceaux algébriques cohérents. Annals of Mathematics, pp. 197–278. Cited by: Appendix A.
[89] L. Shao, Z. Lin, and F. Yao (2022) Intrinsic riemannian functional data analysis for sparse longitudinal observations. The Annals of Statistics 50 (3), pp. 1696–1721. Cited by: Appendix A.
[90] A. D. Shepard (1985) A cellular description of the derived category of a stratified space. Brown University. Cited by: Appendix A, §1.
[91] D. I. Shuman et al. (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 30 (3), pp. 83–98. Cited by: §1.
[92] A. Singer and H.-T. Wu (2012) Vector diffusion maps and the connection laplacian. Communications on Pure and Applied Mathematics 65 (8), pp. 1067–1144. External Links: Document Cited by: Appendix A, Appendix B, §C.1.1, §1.
[93] A. Singer and H. Wu (2017) Spectral convergence of the connection laplacian from random samples. Information and Inference: A Journal of the IMA 6 (1), pp. 58–123. External Links: Document, Link Cited by: Appendix A, Appendix B, Appendix B, §1, §5.
[94] F. Spoto, A. Caponera, and P. Brutti (2025) Change point detection for functional autoregressive processes on the sphere. arXiv preprint arXiv:2512.03255. Cited by: Appendix A.
[95] J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu (2024) Roformer: enhanced transformer with rotary position embedding. Neurocomputing 568, pp. 127063. Cited by: Appendix B.
[96] S. Thakoor, C. Tallec, M. G. Azar, M. Azabou, E. L. Dyer, R. Munos, P. Veličković, and M. Valko (2022) Large-scale representation learning on graphs via bootstrapping. In International Conference on Learning Representations, External Links: Link Cited by: Appendix B.
[97] B. Vallet and B. Lévy (2008) Spectral geometry processing with manifold harmonics. Computer Graphics Forum 27 (2), pp. 251–260. External Links: Document, ISSN 1467-8659 Cited by: §5.
[98] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Conference on Neural Information Processing Systems (NeurIPS), pp. 6000–6010. External Links: Link Cited by: Appendix B.
[99] C. Villani (2003) Topics in optimal transportation. Graduate Studies in Mathematics, Vol. 58, American Mathematical Society, Providence, RI. External Links: ISBN 978-0821833124 Cited by: Example 5.
[100] U. Von Luxburg, M. Belkin, and O. Bousquet (2008) Consistency of spectral clustering. The Annals of Statistics, pp. 555–586. Cited by: Appendix B.
[101] Z. Wang, L. Ruiz, and A. Ribeiro (2021) Stability of manifold neural networks to deformations. arXiv preprint arXiv:2106.03725. Cited by: §C.1.1, §5, §5.
[102] Z. Wang, L. Ruiz, and A. Ribeiro (2021) Stability of neural networks on riemannian manifolds. In 2021 29th European Signal Processing Conference (EUSIPCO), pp. 1845–1849. Cited by: §1, §1.
[103] Z. Wang, L. Ruiz, and A. Ribeiro (2022) Convolutional neural networks on manifolds: from graphs and back. arXiv:2210.00376. Cited by: Appendix A, §1, §1.
[104] Z. Wang, L. Ruiz, and A. Ribeiro (2024) Geometric graph filters and neural networks: limit properties and discriminability trade-offs. IEEE Transactions on Signal Processing 72, pp. 2244–2259. External Links: Document Cited by: §5, §5.
[105] M. Weiler, P. Forré, E. Verlinde, and M. Welling (2026) Equivariant and coordinate independent convolutional networks: a gauge field theory of neural networks. Progress in Data Science, Vol. 1, World Scientific Publishing Company. Note: Monograph on equivariant and gauge-theoretic neural network architectures and their coordinate-independent generalizations External Links: ISBN 9789819806621 Cited by: §C.1.2.
[106] H. Wu, K. Weng, S. Zhou, X. Huang, and W. Xiong (2024) Neural manifold operators for learning the evolution of physical dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3356–3366. Cited by: Appendix A.
[107] Y. Xie, J. Tian, and X. X. Zhu (2020) Linking points with labels in 3d: a review of point cloud semantic segmentation. IEEE Geoscience and Remote Sensing Magazine 8 (4), pp. 38–59. Cited by: §1.
[108] Z. Yang, S. Huang, H. Feng, and D. Zhou (2024) Spherical analysis of learning nonlinear functionals. arXiv preprint arXiv:2410.01047. Cited by: §5.
[109] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen (2020) Graph contrastive learning with augmentations. Advances in neural information processing systems 33, pp. 5812–5823. Cited by: Appendix B.
[110] J. Yu, J. Choi, D. Lee, H. Hong, and J. Kim (2024) Self-supervised transformation learning for equivariant representations. Advances in Neural Information Processing Systems 37, pp. 83068–83090. Cited by: Appendix B.
[111] O. Zaghen, F. Eijkelboom, A. Pouplin, and E. J. Bekkers (2025) Towards variational flow matching on general geometries. In ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, External Links: Link Cited by: Appendix A.
[112] Y. Zhao, F. Ji, X. Jian, and W. P. Tay (2026) Graph distribution-valued signals: a wasserstein space perspective. External Links: 2509.25802, Link Cited by: Example 5.

Appendix Contents

A Extended Related Works
B Broader Impact, Future Directions and Limitations
C Existing Convolutional Architectures as Actualizations of HilbNets
C.1 Universality of HilbNets
C.1.1 CNNs, GNNs, and Sheaf NNs
C.1.2 Equivariant CNNs and GNNs
C.1.3 Spatio-Temporal GNNs
D Practical Implementations of Parallel Transport
E Parallel Transport Parametrizations
E.1 From Bundle Transports to Network-Sheaf Restrictions
E.2 Transport Hypothesis Classes
E.3 Parameter Counts
F Additional Experimental Details
F.1 Synthetic Experiments: The Statistical Bundle over Centered Gaussians
F.1.1 The Bundle
F.1.2 Levi-Civita Connection and Closed-Form Parallel Transport
F.1.3 Cholesky Rescaling
F.1.4 Sample Construction
F.1.5 Spectral Stability under Sampling Density Increase
F.1.6 Hyperparameters
F.2 Traffic Forecasting: Experimental Details
F.2.1 Datasets
F.2.2 Task Formulation
F.2.3 Model Variants and Baselines
F.2.4 Detailed Results
F.2.5 Hyperparameters
G Mathematical Background
G.1 Hilbert Bundles
G.2 Connection Laplacian
G.3 Heat Flow on a Hilbert Bundle
G.4 Borel Functional Calculus
G.5 Cellular Sheaves and Sheaf Laplacians
G.6 Empirical Laplacians
H Proofs of Results
H.1 Auxiliary Lemmas for Theorem 1
H.2 Key Lemmas for Theorem 1
H.3 Proof of Theorem 1
H.4 Key Lemmas for Theorem 2
H.5 Proof of Theorem 2
H.6 Key Lemmas for Corollary 1 (Convergence in Architecture)
H.7 Proof of Corollary 1 (Convergence in Architecture)
H.8 Proof of Corollary 2 (Transferability)

Appendix A Extended Related Works

The connection between (possibly) continuous domains (manifolds and bundles) and discrete structures (graphs and cellular sheaves) first emerged in pioneering investigations of the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional (Riemannian) manifolds [22]. The manifold hypothesis underpins several modern spectral graph methods, i.e., nonlinear dimensionality-reduction, clustering, and (deep) learning techniques that exploit latent geometric structures. The seminal work of Belkin & Niyogi [14] proved that, given a finite point cloud (the data) sampled from the underlying manifold, one can build a weighted undirected graph whose Laplacian converges in probability to the Laplace-Beltrami operator of the underlying manifold as the number of samples goes to infinity (with the kernel bandwidth shrinking accordingly).
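As an illustration of the scalar construction that our result generalizes, the following sketch evaluates a heat-kernel point-cloud Laplacian in the style of Belkin & Niyogi at a single point of the unit circle. It assumes uniform samples, intrinsic dimension $k=1$, and the normalization $\frac{1}{t(4\pi t)^{k/2}}$; under these assumptions the estimator approaches $\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta f$, with the positive-semidefinite convention $\Delta=-\frac{d^{2}}{d\theta^{2}}$ on the circle. The function name `point_cloud_laplacian`, the bandwidth, and the sample size are illustrative choices, not part of [14].

```python
import numpy as np

def point_cloud_laplacian(p, f_p, X, f_X, t, k):
    """Heat-kernel point-cloud Laplacian L_n^t f(p) (Belkin & Niyogi style sketch).

    p: (D,) evaluation point; X: (n, D) samples; f_p, f_X: function values;
    t: kernel bandwidth; k: intrinsic dimension of the manifold.
    """
    sq_dist = np.sum((X - p) ** 2, axis=1)       # squared ambient (chordal) distances
    weights = np.exp(-sq_dist / (4.0 * t))        # Gaussian heat-kernel weights
    scale = 1.0 / (t * (4.0 * np.pi * t) ** (k / 2.0))
    return scale * np.mean(weights * (f_p - f_X))

rng = np.random.default_rng(0)
n, t = 50_000, 0.01
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)     # uniform samples on the unit circle
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
f_X = X[:, 0]                                     # f = cos(theta)
p = np.array([1.0, 0.0])                          # evaluate at theta = 0, where f(p) = 1

estimate = point_cloud_laplacian(p, 1.0, X, f_X, t, k=1)
# Limit: (1 / vol(M)) * (-f''(0)) = cos(0) / (2 pi) for the unit circle
target = 1.0 / (2.0 * np.pi)
print(estimate, target)
```

Shrinking $t$ while increasing $n$ trades the $O(t)$ bias of the kernel approximation against sampling noise, which is the regime in which the convergence in probability holds.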

The work in [14] and related results, such as [92, 93], have been used (directly or indirectly) to design principled learning systems over manifolds. A substantial fraction of this literature focused on scalar manifold signals, i.e., the case in which one or more scalar values are attached to each point of a manifold. Notable examples are manifold convolutional neural networks [103, 67], kernel methods and Gaussian processes on manifolds [20, 74], as well as a growing literature on generative manifold models [111, 33]. In a complementary direction, operator-learning methods on manifolds extend neural operators beyond Euclidean domains; these methods handle an infinite-dimensional object globally, but they still assign a finite vector to each point of a manifold. Most works in this class are instances of neural manifold operators [26, 19, 106, 58], which aim at resolution-independent learning of PDE solution operators. Some works explored vector-valued manifold signals, i.e., multivariate real-valued functions supported on manifolds; in this case, one or more finite vectors are attached to each point of a manifold. Examples are tangent bundle convolutional neural networks [11] and vector-field Gaussian processes on manifolds built via gauge-independent projected kernels [56]. Moreover, especially in the statistics community, functional observations with manifold structure, i.e., manifold-valued functions supported on the real line, have long been studied [70, 31, 89], and recent works have started analyzing autoregressive processes on the sphere [24, 94]. Finally, learning systems acting on discrete bundles, i.e., bundles whose base space is a finite set or discrete manifold, have recently been investigated [5]. Despite their diversity, all the models cited in this paragraph use finite-dimensional fibers and implicitly assume the Levi-Civita connection. As such, they accommodate neither arbitrary connections nor the potentially infinite-dimensional signals considered here. One of the main reasons behind this gap is the lack of a rigorous generalization of the convergence result of [14] to these settings, which is our main contribution.

Pioneering works on sheaf theory can be found in [65, 88, 48]. Cellular sheaves are combinatorial instances of sheaves that were introduced in [90] and later rediscovered in [29]. In [90, 29], these sheaves were first defined over regular cell complexes, hence the term “cellular” sheaves. However, as in this work, cellular sheaves are often defined over tamer objects, here graphs. In [51, 35], the authors studied the problem of learning vector cellular sheaves, i.e., cellular sheaves over undirected graphs with finite-dimensional node signals. The works in [53, 52, 42, 83] introduced novel classes of diffusion dynamics on vector cellular sheaves. In [18, 50, 7, 39, 37, 78, 9, 11], neural networks operating on vector cellular sheaves over (undirected, directed, hyper) graphs with finite-dimensional signals are presented, generalizing graph neural networks. We note again, however, that all these works implicitly or explicitly restrict themselves to either the Levi-Civita or flat connections. Additionally, the work in [18] exploited vector cellular sheaf theory to show that the underlying geometry of the graph gives rise to the oversmoothing behavior of GCNs. (Vector and general) cellular sheaves have also recently appeared in causal theory [30], control [49], and telecommunications [47]. Finally, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on vector cellular sheaves built from manifold samples.

Appendix B Broader Impact, Future Directions and Limitations

The potential impact of this work extends well beyond the effectiveness of the HilbNet architecture. Our convergence result unifies and extends the graph- and vector-diffusion convergence theories of [14, 93], thereby enabling novel geometric learning systems for genuinely infinite-dimensional, manifold-supported signals on bundles equipped with arbitrary connections. HilbNets are just a first (principled, transferable) instance of such systems, but our result opens several new avenues.

Clustering and Dimensionality Reduction. Classical Laplacian-based methods for clustering [100] and nonlinear dimensionality reduction [13, 28] all rely, either explicitly or implicitly, on the convergence of the graph Laplacian to the Laplace–Beltrami operator. Our Theorem 1 provides the analogous foundation in the Hilbert bundle setting, immediately suggesting sheaf-spectral generalizations of these techniques. For instance, one may define Hilbert sheaf eigenmaps by computing the low-frequency eigensections of $\Delta_{\mathcal{F}_{n}^{t}}$ (those associated with its smallest eigenvalues, as in Laplacian eigenmaps) and using them as coordinates, yielding embeddings that are aware not only of the base manifold geometry but also of the fiber-wise coupling encoded by the connection. This is particularly promising for data such as spatiotemporal fields or distributional signals, where standard spectral methods discard the internal structure of each observation. Similarly, sheaf-spectral clustering would partition data by jointly considering geometric proximity on $\mathcal{M}$ and coherence of the infinite-dimensional signals across fibers, a strictly richer criterion than what scalar graph Laplacians can capture. The finite-rank convergence guarantee of Theorem 2 ensures that such methods can be implemented with truncated signals while remaining provably consistent with the underlying continuous geometry.
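A minimal sketch of sheaf eigenmaps in a finite-dimensional toy setting follows. It assumes $d$-dimensional fibers, orthogonal transports, and the block "connection Laplacian" whose quadratic form is $\sum_{e} w_{ij}\lVert x_i - O_{ij}x_j\rVert^{2}$, rather than the paper's $\Delta_{\mathcal{F}_{n}^{t}}$. The names `connection_laplacian` and `sheaf_eigenmap` are hypothetical, and stacking each node's fiber blocks of the lowest eigensections is only one simple choice of embedding.

```python
import numpy as np

def rotation(angle):
    """2D rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def connection_laplacian(n, d, edges):
    """Block Laplacian with quadratic form sum_e w * ||x_i - O x_j||^2.

    edges: iterable of (i, j, w, O), where the d x d matrix O maps fiber j into fiber i.
    """
    L = np.zeros((n * d, n * d))
    for i, j, w, O in edges:
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += w * np.eye(d)
        L[bj, bj] += w * np.eye(d)
        L[bi, bj] -= w * O
        L[bj, bi] -= w * O.T
    return L

def sheaf_eigenmap(L, n, d, m):
    """Embed node i by collecting its fiber block from each of the m lowest eigensections."""
    evals, evecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    V = evecs[:, :m]                          # (n*d, m) low-frequency eigensections
    return evals[:m], V.reshape(n, d * m)     # row i gathers rows i*d..i*d+d-1 of V

rng = np.random.default_rng(0)
n, d, m = 12, 2, 4
frames = [rotation(a) for a in rng.uniform(0, 2 * np.pi, size=n)]
# Flat transports O_ij = R_i R_j^T on a cycle: any global frame yields a global section
edges = [(i, (i + 1) % n, 1.0, frames[i] @ frames[(i + 1) % n].T) for i in range(n)]
L = connection_laplacian(n, d, edges)
evals, embedding = sheaf_eigenmap(L, n, d, m)
```

For this flat toy sheaf the spectrum is that of the cycle graph repeated $d$ times, so the first $d$ eigenvalues vanish (global sections) and the next ones equal $2-2\cos(2\pi/n)$; with a curved connection the zero eigenvalues would generally disappear.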

Structured Self-Supervised Learning. Self-supervised learning (SSL) has largely been built around objectives that encourage invariance or equivariance with respect to augmentations of the base domain [109, 96, 32, 110]. Our framework suggests a more structured family of SSL methods. Because the Hilbert sheaf Laplacian encodes both spatial geometry and fiber-wise transport, one can design contrastive or non-contrastive objectives that encourage learned representations to be sections of an appropriate bundle, i.e., to satisfy local consistency constraints dictated by the restriction maps. Such objectives would yield representations that are not merely invariant to domain augmentations but are geometrically coherent across fibers, a property that is especially desirable when downstream tasks depend on the relational structure between signals at different manifold points, as in multi-sensor forecasting or multi-agent coordination.
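One hypothetical way to turn these local consistency constraints into a training signal is a penalty that sums $w_{ij}\lVert h_i - O_{ij}h_j\rVert^{2}$ over edges. It equals the Dirichlet energy $h^{\top}Lh$ of a connection-type sheaf Laplacian and vanishes exactly on global sections. The NumPy sketch below checks both facts on a toy graph with orthogonal transports. In practice such a term would presumably be written in an autodiff framework and combined with a contrastive or non-contrastive objective, which is not shown; all names are hypothetical.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

def sheaf_consistency_loss(H, edges):
    """Sum over edges of w * ||H[i] - O @ H[j]||^2; zero iff H is a global section."""
    return sum(w * np.sum((H[i] - O @ H[j]) ** 2) for i, j, w, O in edges)

def connection_laplacian(n, d, edges):
    """Block Laplacian whose quadratic form is the consistency loss above."""
    L = np.zeros((n * d, n * d))
    for i, j, w, O in edges:
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += w * np.eye(d)
        L[bj, bj] += w * np.eye(d)
        L[bi, bj] -= w * O
        L[bj, bi] -= w * O.T
    return L

rng = np.random.default_rng(1)
n, d = 6, 3
pairs = [(i, (i + 1) % n) for i in range(n)] + [(0, 3)]
edges = [(i, j, rng.uniform(0.5, 2.0), random_orthogonal(d, rng)) for i, j in pairs]
H = rng.normal(size=(n, d))                      # stand-in for learned node representations
loss = sheaf_consistency_loss(H, edges)
dirichlet = H.ravel() @ connection_laplacian(n, d, edges) @ H.ravel()

# Representations forming a global section of a flat sheaf incur zero loss
Q = [random_orthogonal(d, rng) for _ in range(n)]
flat_edges = [(i, j, 1.0, Q[i] @ Q[j].T) for i, j in pairs]
c = rng.normal(size=d)
H_section = np.stack([Q[i] @ c for i in range(n)])
loss_flat = sheaf_consistency_loss(H_section, flat_edges)
```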

Generalizability Theory and Mechanistic Interpretability for Transformers. Another promising direction concerns the connection between our framework and transformer architectures [98]. In a standard transformer, each token is equipped with a positional encoding, either fixed (e.g., sinusoidal) or learned, that situates it in a continuous geometric space [95]. These positional encodings can be viewed as sampled points on a base manifold $\mathcal{M}$ , with the encoding scheme implicitly defining the metric structure of the domain. The residual-stream representation at each position (or, in the infinite-width or infinite-context limit, the full distribution over possible activations) then lives in a Hilbert space fibered over this base point, so that the collection of representations across positions constitutes a section of a Hilbert bundle over $\mathcal{M}$ . The attention mechanism then defines a data-dependent transport between these fibers. In this view, a self-attention layer is an instance of single-step diffusion under a learned sheaf Laplacian whose base graph is determined by the sampled positional encodings. This view may be regarded as a more precise incarnation of the recently introduced geodesic hypothesis [55], which models the autoregressive output of transformers by a stochastic diffusion PDE in Euclidean space rather than through a manifold-theoretic treatment such as the one proposed here. Making this correspondence precise would allow one to import the convergence and transferability machinery of Theorems 1–2 into the transformer setting. We note that a recent line of work attempts to adapt Laplacian-based GNN generalization and stability results to transformers [81], but, due to its fundamental reliance on the convergence result of [14], it must work in a somewhat simplified setting.
In conjunction with our convergence result, the generality of the Hilbert sheaf Laplacian is potentially well-suited for establishing extensions of these generalization theorems for a broader class of transformers. On the interpretability side, decomposing attention into a positional-affinity component and a fiber-transport component offers a principled lens through which to study what information each head moves and how it is transformed in transit. One could, for example, measure the holonomy of the learned connection around closed loops of attention to detect whether a head implements a nontrivial geometric transformation. While formalizing these connections requires treatment of the data-dependence of the connection and the interplay between positional and content-based attention, the mathematical infrastructure developed in this work provides a strong starting point.
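Both ideas can be illustrated with a few lines of NumPy under strong simplifying assumptions. The first part uses a single attention head with no output projection, reads the row-stochastic attention weights $a_{ij}$ as edge weights, and treats the value map $P=W_v^{\top}$ as a transport shared by all edges, so this "transport" is neither position-dependent nor orthogonal. Under these assumptions the attention output coincides with one explicit step $x\leftarrow x-Lx$ of a random-walk, sheaf-type Laplacian $(Lx)_i=\sum_j a_{ij}(x_i-Px_j)$, because each row of weights sums to one. The second part computes the holonomy around a closed loop as the composition of the transports met along it. All names are hypothetical.

```python
import numpy as np

def softmax_rows(S):
    """Row-wise softmax, stabilized by subtracting each row's maximum."""
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 4
X = rng.normal(size=(n, d))                          # one representation per position
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

A = softmax_rows((X @ Wq) @ (X @ Wk).T / np.sqrt(d))  # row-stochastic edge weights
attn_out = A @ (X @ Wv)                               # single head, no output projection

# One step x <- x - L x with (L x)_i = sum_j a_ij (x_i - P x_j) and shared P = Wv^T
P = Wv.T
LX = np.stack([sum(A[i, j] * (X[i] - P @ X[j]) for j in range(n)) for i in range(n)])
diffusion_out = X - LX

def rotation(angle):
    """2D rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def holonomy(loop_transports):
    """Compose the transports met along a closed loop (first transport applied first)."""
    H = np.eye(loop_transports[0].shape[0])
    for T in loop_transports:
        H = T @ H
    return H

# Curved loop: rotations by 0.3, 0.5, -0.2 compose to a rotation by 0.6
curved = holonomy([rotation(a) for a in [0.3, 0.5, -0.2]])
# Flat loop: transports R_{k+1} R_k^T telescope to the identity
frames = [rotation(a) for a in rng.uniform(0, 2 * np.pi, size=3)]
flat = holonomy([frames[(k + 1) % 3] @ frames[k].T for k in range(3)])
```

A head whose learned transports compose to a non-identity map around some loop, as in `curved`, would be implementing a nontrivial geometric transformation in the sense discussed above.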

Limitations. Our theoretical guarantees rest on assumptions that, as is usually the case, may not hold exactly in practice. The convergence results (Theorems 1–2) require the base manifold $\mathcal{M}$ to be closed (compact without boundary), sections to be $C^{3}$ or $C^{4}$ smooth, and samples to be drawn i.i.d. from the uniform distribution. Real-world sensor networks, such as those in our traffic experiments, are neither uniformly sampled nor necessarily supported on compact manifolds, and measured signals are typically noisy rather than smooth. Such gaps between theory and practice are standard in the Laplacian convergence literature: the foundational results of [14], as well as subsequent works on vector diffusion maps [92, 93] and manifold-based learning [28, 13], all assume compact manifolds and uniform or smooth sampling densities, yet are routinely and successfully applied under weaker conditions. Our numerical results confirm that HilbNets likewise remain effective under these standard approximations, consistently outperforming baselines that lack the principled bundle-geometric structure. On the computational side, the network sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ has $(nd)^{2}$ entries when stored densely, i.e., its cost scales quadratically in the product of the spatial and fiber dimensions, which may become prohibitive for very large graphs or high-dimensional signal discretizations without further sparsification or approximation strategies. Finally, broader and tailored empirical validation on other infinite-dimensional signal types, such as distributional or functional data on manifolds, remains an important direction for future work.
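To make the scaling concrete, the following back-of-the-envelope sketch compares dense storage of an $nd\times nd$ sheaf Laplacian with block-sparse storage that keeps only one $d\times d$ block per node and two per undirected edge. The sizes are hypothetical and float64 entries are assumed.

```python
def sheaf_laplacian_storage(n, d, num_edges, bytes_per_entry=8):
    """Bytes needed to store an nd x nd sheaf Laplacian densely vs. as d x d blocks."""
    dense = (n * d) ** 2 * bytes_per_entry
    # Block-sparse: one diagonal block per node plus two off-diagonal blocks per undirected edge
    block_sparse = (n + 2 * num_edges) * d * d * bytes_per_entry
    return dense, block_sparse

# Hypothetical sizes: 10,000 nodes, a 32-dimensional fiber discretization,
# and an assumed 100,000 undirected edges, with float64 entries
dense, block_sparse = sheaf_laplacian_storage(n=10_000, d=32, num_edges=100_000)
print(f"dense: {dense / 1e9:.1f} GB, block-sparse: {block_sparse / 1e9:.2f} GB")
```

With these numbers it prints `dense: 819.2 GB, block-sparse: 1.72 GB`, which suggests that block-sparse or matrix-free application of the Laplacian is a natural first mitigation.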

Appendix C Existing Convolutional Architectures as Actualizations of HilbNets

C.1 Universality of HilbNets

Because HilbNets are constructed from first principles, they serve as a sort of universal architecture. That is, several popular variations of convolutional architectures in geometric deep learning, across domains and modalities, can be derived as particular instantiations of HilbNets, even when their construction does not explicitly invoke cellular sheaves. We consider a few concrete examples of this philosophy.

C.1.1 CNNs, GNNs, and Sheaf NNs

Convolutional neural networks (CNNs) and graph neural networks (GNNs) are both often formalized as acting on signals $f:\mathcal{M}\to\mathbb{R}$ . In particular, by the consistency of the discrete Fourier transform, CNNs operating on a fixed grid can be viewed as operating on a principled discretization of $\mathcal{M}=[0,1]^{2}$ . More generally, we may view the uniform grid on which CNNs act as a particular instance of a graph, and thus view CNNs as a special case of GNNs [59], with the graph Laplacian as the relevant convolutional operator. GNNs can likewise be viewed as principled discretizations of manifold neural networks [101], precisely by the convergence result of [14]. Sheaf neural networks [50, 18] may then be understood as an enrichment that allows for matrix-valued rather than scalar edge weights during convolution. Further, [9] made precise that sheaf neural networks may, in particular, be viewed as acting upon tangent bundle signals $s:\mathcal{M}\to T\mathcal{M}$ , via the convergence result of [92]. Since the convergence result of [92] applies only to the tangent bundle setting, this explains why existing works on sheaf neural networks either explicitly or implicitly restrict to discretizing the tangent bundle with either flat or Levi-Civita connections [7, 5]. As such, existing sheaf neural network architectures can typically be recovered as HilbNets under the paradigm that $\mathcal{E}=T\mathcal{M}$ .
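The CNN, GNN, and sheaf NN chain can be checked numerically on a small example. The sketch assumes a periodic 1D grid (the cycle graph $C_n$), where a polynomial graph filter must coincide with a circular convolution diagonalized by the DFT. It then checks that a sheaf Laplacian whose restriction-derived blocks are all identities equals $L\otimes I_d$, i.e., a GNN applied channel-wise. The filter taps and the helper `sheaf_laplacian` are hypothetical.

```python
import numpy as np

n, d = 16, 3
# Adjacency and Laplacian of the cycle graph C_n, i.e., a periodic 1D grid
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A

# A polynomial graph filter h(L) = sum_k h_k L^k with hypothetical taps
taps = [0.5, -0.3, 0.1]
H = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(taps))

rng = np.random.default_rng(0)
x = rng.normal(size=n)
y_graph = H @ x

# The same filter as a circular convolution in the DFT domain:
# L is circulant, with eigenvalues 2 - 2 cos(2 pi k / n) on the Fourier modes
lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)
response = sum(c * lam ** k for k, c in enumerate(taps))
y_fourier = np.fft.ifft(response * np.fft.fft(x)).real

def sheaf_laplacian(A, transport, d):
    """Block Laplacian with a d x d transport block on each edge of the adjacency A."""
    n = A.shape[0]
    LF = np.zeros((n * d, n * d))
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] == 0:
                continue
            w, O = A[i, j], transport(i, j)
            bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
            LF[bi, bi] += w * np.eye(d)
            LF[bj, bj] += w * np.eye(d)
            LF[bi, bj] -= w * O
            LF[bj, bi] -= w * O.T
    return LF

# Identity transports: the sheaf Laplacian reduces to kron(L, I_d)
LF = sheaf_laplacian(A, lambda i, j: np.eye(d), d)
```

Replacing the identity blocks with nontrivial orthogonal matrices is exactly the "matrix-valued edge weights" enrichment described above.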

C.1.2 Equivariant CNNs and GNNs

The equivariant case of CNNs and GNNs arises when we wish our architecture to respect some underlying symmetry group $G\curvearrowright\mathcal{M}$ . More generally, there may not exist a global representation of the symmetry, but only a local one. In physics, this is known as a gauge symmetry, and it is formalized by considering our signal $f$ as a section of a bundle with connection $(\mathcal{M},\mathcal{E},\nabla)$ , where the group action is encoded as a symmetry of the connection $\nabla$ . As such, gauge-equivariant CNNs and GNNs constitute perhaps the most general equivariant architectures in the literature (see Weiler et al. [105] for a thorough introduction in the CNN case), and are formulated precisely as convolutions at the level of sections of a frame bundle (although the necessary datum of a connection is often suppressed in the literature). From this perspective, it is natural that gauge-equivariant CNNs and GNNs can be derived as particular cases of cellular sheaf networks (see Li et al. [68] for a more in-depth exploration of this perspective). Since these architectures can be equivalently reformulated in the language of sheaves, our Theorem 1, as well as the consequent transferability result, applies to them. In particular, this may be understood as establishing the theoretical foundation for the intuitive idea that, as the underlying mesh or graph is increasingly refined, these architectures indeed approach continuous operators on sections of the underlying bundle while maintaining equivariance across scales.
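The gauge-equivariance statement can be sanity-checked in a finite-dimensional toy. Suppose every fiber undergoes an orthogonal change of frame $g_i$ and the transports transform as $O_{ij}\mapsto g_iO_{ij}g_j^{\top}$. Then the block Laplacian transforms as $GLG^{\top}$ with $G=\mathrm{diag}(g_1,\dots,g_n)$, so any spectral filter, here the heat filter $e^{-tL}$, commutes with the gauge transformation. The sketch assumes orthogonal transports on a small graph; the helper names are hypothetical.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

def connection_laplacian(n, d, edges):
    """Block Laplacian; each edge (i, j, w, O) has O mapping fiber j into fiber i."""
    L = np.zeros((n * d, n * d))
    for i, j, w, O in edges:
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += w * np.eye(d)
        L[bj, bj] += w * np.eye(d)
        L[bi, bj] -= w * O
        L[bj, bi] -= w * O.T
    return L

def heat_filter(L, t):
    """exp(-t L) for a symmetric positive semidefinite L via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T

rng = np.random.default_rng(2)
n, d, t = 5, 3, 0.4
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
edges = [(i, j, 1.0, random_orthogonal(d, rng)) for i, j in pairs]
x = rng.normal(size=(n, d))

# Gauge transformation: an independent change of local frame g_i in every fiber
g = [random_orthogonal(d, rng) for _ in range(n)]
edges_g = [(i, j, w, g[i] @ O @ g[j].T) for i, j, w, O in edges]
x_g = np.stack([g[i] @ x[i] for i in range(n)])

y = (heat_filter(connection_laplacian(n, d, edges), t) @ x.ravel()).reshape(n, d)
y_g = (heat_filter(connection_laplacian(n, d, edges_g), t) @ x_g.ravel()).reshape(n, d)
y_expected = np.stack([g[i] @ y[i] for i in range(n)])   # filter output, re-expressed in new frames
```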

C.1.3 Spatio-Temporal GNNs

A formal treatment of signal processing on graphs whose node signals are time series is still emerging and is an active area of research; these filtering techniques serve as the basis for the development of spatiotemporal graph neural networks (STGNNs). In this literature, it is common to consider convolutional operators built via the joint Laplacian $L_{J}=L_{T}\otimes\operatorname{Id}_{G}+\operatorname{Id}_{T}\otimes L_{G}$ , where $L_{G}$ is the graph-domain Laplacian and $L_{T}$ is the ‘time-domain’ Laplacian [45]. Due to this decomposition, the resulting ‘time’ and ‘space’ filters commute, allowing for the development of both time-and-space and time-then-space STGNNs [73]. Consider now the continuous setting. Convolutions via $\Delta_{\nabla}$ intertwine the spatial and temporal domains, and this is precisely encoded by our parallel transport maps. Suppose that our bundle is trivial, $\mathcal{E}=\mathcal{M}\times L^{2}(\mathbb{R}^{n})$ , with trivial connection $\nabla$ . In this case, our parallel transport maps are simply $P_{\gamma}=\operatorname{Id}$ , and the connection Laplacian collapses to the Laplace-Beltrami operator acting on sections viewed as $L^{2}(\mathbb{R}^{n})$-valued functions on $\mathcal{M}$ . On product manifolds $\mathcal{M}=\mathcal{M}_{1}\times\mathcal{M}_{2}$ , the Laplace-Beltrami operator decomposes as $\Delta_{\mathcal{M}}=\Delta_{\mathcal{M}_{1}}\otimes\operatorname{Id}_{\mathcal{M}_{2}}+\operatorname{Id}_{\mathcal{M}_{1}}\otimes\Delta_{\mathcal{M}_{2}}$ , and since the two summands commute, the heat flow factorizes as $e^{t\Delta_{\mathcal{M}}}=e^{t\Delta_{\mathcal{M}_{1}}}\otimes e^{t\Delta_{\mathcal{M}_{2}}}$ (see [46] for a formal derivation). Expressing $e^{t\Delta_{\mathcal{M}}}$ as an integral operator via the heat kernel, the fact that spatial and temporal filters commute in this case is then simply an application of Fubini’s theorem. As such, the type of filtering commonly considered in STGNNs is recovered precisely as the ‘base case’ of HilbNets, and, in particular, our robustness guarantees also apply to these STGNN architectures.
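The discrete counterpart of this factorization can be verified numerically. The sketch assumes positive-semidefinite Laplacians, so the heat filter is written $e^{-tL}$ (the text's $e^{t\Delta}$ corresponds to the opposite sign convention). It also uses a path graph over time steps and a random weighted spatial graph; the sizes and helper names are hypothetical. It checks that $e^{-tL_J}=e^{-tL_T}\otimes e^{-tL_G}$ and that the separate time and space filters commute.

```python
import numpy as np

def heat(L, t):
    """exp(-t L) for a symmetric positive semidefinite Laplacian L."""
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T

def laplacian(W):
    """Combinatorial Laplacian D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(3)
T, N, t = 5, 4, 0.7
W_T = np.diag(np.ones(T - 1), 1) + np.diag(np.ones(T - 1), -1)   # path graph over time steps
W_G = np.triu(rng.uniform(size=(N, N)), 1)
W_G = W_G + W_G.T                                                  # random weighted spatial graph
L_T, L_G = laplacian(W_T), laplacian(W_G)

L_J = np.kron(L_T, np.eye(N)) + np.kron(np.eye(T), L_G)           # joint Laplacian (Kronecker sum)
joint = heat(L_J, t)
factored = np.kron(heat(L_T, t), heat(L_G, t))

time_filter = np.kron(heat(L_T, t), np.eye(N))                    # filters only along time
space_filter = np.kron(np.eye(T), heat(L_G, t))                   # filters only along space
```

Because the two Kronecker summands commute, "time-then-space" and "space-then-time" filtering give the same result as filtering with the joint operator.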

Appendix D Practical Implementations of Parallel Transport

As we have established, a key strength of the HilbNets architecture is the ability to encode signal-level geometric priors through the principled incorporation of relevant parallel transport operators. This naturally raises the question of how these transport operators should be implemented in practice. We may consider three general classes of use cases.

Task-Inherent Priors The most theoretically well-grounded case is when knowledge of the geometry of the task itself may be utilized to build our parallel transport operators. For instance, suppose the nodes of our base graph represent cameras and the task is multi-view 3D recognition. Then the relevant transport operators should record the rotation $P_{x_{i}\to x_{j}}\in SO(3)$ that aligns views as in [68], resulting in an appropriately equivariant sheaf Laplacian operator. More generally, whenever the data modality lacks a ‘global’ reference frame or coordinate system, the appropriate alignments between local reference frames precisely give rise to a connection and the associated parallel transport. For instance, biomedical time series analysis often utilizes algorithms based upon the large deformation diffeomorphic metric mapping (LDDMM) [12], a core part of which may be understood as extracting the necessary parallel transport from the data using the first-order ODE definition. We may also place the tangent bundle networks of [11] in this category, as they use explicit vector-field data from which the necessary sheaf transition maps may be computed via local PCA. As such, HilbNets may be applied to any of these settings, where the relevant parallel transport is completely determined by the task itself and can thus typically be pre-computed explicitly.
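As a concrete example of a pre-computable, task-inherent transport, suppose corresponding 3D points are available in the local frames of two cameras. The sketch below recovers the aligning rotation in $SO(3)$ with the Kabsch (orthogonal Procrustes) algorithm; this is our illustrative choice and not necessarily the procedure used in [68], and the function name and inputs are hypothetical.

```python
import numpy as np

def kabsch_rotation(X_j, X_i):
    """Return R in SO(3) minimizing sum_q ||R x_j[q] - x_i[q]||^2.

    X_j, X_i: (F, 3) arrays of corresponding points in the local frames of
    cameras j and i (hypothetical inputs for illustration).
    """
    # Cross-covariance of the centered point clouds.
    A = (X_i - X_i.mean(axis=0)).T @ (X_j - X_j.mean(axis=0))
    U, _, Vt = np.linalg.svd(A)
    # Flip the last singular direction if needed so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R_true = Q if np.linalg.det(Q) > 0 else Q @ np.diag([-1.0, 1.0, 1.0])
X_j = rng.standard_normal((20, 3))
X_i = X_j @ R_true.T          # x_i = R_true x_j for every correspondence
R_hat = kabsch_rotation(X_j, X_i)
```

The determinant correction restricts the solution to proper rotations, which matches the $SO(3)$ transports described above rather than the full $O(3)$.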

Domain-Inherent Priors Alternatively, we may not have access to task-specific priors, but only to general knowledge of the structure of the signal domain. For instance, in many domains, our generic stalks may be equipped with the additional structure of a reproducing kernel Hilbert space (RKHS), i.e., $\mathcal{H}_{\kappa}$. Analogously to the previous case, we may then view parallel transports as operators that maximize alignment, but now with respect to our kernel. For instance, given a choice of similarity kernel $\kappa$ between time series or distributions and our initial section data $\{\mathbf{S}_{q}\}_{q=1}^{F_{0}}$, where $\mathbf{S}_{i,q}$ denotes the value of the $q$-th section at $x_{i}$, we may define our parallel transport operator matrices via

P_{x_{j}\to x_{i}}^{(d,e_{ij})}:=\arg\max_{\mathbf{T}\in\mathcal{C}}\sum^{F_{0}}_{q=1}\kappa(\mathbf{T}\mathbf{S}_{i,q},\mathbf{S}_{j,q})

(15)

for some suitable class of operators $\mathcal{C}\subseteq O(d)$, and force the diagonal blocks of the sheaf Laplacian to be the sum of the scalar edge weights given by the kernel. This is exactly the discretized sheaf Laplacian from (8). In practice, given a choice of similarity kernel on our fibers, we may either precompute these parallel transport operators using the above optimization objective or learn them end-to-end together with the model’s filters. In the latter case, (15) is applied as a regularizer on the task loss. The special case in which (15) is not employed at all, with $\mathcal{C}=O(d)$ and $K=1$ in (10), recovers the sheaf diffusion neural network from [18]. As such, the greater generality of the Hilbert sheaf Laplacian lends itself to more flexible and potentially more broadly applicable design choices than existing sheaf neural networks.
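For intuition on (15), consider the simplest choice, the linear kernel $\kappa(\mathbf{a},\mathbf{b})=\mathbf{a}^{\top}\mathbf{b}$ (our assumption; the text allows general kernels), together with $\mathcal{C}=O(d)$. The objective then equals $\operatorname{Tr}\bigl(\mathbf{T}\sum_{q}\mathbf{S}_{i,q}\mathbf{S}_{j,q}^{\top}\bigr)$, whose maximizer is the orthogonal Procrustes solution obtained from a single SVD. For nonlinear kernels such as an RBF, no closed form is expected, and (15) would instead be optimized iteratively. The helper name below is hypothetical.

```python
import numpy as np

def procrustes_transport(S_i, S_j):
    """Maximize sum_q <T S_i[:, q], S_j[:, q]> over T in O(d), i.e. (15) with a linear kernel.

    S_i, S_j: (d, F0) arrays; column q holds the q-th section sampled at x_i and x_j.
    """
    # The objective equals trace(T @ S_i @ S_j.T). Writing S_j @ S_i.T = U diag(s) Vt,
    # it is maximized by the orthogonal polar factor U @ Vt.
    U, _, Vt = np.linalg.svd(S_j @ S_i.T)
    return U @ Vt

rng = np.random.default_rng(0)
d, F0 = 5, 8
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # hypothetical ground-truth alignment
S_i = rng.standard_normal((d, F0))
S_j = Q @ S_i                                      # sections at x_j are rotated copies
T_hat = procrustes_transport(S_i, S_j)
```

With at least $d$ generic sections ($F_{0}\ge d$), the alignment is identified uniquely; with fewer sections, the maximizer is not unique and a learned or regularized choice becomes necessary.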

Appendix E Parallel Transport Parametrizations

This appendix details the finite-dimensional transport parametrizations used to instantiate the network sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}$ in the experiments. In particular, this instantiation of HilbNets may be viewed as a special case of the end-to-end learning paradigm introduced in Appendix D, with polynomial filters and a few demonstrative classes of $\mathcal{C}$ and $\kappa$. The discussion should be read as a continuation of the two-stage discretization in Section 5: after sampling the manifold, we obtain a Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$; after sampling or projecting the fibers, we obtain a finite-dimensional network sheaf $\mathcal{F}_{n,d}^{t}$ with $d$-dimensional stalks.

E.1 From bundle transports to network-sheaf restrictions

Recall that, before signal discretization, the Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$ induced by a sample $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}$ assigns the node stalk

\mathcal{F}_{n}^{t}(x_{i})=\mathcal{E}_{x_{i}}

(16)

and, for an edge $e_{ij}\in E$, the edge stalk

\mathcal{F}_{n}^{t}(e_{ij})=\mathcal{E}_{m_{\gamma_{ij}}},

(17)

where $m_{\gamma_{ij}}$ is the midpoint of the chosen geodesic between $x_{i}$ and $x_{j}$ . Its restriction maps are weighted parallel transports of the form

(\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}},\qquad k_{ij}^{t}=\exp\left(-\frac{d_{\mathcal{M}}(x_{i},x_{j})^{2}}{4t}\right).

(18)

After fiber discretization, the network sheaf $\mathcal{F}_{n,d}^{t}$ has finite-dimensional stalks, which we identify with $\mathbb{R}^{d}$ after choosing the first $d$ basis elements of the fiber Hilbert space. The corresponding restriction maps are matrices

(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}:\mathbb{R}^{d}\to\mathbb{R}^{d}.

(19)

For the pragmatic parametrizations used in the experiments, it is useful to express the same sheaf Laplacian in node-to-node transport coordinates. Fix an orientation convention for each edge $e_{ij}$ . After identifying the edge stalk with the coordinate system of one endpoint, we write the restrictions as

(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,I_{d},\qquad(\mathcal{F}_{n,d}^{t})_{x_{j}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to x_{i}}^{(d,e_{ij})},

(20)

where

P_{x_{j}\to x_{i}}^{(d,e_{ij})}\in O(d)

(21)

is the finite-dimensional transport carrying the discretized fiber over $x_{j}$ into the discretized fiber over $x_{i}$ along edge $e_{ij}$ . When the transport comes from the continuous Hilbert bundle, this matrix is the finite-dimensional representation of the corresponding parallel transport, after the chosen fiber projection and coordinate identification. When the continuous connection is unknown, $P_{x_{j}\to x_{i}}^{(d,e_{ij})}$ is instead chosen from a transport hypothesis class.

With the shorthand

P_{j\to i}^{e}:=P_{x_{j}\to x_{i}}^{(d,e_{ij})},

(22)

the action of the network sheaf Laplacian on a sampled signal

\mathbf{s}_{n,d}=(\mathbf{s}_{x_{1}},\dots,\mathbf{s}_{x_{n}})\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n}),\qquad\mathbf{s}_{x_{i}}\in\mathbb{R}^{d},

(23)

takes the concrete form

(\Delta_{\mathcal{F}_{n,d}^{t}}\mathbf{s}_{n,d})_{x_{i}}=\sum_{x_{j}\in\mathcal{N}(x_{i})}k_{ij}^{t}\left(\mathbf{s}_{x_{i}}-P_{j\to i}^{e_{ij}}\mathbf{s}_{x_{j}}\right).

(24)

Equivalently, $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ is the block matrix with blocks

(\Delta_{\mathcal{F}_{n,d}^{t}})_{ij}=\begin{cases}\displaystyle\sum_{r:\,e_{ir}\in E}k_{ir}^{t}I_{d},&i=j,\\[6pt] \displaystyle-k_{ij}^{t}P_{j\to i}^{e_{ij}},&i\neq j\text{ and }e_{ij}\in E,\\[4pt] 0,&\text{otherwise.}\end{cases}

(25)

This is the same sheaf Laplacian defined in Section 4, specialized to the coordinate convention in (20). The scalar weight $k_{ij}^{t}$ controls how strongly the two sampled base points interact, while the transport matrix $P_{j\to i}^{e_{ij}}$ controls how the two discretized fibers are aligned before their signals are compared.
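The block structure of (25) is easy to assemble directly. The sketch below, with a hypothetical helper `sheaf_laplacian`, stores one orthogonal matrix $P_{j\to i}^{e_{ij}}$ per undirected edge and uses its transpose for the reverse direction, which is what the symmetric Laplacian built from the restriction maps (20) produces, since $(P_{j\to i}^{e_{ij}})^{-1}=(P_{j\to i}^{e_{ij}})^{\top}$. It checks that the result is symmetric positive semidefinite and that frozen identity transports recover the weighted graph Laplacian tensored with $I_{d}$.

```python
import numpy as np

def sheaf_laplacian(n, d, edges, weights, transports):
    """Assemble the nd x nd block matrix of (25).

    edges: list of (i, j) pairs, one per undirected edge e_ij.
    weights: scalar kernel weights k_ij^t.
    transports: (d, d) orthogonal matrices P_{j->i}^{e_ij}; the reverse transport
                P_{i->j} is taken to be the transpose (inverse).
    """
    L = np.zeros((n * d, n * d))
    for (i, j), k, P in zip(edges, weights, transports):
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += k * np.eye(d)      # diagonal blocks: sum of incident weights
        L[bj, bj] += k * np.eye(d)
        L[bi, bj] -= k * P              # -k_ij P_{j->i}
        L[bj, bi] -= k * P.T            # -k_ij P_{i->j} = -k_ij P_{j->i}^T
    return L

rng = np.random.default_rng(0)
n, d = 5, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
weights = list(rng.uniform(0.1, 1.0, len(edges)))
transports = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in edges]
L = sheaf_laplacian(n, d, edges, weights, transports)

# Frozen identity transports: compare with the weighted graph Laplacian (x) I_d.
L_id = sheaf_laplacian(n, d, edges, weights, [np.eye(d)] * len(edges))
W = np.zeros((n, n))
for (i, j), k in zip(edges, weights):
    W[i, j] = W[j, i] = k
L_graph = np.diag(W.sum(axis=1)) - W
```

A dense matrix is used here only for readability; for large $n$, the same loop can fill a sparse block matrix instead.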

E.2 Transport hypothesis classes

As mentioned in Appendix D, the true connection is often unknown and parallel transport maps must be learned. In the experimental results of this work, we always learn the parallel transport maps end-to-end using the task loss regularized with (15), and we restrict each edgewise transport to a hypothesis class $\mathcal{C}$ such that

P_{j\to i}^{e_{ij}}\in\mathcal{C}\subseteq O(d).

(26)

We use three transport classes in the experiments: frozen identity, free orthogonal transports, and circulant or time-stationary transports.

Frozen identity.

The simplest class is

\mathcal{C}_{\mathrm{id}}=\{I_{d}\}.

(27)

This recovers the usual assumption that neighboring fibers are canonically identified and that no non-trivial alignment is needed. In this case,

P_{j\to i}^{e_{ij}}=I_{d}

(28)

for every edge, and (24) becomes

(\Delta_{\mathcal{F}_{n,d}^{t}}\mathbf{s}_{n,d})_{x_{i}}=\sum_{x_{j}\in\mathcal{N}(x_{i})}k_{ij}^{t}\left(\mathbf{s}_{x_{i}}-\mathbf{s}_{x_{j}}\right).

(29)

Thus, frozen identity reduces the sheaf Laplacian to a standard weighted graph Laplacian applied independently to each fiber coordinate and, therefore, reduces HilbNets to standard GCNs. It is a useful baseline: any improvement over frozen identity quantifies the value of learning or imposing non-trivial transports.

Free orthogonal transports.

The most expressive finite-dimensional class is

\mathcal{C}_{\mathrm{free}}=O(d),

(30)

or, when the target transports are known to lie in the identity component,

\mathcal{C}_{\mathrm{free}}=SO(d).

(31)

In the experiments, we parameterize free orthogonal transports by products of Householder reflections.

For a nonzero vector $v\in\mathbb{R}^{d}$ , define the Householder reflection

H(v)=I_{d}-2\frac{vv^{\top}}{\|v\|_{2}^{2}}.

(32)

Each $H(v)$ is orthogonal and symmetric:

H(v)^{\top}H(v)=I_{d},\qquad H(v)^{\top}=H(v).

(33)

For each oriented edge $e_{ij}$ , we store $R$ Householder vectors

v_{e_{ij},1},\dots,v_{e_{ij},R}\in\mathbb{R}^{d}

(34)

and define

P_{j\to i}^{e_{ij}}=H(v_{e_{ij},R})\cdots H(v_{e_{ij},1}).

(35)

Therefore $P_{j\to i}^{e_{ij}}$ is exactly orthogonal for every parameter value.

By the Cartan–Dieudonné theorem, every matrix in $O(d)$ can be represented as a product of at most $d$ Householder reflections. In practice, the choice of $R$ is dataset-dependent. In our synthetic experiments, we use $R=16$ for $d=m=10$, a modest over-parametrization that aids optimization. In our traffic experiments, we use $R=8$ for $d=T=12$, which parameterizes a strict subset of $O(T)$ (products of eight reflections) rather than the full orthogonal group, and which we find sufficient for the alignment patterns observed in the data. If one fixes exactly $R$ non-degenerate reflections, the determinant parity is fixed:

\det(P_{j\to i}^{e_{ij}})=(-1)^{R}.

(36)

Thus, an even number of reflections parameterizes the identity component $SO(d)$ , while an odd number parameterizes the other component. In the synthetic Gaussian experiment, the ground-truth Levi-Civita transports are obtained continuously from the identity along geodesics and, after Cholesky rescaling, lie in the identity component. Hence an even number of reflections is appropriate. If both connected components of $O(d)$ are needed, one may add a fixed final reflection or a discrete sign component. For numerical stability, the implementation uses

H(v_{e_{ij},r})=I_{d}-2\frac{v_{e_{ij},r}v_{e_{ij},r}^{\top}}{\|v_{e_{ij},r}\|_{2}^{2}+\epsilon},

(37)

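with a small $\epsilon>0$ to avoid division by zero at degenerate $v_{e_{ij},r}$. For $\|v_{e_{ij},r}\|_{2}>0$, this recovers the exact Householder reflection $H(q)=I_{d}-2qq^{\top}$ with $q=v_{e_{ij},r}/\|v_{e_{ij},r}\|_{2}$ in the limit $\epsilon\to 0$. For fixed $\epsilon>0$, the deviation from orthogonality is of order $\epsilon/\|v_{e_{ij},r}\|_{2}^{2}$, so the resulting matrix is orthogonal up to numerical precision whenever $\|v_{e_{ij},r}\|_{2}^{2}\gg\epsilon$.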
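A minimal NumPy sketch of the parametrization (35) with the stabilized reflections (37) follows. The value $\epsilon=10^{-12}$ and the helper names are our assumptions. It illustrates that the product is orthogonal to numerical precision and that its determinant is $(-1)^{R}$, as in (36).

```python
import numpy as np

def householder(v, eps=1e-12):
    """Stabilized reflection (37): I - 2 v v^T / (||v||^2 + eps)."""
    return np.eye(v.shape[0]) - 2.0 * np.outer(v, v) / (v @ v + eps)

def householder_transport(V, eps=1e-12):
    """P = H(v_R) ... H(v_1) as in (35); V has shape (R, d), row r-1 holding v_r."""
    P = np.eye(V.shape[1])
    for v in V:                       # v_1 is applied first and ends up rightmost
        P = householder(v, eps) @ P
    return P

rng = np.random.default_rng(0)
d = 10
P_even = householder_transport(rng.standard_normal((16, d)))  # R = 16: lies in SO(d)
P_odd = householder_transport(rng.standard_normal((5, d)))    # R = 5: the other component
```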

The free class is useful as an expressivity test. If the target transport belongs to $O(d)$ in the chosen coordinates, then the Householder class can represent it. This is precisely the role it plays in the synthetic statistical-bundle experiment, where Cholesky rescaling converts the intrinsic Wasserstein-unitary Levi-Civita transports into Euclidean-orthogonal matrices. In real-data experiments, the free class serves as a high-capacity transport baseline.
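The Cartan–Dieudonné bound can also be made constructive, which illustrates why the free class contains any target in $O(d)$. The sketch below (hypothetical helper names) peels off one reflection per column: at step $k$, it reflects the current $k$-th column onto $e_{k}$ while fixing the earlier, already aligned columns, and it returns at most $d$ vectors ordered as in (35).

```python
import numpy as np

def reflection(v):
    """Exact Householder reflection H(v) = I - 2 v v^T / ||v||^2, as in (32)."""
    return np.eye(v.shape[0]) - 2.0 * np.outer(v, v) / (v @ v)

def householder_factors(Q, tol=1e-12):
    """Write an orthogonal Q as at most d reflections: Q = H(v_R) ... H(v_1)."""
    d = Q.shape[0]
    A = Q.copy()
    applied = []                      # reflections applied on the left, in order
    for k in range(d):
        e_k = np.eye(d)[:, k]
        v = A[:, k] - e_k             # H(v) swaps the unit vectors A e_k and e_k
        if np.linalg.norm(v) > tol:   # skip columns that are already aligned
            A = reflection(v) @ A
            applied.append(v)
    # Now H_m ... H_1 Q = I, hence Q = H_1 ... H_m; in the order of (35), v_1 is the
    # last reflection applied.
    return applied[::-1]

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
vs = householder_factors(Q)
P = np.eye(10)
for v in vs:                          # rebuild H(v_R) ... H(v_1)
    P = reflection(v) @ P
```

Since $H(v)H(v)=I_{d}$, appending pairs of identical reflections changes neither the product nor the determinant parity. Under these assumptions, a budget such as $R=16\ge d=10$ can in principle represent any element of $SO(10)$ exactly, although gradient-based training is not guaranteed to find such a factorization.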

Circulant or time-stationary transports.

For time-series fibers, the discretized fiber dimension is the number of retained time samples, so we write $d=T$ . A natural prior is that inter-fiber transport should commute with time shifts. Let

\mathsf{S}_{T}:\mathbb{R}^{T}\to\mathbb{R}^{T}

(38)

be the cyclic shift operator. A time-stationary transport is one satisfying

P_{j\to i}^{e_{ij}}\mathsf{S}_{T}=\mathsf{S}_{T}P_{j\to i}^{e_{ij}}.

(39)

The commutant of the cyclic shift is the algebra of circulant matrices. Requiring in addition that $P_{j\to i}^{e_{ij}}$ be orthogonal gives the class of orthogonal circulant transports.

Let $F_{T}$ denote the unitary discrete Fourier transform matrix. Then every orthogonal circulant transport has the form

P_{j\to i}^{e_{ij}}=F_{T}^{*}\operatorname{diag}(\lambda_{e_{ij}})F_{T},\qquad|\lambda_{e_{ij},k}|=1.

(40)

For real-valued time-domain signals, the Fourier multipliers must satisfy conjugate symmetry:

\lambda_{e_{ij},T-k}=\overline{\lambda_{e_{ij},k}}.

(41)

We therefore store only the independent positive-frequency phases. Let

m_{T}=\left\lfloor\frac{T-1}{2}\right\rfloor.

(42)

The learnable parameter for edge $e_{ij}$ is

\varphi_{e_{ij}}=(\varphi_{e_{ij},1},\dots,\varphi_{e_{ij},m_{T}})\in\mathbb{R}^{m_{T}}.

(43)

We define

\lambda_{e_{ij},0}=1,\qquad\lambda_{e_{ij},k}=e^{i\varphi_{e_{ij},k}},\qquad\lambda_{e_{ij},T-k}=e^{-i\varphi_{e_{ij},k}},\quad k=1,\dots,m_{T}.

(44)

If $T$ is even, the Nyquist frequency is self-conjugate and is fixed to

\lambda_{e_{ij},T/2}=1

(45)

for the identity-component parametrization. This yields a real orthogonal circulant matrix through (40). Equivalently, $P_{j\to i}^{e_{ij}}$ can be constructed in real arithmetic from its first column. For $r=0,\dots,T-1$ , define

c_{e_{ij}}[r]=\frac{1}{T}\left[1+\mathbb{I}_{\{T\text{ even}\}}(-1)^{r}+2\sum_{k=1}^{m_{T}}\cos\left(\varphi_{e_{ij},k}+\frac{2\pi kr}{T}\right)\right].

(46)

The full circulant matrix is then

(P_{j\to i}^{e_{ij}})_{ab}=c_{e_{ij}}[(a-b)\bmod T],\qquad a,b=0,\dots,T-1.

(47)

This form is convenient for implementation because it avoids explicitly manipulating complex-valued matrices. The circulant class has only

m_{T}=\left\lfloor\frac{T-1}{2}\right\rfloor

(48)

parameters per edge, compared with $d(d-1)/2$ degrees of freedom for a general orthogonal matrix. Each phase $\varphi_{e_{ij},k}$ has a direct interpretation as the phase lag at frequency $k$ between the two endpoint fibers. Thus, the transport may advance or delay oscillatory components across an edge, but it cannot arbitrarily mix frequencies or reshape the waveform. This is the intended inductive bias for spatiotemporal signals such as traffic or sensor time series, where neighboring sensors may observe delayed or phase-shifted versions of related temporal patterns. In the synthetic experiment, the same class is used more abstractly as a structured subgroup of $O(d)$ against which the ground-truth transports can be projected.
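The real-arithmetic construction (46) and (47) can be cross-checked against the spectral form (40). In the sketch below (helper name hypothetical; $T=12$ matches the traffic setting), we assume NumPy's unitary DFT convention $F_{ka}=e^{-2\pi\mathrm{i}ka/T}/\sqrt{T}$, under which (46) holds. We build the transport from random phases, verify that it is orthogonal, commutes with the cyclic shift, and agrees with $F_{T}^{*}\operatorname{diag}(\lambda)F_{T}$, and confirm the phase-lag interpretation: a pure cosine at frequency $k$ is advanced by $\varphi_{e_{ij},k}$.

```python
import numpy as np

def circulant_transport(phi, T):
    """Real orthogonal circulant transport from phases, following (44)-(47).

    phi: positive-frequency phases, length m_T = floor((T - 1) / 2).
    The DC multiplier, and the Nyquist multiplier when T is even, are fixed to 1.
    """
    assert len(phi) == (T - 1) // 2
    r = np.arange(T)
    k = np.arange(1, len(phi) + 1)[:, None]
    c = np.ones(T)
    if T % 2 == 0:
        c += (-1.0) ** r                                   # Nyquist term of (46)
    c += 2.0 * np.cos(phi[:, None] + 2.0 * np.pi * k * r / T).sum(axis=0)
    c /= T
    return c[(r[:, None] - r[None, :]) % T]                # (47): P_ab = c[(a - b) mod T]

T = 12
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, (T - 1) // 2)
P = circulant_transport(phi, T)

# Spectral form (40) with conjugate-symmetric multipliers (41), (44), (45).
F = np.fft.fft(np.eye(T), norm="ortho")
ks = np.arange(1, len(phi) + 1)
lam = np.ones(T, dtype=complex)
lam[ks] = np.exp(1j * phi)
lam[T - ks] = np.exp(-1j * phi)
P_fft = F.conj().T @ np.diag(lam) @ F

S = np.roll(np.eye(T), 1, axis=0)                          # cyclic shift matrix
tgrid = np.arange(T)
k0 = 2
x = np.cos(2.0 * np.pi * k0 * tgrid / T)                   # pure tone at frequency k0
```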

E.3 Parameter counts

For a graph $G_{n}=(\mathcal{X}_{n},E)$ with $|E|$ undirected edges and discretized fiber dimension $d$ , the transport parameter counts are summarized in Table 3.

Transport class	Parameters per edge	Interpretation
Frozen identity	$0$	no learned alignment
Free Householder	$Rd$	product of $R$ reflections in $O(d)$
Full circulant	$\lfloor(d-1)/2\rfloor$	one phase per positive frequency

Table 3: Parameter counts for the finite-dimensional transport classes used in $\mathcal{F}_{n,d}^{t}$.

Thus, the free class is maximally expressive but parameter-heavy, while the circulant classes encode a strong time-stationary prior and scale linearly with the number of frequencies or bands.
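The per-edge counts in Table 3 are simple to tabulate. The sketch below (hypothetical helper) evaluates them for the traffic setting $d=T=12$, $R=8$, alongside $\dim O(d)=d(d-1)/2$ for reference; multiplying by the number of stored edges gives the total transport parameter budget.

```python
def transport_params_per_edge(d, R):
    """Per-edge parameter counts of Table 3, plus dim O(d) for reference."""
    return {
        "frozen_identity": 0,
        "free_householder": R * d,          # R reflection vectors in R^d
        "full_circulant": (d - 1) // 2,     # one phase per positive frequency
        "dim_O(d)": d * (d - 1) // 2,       # manifold dimension of O(d)
    }

counts = transport_params_per_edge(d=12, R=8)   # traffic setting: d = T = 12, R = 8
```

Under these settings, the Householder class uses 96 parameters per edge, more than $\dim O(12)=66$, even though products of eight reflections form a strict subset of $O(12)$, whereas the circulant class uses only 5.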

Appendix F Additional Experimental Details

F.1 Synthetic experiments: the statistical bundle over centered Gaussians

F.1.1 The bundle

The base manifold is $\mathcal{M}=\mathrm{Sym}^{++}(p)$ , the open cone of $p\times p$ symmetric positive-definite matrices. Each $\Sigma\in\mathcal{M}$ parameterizes a centered Gaussian $\mathcal{N}(0,\Sigma)$ on $\mathbb{R}^{p}$ with density $\rho_{\Sigma}$ . We equip $\mathcal{M}$ with the Otto-Wasserstein metric, namely the Riemannian metric induced on $\mathrm{Sym}^{++}(p)$ by the optimal-transport distance $W_{2}$ between centered Gaussian measures.

Concretely, the tangent space is naturally identified with symmetric matrices,

T_{\Sigma}\mathcal{M}\cong\mathrm{Sym}(p),

(49)

and the Otto-Wasserstein inner product between $U,V\in\mathrm{Sym}(p)$ is

W_{\Sigma}(U,V)=\tfrac{1}{2}\,\mathrm{Tr}\bigl(L_{\Sigma}[U]\,V\bigr),\qquad L_{\Sigma}[U]\Sigma+\Sigma L_{\Sigma}[U]=U,

(50)

where $L_{\Sigma}[U]$ is the unique symmetric solution of the Lyapunov equation.

Above each $\Sigma$ , the ambient Hilbert fiber is the vector-field space

\mathcal{H}_{\Sigma}:=L^{2}(\rho_{\Sigma};\mathbb{R}^{p}),

(51)

equipped with the inner product

\langle a,b\rangle_{\mathcal{H}_{\Sigma}}=\int_{\mathbb{R}^{p}}a(x)^{\top}b(x)\rho_{\Sigma}(x)\,dx.

(52)

This fiber is genuinely infinite-dimensional. The finite-rank fiber used in the synthetic experiments is the Otto-velocity image of covariance perturbations:

\mathcal{E}_{\Sigma}:=\bigl\{v_{V}(x)=L_{\Sigma}[V]x:V\in\mathrm{Sym}(p)\bigr\}\subset L^{2}(\rho_{\Sigma};\mathbb{R}^{p}).

(53)

Thus, the computational fiber $\mathcal{E}_{\Sigma}$ is a finite-dimensional statistical subspace of the ambient Hilbert fiber. The map $V\mapsto v_{V}$ is an isometry between $\mathrm{Sym}(p)$ with the Otto-Wasserstein metric and $\mathcal{E}_{\Sigma}$ with the $L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ inner product. Indeed, for $U,V\in\mathrm{Sym}(p)$ ,

\langle v_{U},v_{V}\rangle_{\mathcal{H}_{\Sigma}}=\mathbb{E}_{x\sim\mathcal{N}(0,\Sigma)}\bigl[x^{\top}L_{\Sigma}[U]L_{\Sigma}[V]x\bigr]=\mathrm{Tr}\bigl(L_{\Sigma}[U]L_{\Sigma}[V]\Sigma\bigr).

(54)

Using $V=L_{\Sigma}[V]\Sigma+\Sigma L_{\Sigma}[V]$ and the fact that $L_{\Sigma}[U]$ , $L_{\Sigma}[V]$ , and $\Sigma$ are symmetric, this equals

\tfrac{1}{2}\mathrm{Tr}\bigl(L_{\Sigma}[U]V\bigr)=W_{\Sigma}(U,V).

(55)

Therefore,

\dim\mathcal{E}_{\Sigma}=\dim\mathrm{Sym}(p)=m=p(p+1)/2.

(56)
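The isometry (54) and (55) can be sanity-checked numerically. The sketch below relies on SciPy's `solve_continuous_lyapunov`, which solves $AX+XA^{H}=Q$ and therefore returns $L_{\Sigma}[U]$ when called with $A=\Sigma$ and $Q=U$. It compares $W_{\Sigma}(U,V)$ with the closed form $\mathrm{Tr}(L_{\Sigma}[U]L_{\Sigma}[V]\Sigma)$ and with a Monte Carlo estimate of $\|v_{U}\|^{2}_{\mathcal{H}_{\Sigma}}$; the random $\Sigma$, $U$, $V$ and the helper names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lyap(Sigma, U):
    """Symmetric solution L of L Sigma + Sigma L = U, as in (50)."""
    # SciPy solves A X + X A^H = Q; with A = Sigma symmetric this is exactly (50).
    L = solve_continuous_lyapunov(Sigma, U)
    return 0.5 * (L + L.T)

def otto_metric(Sigma, U, V):
    """W_Sigma(U, V) = 1/2 Tr(L_Sigma[U] V)."""
    return 0.5 * np.trace(lyap(Sigma, U) @ V)

def velocity_inner(Sigma, U, V):
    """Closed form of <v_U, v_V> in L^2(rho_Sigma; R^p): Tr(L[U] L[V] Sigma), as in (54)."""
    return np.trace(lyap(Sigma, U) @ lyap(Sigma, V) @ Sigma)

rng = np.random.default_rng(0)
p = 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)            # an arbitrary SPD covariance
U = rng.standard_normal((p, p))
U = U + U.T                                # tangent vectors in Sym(p)
V = rng.standard_normal((p, p))
V = V + V.T

# Monte Carlo estimate of ||v_U||^2 = E ||L[U] x||^2 with x ~ N(0, Sigma).
x = rng.multivariate_normal(np.zeros(p), Sigma, size=200_000)
mc = np.mean(np.sum((x @ lyap(Sigma, U).T) ** 2, axis=1))
```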

Since the fibers $\mathcal{E}_{\Sigma}$ are already $m$ -dimensional, the fiber discretization of Proposition 1 is exact with $d=m$ . Throughout this appendix, we therefore write $d=m$ and use the network sheaf notation $\mathcal{F}_{n,d}^{t}$ and Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ from Section 5.

This construction is useful because it gives a faithful but tractable proxy for the Hilbert-bundle settings that motivate HilbNets. It is faithful in the sense that the ambient fibers $L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ are infinite-dimensional vector-field Hilbert spaces, and the Levi-Civita connection of $(\mathrm{Sym}^{++}(p),W_{\Sigma})$ yields non-trivial, metric-compatible parallel transports. It is tractable because the Otto-velocity map selects a finite-rank statistical sub-bundle on which the metric, parallel-transport ODE, and projection of ground-truth transports onto restricted transport classes admit closed-form numerical evaluation. Thus, the experiments probe the finite-rank computational slice used by the implementation.

F.1.2 Levi-Civita connection and closed-form parallel transport

The Levi-Civita connection on $(\mathrm{Sym}^{++}(p),W_{\Sigma})$ is the canonical metric-compatible torsion-free connection associated with the Otto-Wasserstein metric. In covariance coordinates, its Christoffel symbol is

\Gamma_{\Sigma}(U,V)=\tfrac{1}{2}\bigl(UL_{\Sigma}[V]+VL_{\Sigma}[U]\bigr)\in\mathrm{Sym}(p),

(57)

which is symmetric in $(U,V)$ , as required for a torsion-free connection.

Let $\Sigma_{t}$ denote the Wasserstein geodesic from $\Sigma_{0}$ to $\Sigma_{1}$ . Parallel transport of a tangent vector $V(t)\in\mathrm{Sym}(p)$ along $\Sigma_{t}$ is governed by

\dot{V}(t)=-\Gamma_{\Sigma_{t}}\bigl(\dot{\Sigma}_{t},V(t)\bigr),\qquad V(0)=V_{0}.

(58)

We solve this ODE numerically by Euler integration with $50$ steps. The resulting linear map on the finite-rank fiber is the ground-truth Levi-Civita transport

P^{LC}_{x_{i}\to x_{j}}:\mathrm{Sym}(p)\to\mathrm{Sym}(p).

(59)

In the notation of Def. 3, the restriction maps of the induced sheaf $\mathcal{F}_{n,d}^{t}$ use the midpoint transports $P^{LC}_{x_{i}\to m_{ij}}$, which play the role of the discretized parallel transport $P_{x_{i}\to m_{ij}}^{(d)}$ with $d=m$. These midpoint transports define the restriction maps used in the spectral-stability experiments. The transport-recovery experiments instead regress against the Cholesky-rescaled node-to-node transport $\tilde{P}^{LC}_{x_{j}\to x_{i}}$, which adopts the orientation convention of Appendix E.1.

Because the connection is metric-compatible, $P^{LC}_{x_{i}\to x_{j}}$ is unitary with respect to the Otto-Wasserstein inner products on the source and target fibers:

W_{x_{j}}\bigl(P^{LC}_{x_{i}\to x_{j}}U,P^{LC}_{x_{i}\to x_{j}}V\bigr)=W_{x_{i}}(U,V).

(60)

We verify this numerically to within $0.5\%$ using $200$ Euler steps. This Wasserstein-unitarity is the geometric invariant that justifies comparing the ground-truth transports to orthogonal parametrizations after metric rescaling.

F.1.3 Cholesky rescaling

The free-$O(m)$ transport class used in the implementation is Euclidean-orthogonal in vectorized fiber coordinates (cf. the hypothesis classes in Appendix E.2 with $d=m$). However, the intrinsic fiber metric is $W_{\Sigma}$, not the raw Frobenius metric on $\mathrm{Sym}(p)$. Therefore, $P^{LC}_{x_{i}\to x_{j}}$ is not generally orthogonal in raw coordinates.

Let $G_{\Sigma}$ be the Gram matrix of $W_{\Sigma}$ in a fixed basis of $\mathrm{Sym}(p)$ . We factor

G_{\Sigma}=R_{\Sigma}^{\top}R_{\Sigma}

(61)

by Cholesky decomposition and represent a fiber coordinate vector $u$ in the rescaled frame as

\tilde{u}=R_{\Sigma}u.

(62)

In this frame, the rescaled Levi-Civita transport is

\tilde{P}^{LC}_{x_{i}\to x_{j}}=R_{x_{j}}P^{LC}_{x_{i}\to x_{j}}R_{x_{i}}^{-1}.

(63)

By Wasserstein-unitarity, it satisfies

\bigl(\tilde{P}^{LC}_{x_{i}\to x_{j}}\bigr)^{\top}\tilde{P}^{LC}_{x_{i}\to x_{j}}=I_{m}.

(64)

Hence,

\tilde{P}^{LC}_{x_{i}\to x_{j}}\in O(m).

(65)

This is the coordinate system used in the transport-recovery experiments. In these coordinates, the free-$O(m)$ Householder class described in Appendix E.2 contains the ground-truth transports. The spectral-stability metrics are computed from the assembled sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}$ and are invariant to this coordinate choice up to similarity transformation.
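The rescaling (61) to (65) can be illustrated without reproducing the Levi-Civita ODE. In the sketch below, we assume a Frobenius-orthonormal basis of $\mathrm{Sym}(p)$ as the fixed basis, compute the Gram matrix of $W_{\Sigma}$ via a Lyapunov solve, and build a synthetic Wasserstein-unitary map $P=G_{x_j}^{-1/2}QG_{x_i}^{1/2}$ from a random orthogonal $Q$ as a stand-in for $P^{LC}_{x_{i}\to x_{j}}$. Note that NumPy's `cholesky` returns the lower factor, so $R_{\Sigma}$ is its transpose. All helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def sym_basis(p):
    """Frobenius-orthonormal basis of Sym(p); our choice of fixed basis."""
    basis = []
    for a in range(p):
        for b in range(a, p):
            B = np.zeros((p, p))
            if a == b:
                B[a, a] = 1.0
            else:
                B[a, b] = B[b, a] = 1.0 / np.sqrt(2.0)
            basis.append(B)
    return basis

def otto_gram(Sigma):
    """G[k, l] = W_Sigma(B_k, B_l) = 1/2 Tr(L_Sigma[B_k] B_l)."""
    basis = sym_basis(Sigma.shape[0])
    Ls = [solve_continuous_lyapunov(Sigma, B) for B in basis]
    G = np.array([[0.5 * np.trace(Lk @ Bl) for Bl in basis] for Lk in Ls])
    return 0.5 * (G + G.T)                       # remove round-off asymmetry

def cholesky_factor(G):
    """Upper-triangular R with G = R^T R, as in (61)."""
    return np.linalg.cholesky(G).T

def spd_power(G, s):
    """G^s for a symmetric positive-definite G, via eigendecomposition."""
    w, E = np.linalg.eigh(G)
    return (E * w ** s) @ E.T

rng = np.random.default_rng(0)
p = 3
m = p * (p + 1) // 2
A_i, A_j = rng.standard_normal((p, p)), rng.standard_normal((p, p))
G_i = otto_gram(A_i @ A_i.T + np.eye(p))
G_j = otto_gram(A_j @ A_j.T + np.eye(p))
R_i, R_j = cholesky_factor(G_i), cholesky_factor(G_j)

# Synthetic Wasserstein-unitary map (stand-in for P^LC): P^T G_j P = G_i, as in (60).
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
P = spd_power(G_j, -0.5) @ Q @ spd_power(G_i, 0.5)

# Rescaled transport (63), which should be Euclidean-orthogonal as in (64).
P_tilde = R_j @ P @ np.linalg.inv(R_i)
```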

F.1.4 Sample construction

We draw samples

\Sigma_{i}=R_{i}D_{i}R_{i}^{\top},\qquad i=1,\dots,n,

(66)

where $R_{i}$ is a Haar-random $p\times p$ orthogonal matrix, obtained by QR decomposition of a standard-normal matrix, and $D_{i}$ is diagonal with entries whose logarithms are drawn uniformly from $[\log 0.5,\log 2.0]$. Equivalently, the eigenvalues of $\Sigma_{i}$ lie in $[0.5,2.0]$ and are log-uniformly distributed.

Although $\mathrm{Sym}^{++}(p)$ is non-compact, this procedure samples a bounded subset of it. This is appropriate for the finite deployment-regime stability tests reported here, but it is not the normalized-volume sampling assumption used in the asymptotic convergence theorem.

We build a $k$ NN graph $G_{n}=(\mathcal{X}_{n},E)$ with $k=8$ under the Gaussian Wasserstein distance $W_{2}(\Sigma_{i},\Sigma_{j})$ and assign Gaussian-kernel weights

k_{ij}^{t}=\exp\left(-\frac{W_{2}(\Sigma_{i},\Sigma_{j})^{2}}{4t}\right),\qquad t=0.5.

(67)

The induced network sheaf $\mathcal{F}_{n,d}^{t}$ has per-edge restriction maps

(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P^{LC}_{x_{i}\to m_{ij}},

(68)

as in Def. 3, where $m_{ij}$ is the Wasserstein-geodesic midpoint between $\Sigma_{i}$ and $\Sigma_{j}$.
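The sampling, graph, and midpoint steps of this subsection can be sketched with standard closed forms for centered Gaussians: $W_{2}^{2}(\Sigma_{0},\Sigma_{1})=\mathrm{Tr}\,\Sigma_{0}+\mathrm{Tr}\,\Sigma_{1}-2\,\mathrm{Tr}\bigl((\Sigma_{0}^{1/2}\Sigma_{1}\Sigma_{0}^{1/2})^{1/2}\bigr)$, and the geodesic midpoint $M\Sigma_{0}M$ with $M=\tfrac{1}{2}(I+A_{01})$, where $A_{01}=\Sigma_{0}^{-1/2}(\Sigma_{0}^{1/2}\Sigma_{1}\Sigma_{0}^{1/2})^{1/2}\Sigma_{0}^{-1/2}$ is the optimal transport map from $\Sigma_{0}$ to $\Sigma_{1}$. Symmetrizing the $k$NN graph by taking the union of neighborhoods is our assumption, as are the helper names; the Levi-Civita midpoint transports in (68) are not reproduced here.

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, E = np.linalg.eigh(S)
    return (E * np.sqrt(np.clip(w, 0.0, None))) @ E.T

def sample_covariances(n, p, rng):
    """Sigma_i = R_i D_i R_i^T with QR-based rotations and log-uniform spectrum, as in (66)."""
    Sigmas = []
    for _ in range(n):
        R, _ = np.linalg.qr(rng.standard_normal((p, p)))
        D = np.diag(np.exp(rng.uniform(np.log(0.5), np.log(2.0), size=p)))
        Sigmas.append(R @ D @ R.T)
    return Sigmas

def w2(S0, S1):
    """Closed-form W2 distance between N(0, S0) and N(0, S1)."""
    r0 = spd_sqrt(S0)
    cross = np.trace(spd_sqrt(r0 @ S1 @ r0))
    return np.sqrt(max(np.trace(S0) + np.trace(S1) - 2.0 * cross, 0.0))

def w2_midpoint(S0, S1):
    """Midpoint of the W2 geodesic: M S0 M with M = (I + A) / 2, A the OT map S0 -> S1."""
    r0 = spd_sqrt(S0)
    r0_inv = np.linalg.inv(r0)
    A = r0_inv @ spd_sqrt(r0 @ S1 @ r0) @ r0_inv
    M = 0.5 * (np.eye(len(S0)) + A)
    return M @ S0 @ M

def knn_kernel_graph(Sigmas, k=8, t=0.5):
    """kNN graph under W2 with weights exp(-W2^2 / (4t)) as in (67), symmetrized by union."""
    n = len(Sigmas)
    D = np.array([[w2(a, b) for b in Sigmas] for a in Sigmas])
    D_off = D + np.diag(np.full(n, np.inf))      # exclude self-loops from the search
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D_off[i])[:k]
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (4.0 * t))
    return np.maximum(W, W.T), D

rng = np.random.default_rng(0)
Sigmas = sample_covariances(30, 3, rng)
W, D = knn_kernel_graph(Sigmas, k=8, t=0.5)
m01 = w2_midpoint(Sigmas[0], Sigmas[1])
```

The midpoint lies halfway along the geodesic, so its $W_{2}$ distance to each endpoint is half of $W_{2}(\Sigma_{0},\Sigma_{1})$, which gives a cheap consistency check.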

F.1.5 Spectral stability under sampling density increase

For each Gaussian dimension $p$ , sample size $n\in\{50,100,200,400,800\}$ , and random seed, we sample $\mathcal{X}_{n}\subset\mathrm{Sym}^{++}(p)$ , build $\mathcal{F}_{n,d}^{t}$ using the closed-form Levi-Civita transports, and assemble the sheaf Laplacian

\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}

(69)

as a sparse block matrix. Since the reference sheaf Laplacian has size $n_{\max}d\times n_{\max}d$ and we require a dense eigendecomposition, we choose $n_{\max}$ so that $n_{\max}d\approx 10^{4}$, which remains tractable on a single CPU node, yielding $n_{\max}\in\{4000,2000,1000\}$ for $p\in\{2,3,4\}$ (i.e., $d\in\{3,6,10\}$), respectively.

Let

\lambda_{1}^{(n)}\leq\dots\leq\lambda_{k}^{(n)}

(70)

denote the bottom-$k$ eigenvalues of $\Delta_{\mathcal{F}_{n,d}^{t}}$, with $k=32$, and define $\lambda_{i}^{(n_{\max})}$ analogously for the reference operator $\Delta_{\mathcal{F}_{n_{\max},m}^{t}}$. We measure both the aggregate $\ell_{2}$ and the worst-case relative spectral discrepancy of the bottom-$32$ eigenvalues of $\Delta_{\mathcal{F}_{n,d}^{t}}$ against this high-resolution reference, sweeping $p\in\{2,3,4\}$ and $n\in\{50,100,200,400,800\}$, and averaging over 5 sampling realizations.

Aggregate discrepancy.

Fig. 3 (Left) reports the low-frequency spectral $\ell_{2}$ discrepancy

\mathrm{spec\text{-}L_{2}}(n)=\frac{1}{k}\left(\sum_{i=1}^{k}\bigl(\lambda_{i}^{(n)}-\lambda_{i}^{(n_{\max})}\bigr)^{2}\right)^{1/2}.

(71)

We average across three seeds and report $\bar{x}\pm s$ as the shaded band.

Worst-case discrepancy.

Fig. 3 (Right) reports the relative max error over the bottom- $k$ eigenvalues:

\mathrm{spec\text{-}rel\text{-}max}(n)=\frac{\max_{1\leq i\leq k}\left|\lambda_{i}^{(n)}-\lambda_{i}^{(n_{\max})}\right|}{\lambda_{k}^{(n_{\max})}}.

(72)

This complements the aggregate metric by capturing worst-case low-frequency spectral error.
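The two metrics (71) and (72) amount to a few lines once the eigenvalues are available. The sketch below (hypothetical helper names) uses a dense symmetric eigensolver, as in the text, and assumes that both eigenvalue vectors come from comparably normalized Laplacians; the toy vectors are illustrative, not experimental values.

```python
import numpy as np

def bottom_k(L, k=32):
    """Bottom-k eigenvalues of a symmetric Laplacian via a dense eigendecomposition."""
    return np.linalg.eigvalsh(L)[:k]          # eigvalsh returns eigenvalues in ascending order

def spec_l2(lam_n, lam_ref):
    """(71): (1/k) times the l2 norm of the eigenvalue differences."""
    return np.linalg.norm(lam_n - lam_ref) / len(lam_ref)

def spec_rel_max(lam_n, lam_ref):
    """(72): largest absolute eigenvalue error divided by the k-th reference eigenvalue."""
    return np.max(np.abs(lam_n - lam_ref)) / lam_ref[-1]

# Toy eigenvalue vectors (illustrative only).
lam_ref = np.array([0.0, 1.0, 2.0, 4.0])
lam_n = np.array([0.0, 1.0, 2.0, 3.0])
```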

Overall, Fig. 3 shows a monotone decrease for both metrics across all dimensions, demonstrating that the sheaf Laplacian stabilizes as manifold sampling density increases, and faster for higher signal sampling densities.

Transport recovery.

We train three transport parametrizations from Appendix E.2, namely free $O(d)$ (Householder), circulant, and frozen identity, to recover the Levi-Civita transports in Cholesky-rescaled coordinates by minimizing the per-edge transport-MSE loss

\mathcal{L}=\frac{1}{|E|}\sum_{e_{ij}\in E}\mathbb{E}_{V\sim\mathcal{N}(0,I_{d})}\left\|P_{j\to i}^{e_{ij}}V-\tilde{P}^{LC}_{x_{j}\to x_{i}}V\right\|_{2}^{2}.

(73)

Here $P_{j\to i}^{e_{ij}}\in O(d)$ is the rescaled transport produced by the model for edge $e_{ij}$ , $\tilde{P}^{LC}_{x_{j}\to x_{i}}$ is the ground-truth Levi-Civita transport from $x_{j}$ to $x_{i}$ in Cholesky-rescaled coordinates, and $V$ is a fresh isotropic Gaussian test vector.

By the trace identity

\mathbb{E}_{V\sim\mathcal{N}(0,I_{d})}\|AV\|_{2}^{2}=\|A\|_{F}^{2},

(74)

the population loss equals the mean per-edge squared Frobenius distance

\mathcal{L}^{*}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|P_{j\to i}^{e_{ij}}-\tilde{P}^{LC}_{x_{j}\to x_{i}}\right\|_{F}^{2}.

(75)
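The identity (74) follows from $\mathbb{E}\|AV\|_{2}^{2}=\mathbb{E}\,\mathrm{Tr}(AVV^{\top}A^{\top})=\mathrm{Tr}(A\,\mathbb{E}[VV^{\top}]A^{\top})=\mathrm{Tr}(AA^{\top})=\|A\|_{F}^{2}$. A quick Monte Carlo check with an arbitrary matrix $A$ (our choice, standing in for a transport error $P_{j\to i}^{e_{ij}}-\tilde{P}^{LC}_{x_{j}\to x_{i}}$) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d))                  # arbitrary stand-in for a transport error
V = rng.standard_normal((200_000, d))            # rows are draws of V ~ N(0, I_d)
mc = np.mean(np.sum((V @ A.T) ** 2, axis=1))     # Monte Carlo estimate of E ||A V||_2^2
exact = np.linalg.norm(A, "fro") ** 2            # ||A||_F^2
```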

Free $O(d)$ .

The free transport is parameterized by products of Householder reflections, as detailed in Appendix E.2. Since $\tilde{P}^{LC}_{x_{j}\to x_{i}}\in O(d)$ in the Cholesky-rescaled frame, the free-$O(d)$ class contains the target transport and the minimum population loss is zero. Empirically, we observe $\mathcal{L}^{*}\approx 1.6\cdot 10^{-7}$ at convergence, confirming recovery to numerical precision.

Restricted classes and analytical projections.

For each restricted transport hypothesis class $\mathcal{C}\subseteq O(d)$ , the population loss minimum is the mean Frobenius distance from each ground-truth transport $\tilde{P}^{LC}_{x_{j}\to x_{i}}$ to its best approximation inside $\mathcal{C}$ :

\mathcal{L}^{*}_{\mathcal{C}}=\frac{1}{|E|}\sum_{e_{ij}\in E}\min_{T\in\mathcal{C}}\left\|T-\tilde{P}^{LC}_{x_{j}\to x_{i}}\right\|_{F}^{2}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|\tilde{P}^{LC}_{x_{j}\to x_{i}}-\mathrm{proj}_{\mathcal{C}}\bigl(\tilde{P}^{LC}_{x_{j}\to x_{i}}\bigr)\right\|_{F}^{2}.

(76)

This is the Theory column in Table 1.

Frozen identity. For $\mathcal{C}_{\mathrm{id}}=\{I_{d}\}$ , the projection is trivial: $\mathrm{proj}_{\{I\}}(\tilde{P}^{LC}_{x_{j}\to x_{i}})=I_{d}$ , so

\mathcal{L}^{*}_{\mathrm{frozen}}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|\tilde{P}^{LC}_{x_{j}\to x_{i}}-I_{d}\right\|_{F}^{2}.

(77)

Circulant. For the circulant class (Appendix E.2 with $d=m$), let $F_{d}$ be the $d\times d$ unitary DFT matrix and define

\mathcal{C}_{\mathrm{circ}}=\left\{F_{d}^{*}\mathrm{diag}(e^{i\varphi_{0}},\dots,e^{i\varphi_{d-1}})F_{d}:\varphi_{k}\in\mathbb{R}\right\}.

(78)

The Frobenius-best projection of any $T\in O(d)$ onto this class is the diagonal-phase Procrustes solution

\mathrm{proj}_{\mathrm{circ}}(T)=F_{d}^{*}\mathrm{diag}(e^{i\hat{\varphi}_{0}},\dots,e^{i\hat{\varphi}_{d-1}})F_{d},\qquad\hat{\varphi}_{k}=\arg\bigl((F_{d}TF_{d}^{*})_{kk}\bigr).

(79)

The zero-frequency phase is pinned to zero to preserve the constant mode, and the self-conjugate Nyquist frequency, when present, is handled according to the real-valued convention of Appendix E.2. In the synthetic experiment, the circulant class is used as a structured subgroup of $O(d)$ for testing transport-class projection, rather than as a time-series prior.
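The per-edge floors in (77) and (79) can be computed in closed form. The sketch below (hypothetical helper names) uses NumPy's unitary DFT, pins the zero-frequency phase and, for even $d$, the Nyquist phase to zero, and applies the projection to a random orthogonal stand-in target rather than to the actual Levi-Civita transports, so its numbers are illustrative and are not the Table 1 values. Because the identity belongs to the circulant class, the circulant floor can never exceed the frozen-identity floor.

```python
import numpy as np

def unitary_dft(d):
    """Unitary DFT matrix F[k, a] = exp(-2j*pi*k*a/d) / sqrt(d)."""
    return np.fft.fft(np.eye(d), norm="ortho")

def proj_circ(T_mat):
    """Frobenius-closest real orthogonal circulant to T_mat, following (79).

    The zero-frequency phase, and the Nyquist phase when d is even, are pinned to 0.
    """
    d = T_mat.shape[0]
    F = unitary_dft(d)
    phases = np.angle(np.diag(F @ T_mat @ F.conj().T))   # hat(phi)_k = arg((F T F^*)_kk)
    phases[0] = 0.0
    if d % 2 == 0:
        phases[d // 2] = 0.0
    return (F.conj().T @ np.diag(np.exp(1j * phases)) @ F).real

def floors(T_mat):
    """Per-edge terms of (77) and (76) for the frozen-identity and circulant classes."""
    frozen = np.linalg.norm(T_mat - np.eye(len(T_mat)), "fro") ** 2
    circ = np.linalg.norm(T_mat - proj_circ(T_mat), "fro") ** 2
    return frozen, circ

rng = np.random.default_rng(0)
d = 10
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random stand-in for a target transport
frozen_floor, circ_floor = floors(Q)

# A target already in the class (built from conjugate-symmetric phases) is recovered exactly.
phi = rng.uniform(-np.pi, np.pi, (d - 1) // 2)
ks = np.arange(1, len(phi) + 1)
lam = np.ones(d, dtype=complex)
lam[ks] = np.exp(1j * phi)
lam[d - ks] = np.exp(-1j * phi)
F = unitary_dft(d)
C = (F.conj().T @ np.diag(lam) @ F).real
```

For a real target, the diagonal entries $(F_{d}TF_{d}^{*})_{kk}$ at frequencies $k$ and $d-k$ are complex conjugates, so the estimated phases inherit the conjugate symmetry (41) and the projection is real.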

Empirical vs. theoretical plateaus.

In Table 1, the Empirical column is the lowest training loss observed over the $5000$ -epoch budget, while the Theory column is $\mathcal{L}^{*}_{\mathcal{C}}$ computed on the same edge set used during training. Both columns report means $\pm$ standard deviations across three seeds. The empirical and theoretical plateaus track each other to within $3\%$ for the circulant class and within $2.3\%$ for frozen identity, with closer agreement (within $1\%$ ) at large $n$ . This confirms that restricting the transport class to $\mathcal{C}\subseteq O(d)$ constrains the per-edge restriction maps of $\Delta_{\mathcal{F}_{n,d}^{t}}$ in a quantitatively predictable way. We do not claim that such an arbitrary edgewise subgroup constraint automatically lifts to a smooth global connection class on the continuous bundle.

F.1.6 Hyperparameters

See Table 4.

Hyperparameter	Spectral stability	Transport recovery
Gaussian dimension $p$	$\{2,3,4\}$ ( $d\in\{3,6,10\}$ )	$4$ ( $d{=}10$ )
sample-size grid $n$	$\{50,100,200,400,800\}$	$\{16,32,64,128,256\}$
reference $n_{\max}$	$\{4000,2000,1000\}$ at $p\in\{2,3,4\}$	—
graph	kNN, $k{=}8$ , $W_{2}$ distance	kNN, $k{=}8$ , $W_{2}$ distance
bottom-$k$ eigenvalues	$32$	—
seeds	$3$	$3$
epochs / patience	—	$5000$ / $600$
optimizer	—	Adam, lr $5\!\cdot\!10^{-3}$ , batch $256$
Householder reflections	—	$R{=}16$
Euler steps for $P^{LC}$	$50$	$50$

Table 4: Synthetic experiments: hyperparameters for spectral stability and transport recovery.

F.2 Traffic forecasting: experimental details

F.2.1 Datasets

We use two standard traffic-speed benchmarks from [69].

METR-LA. $|\mathcal{X}_{n}|=207$ loop-detector sensors on the Los Angeles highway network, recording average traffic speed at $5$-minute intervals. The spatial graph follows the DCRNN convention: edge weights $W_{ij}=\exp(-d_{\mathcal{M}}(x_{i},x_{j})^{2}/\sigma^{2})$ are computed from road-network distances, with $\sigma$ set to the standard deviation of the pairwise distances; weights below $\kappa=0.1$ are set to zero, and the result is symmetrized via $W\leftarrow\max(W,W^{\top})$.

PEMS-BAY. $|\mathcal{X}_{n}|=325$ sensors in the San Francisco Bay Area, with the same temporal resolution and graph construction.
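The thresholded Gaussian-kernel adjacency described above can be sketched as follows. We assume that missing road-network distances are stored as `np.inf` and that $\sigma$ is the standard deviation of the finite distances; the function name and the toy distance matrix are hypothetical and not taken from either dataset.

```python
import numpy as np

def gaussian_threshold_adjacency(dist, kappa=0.1):
    """Thresholded Gaussian-kernel adjacency W_ij = exp(-d_ij^2 / sigma^2), symmetrized.

    dist: (n, n) road-network distances with np.inf where no distance is recorded.
    sigma is taken as the standard deviation of the finite distances (our assumption).
    """
    finite = np.isfinite(dist)
    sigma = dist[finite].std()
    W = np.zeros(dist.shape)
    W[finite] = np.exp(-(dist[finite] / sigma) ** 2)
    W[W < kappa] = 0.0                 # drop weak connections
    return np.maximum(W, W.T)          # W <- max(W, W^T)

# Hypothetical directed distances between four sensors.
dist = np.array([[0.0, 1.0, 3.0, np.inf],
                 [1.5, 0.0, 1.0, 4.0],
                 [np.inf, 1.2, 0.0, 0.8],
                 [np.inf, np.inf, 0.9, 0.0]])
W = gaussian_threshold_adjacency(dist)
```

Road distances are generally direction-dependent, which is why the element-wise maximum is needed to obtain the symmetric weights that an undirected graph, and hence a symmetric sheaf Laplacian, requires.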

For both datasets, we use $T=12$ observed time steps as input and forecast at horizons $h\in\{3,6,12\}$ . Train/validation/test splits follow the standard $70/10/20$ chronological partition of [69].

F.2.2 Task formulation

At each forecasting instance, the input signal is

\mathbf{s}_{n,T}=({\mathbf{s}}_{x_{1}},\dots,{\mathbf{s}}_{x_{n}})\in C^{0}(\mathcal{F}_{n,T}^{t};G_{n}),\qquad{\mathbf{s}}_{x_{i}}\in\mathbb{R}^{T},

(80)

where $n=|\mathcal{X}_{n}|$ and $d=T$, so the network sheaf $\mathcal{F}_{n,T}^{t}$ has $T$-dimensional stalks and Laplacian $\Delta_{\mathcal{F}_{n,T}^{t}}\in\mathbb{R}^{nT\times nT}$. The goal is to predict the future speed vector $\mathbf{y}^{(h)}\in\mathbb{R}^{|\mathcal{X}_{n}|}$ at each horizon $h$. Given a dataset $\mathcal{D}$ of (input, target) pairs, the prediction loss is the absolute ($\ell_{1}$) forecasting error, summed over horizons and averaged over $\mathcal{D}$:

\mathcal{L}_{\mathrm{pred}}=\frac{1}{|\mathcal{D}|}\sum_{(\mathbf{s}_{n,T},\mathbf{y})\in\mathcal{D}}\sum_{h\in\{3,6,12\}}\left\|\widehat{\mathbf{y}}^{(h)}(\mathbf{s}_{n,T})-\mathbf{y}^{(h)}\right\|_{1}.

(81)

For HilbNet variants with learned transports, we add the kernel regularizer of Appendix D with weight $\lambda$ , giving the full training objective $\mathcal{L}=\mathcal{L}_{\mathrm{pred}}+\lambda\,\mathcal{L}_{\mathrm{kernel-reg}}$ .
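As a check on the normalization in (81), the prediction term can be written in a few lines of NumPy. This sketch assumes that predictions and targets are stacked as arrays of shape $(|\mathcal{D}|,3,|\mathcal{X}_{n}|)$, and it omits the kernel regularizer, whose form is given in Appendix D. The function name is hypothetical.

```python
import numpy as np

def prediction_loss(Y_hat, Y):
    """Eq. (81): sum over horizons of ||y_hat^(h) - y^(h)||_1, averaged over the dataset.

    Y_hat, Y: arrays of shape (num_instances, num_horizons, num_sensors)."""
    per_instance = np.abs(Y_hat - Y).sum(axis=(1, 2))  # sum_h ||.||_1 for each instance
    return per_instance.mean()                         # (1 / |D|) * sum over D

Y = np.zeros((2, 3, 4))
Y_hat = np.ones((2, 3, 4))
loss = prediction_loss(Y_hat, Y)  # each instance contributes 3 horizons * 4 sensors = 12
```

Dividing additionally by $3|\mathcal{X}_{n}|$ gives the per-entry mean absolute error. The two objectives differ only by a constant factor, so they share minimizers once $\lambda$ is rescaled accordingly.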

F.2.3 Model variants and baselines

All HilbNet variants are discretized HilbNets (Def. 5) with polynomial filters of order $K$ and the same per-node linear readout (DCRNN convention), which maps sheaf-filtered features to per-node horizon predictions. The only architectural difference among them is the admissible class of edgewise transports $P_{j\to i}^{e_{ij}}\in\mathcal{C}\subseteq O(T)$, as described in Appendix D with $d=T$. The transport hypothesis classes are the same as in the previous section; we briefly summarize and contextualize them below.
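A polynomial sheaf filter of order $K$ can be applied without forming matrix powers, using only $K$ products with the sheaf Laplacian. The sketch below assumes the common multichannel form $\mathbf{Z}=\sum_{k=0}^{K}\Delta^{k}\mathbf{X}\mathbf{H}_{k}$ with tap matrices $\mathbf{H}_{k}\in\mathbb{R}^{F_{\mathrm{in}}\times F_{\mathrm{out}}}$. The exact parameterization of Def. 5 (e.g., any normalization of $\Delta$) may differ, and the function name is hypothetical.

```python
import numpy as np

def sheaf_poly_filter(L, X, H):
    """Return sum_k L^k X H[k].

    L: (N, N) sheaf Laplacian with N = n * d; X: (N, F_in) stacked stalk features;
    H: (K + 1, F_in, F_out) filter taps, one matrix per power of L."""
    out = np.zeros((X.shape[0], H.shape[2]))
    Z = X  # holds L^k X
    for k in range(H.shape[0]):
        out += Z @ H[k]
        if k + 1 < H.shape[0]:
            Z = L @ Z  # advance to L^{k+1} X
    return out

# Toy usage with K = 2 (three taps), as in Table 5, followed by a ReLU nonlinearity.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
L = M @ M.T  # any symmetric PSD matrix stands in for the sheaf Laplacian here
X = rng.standard_normal((6, 2))
H = rng.standard_normal((3, 2, 4))
Z = np.maximum(sheaf_poly_filter(L, X, H), 0.0)
```

Each layer therefore touches only $K$-hop neighborhoods of the sheaf, which is why a small $K$ keeps the model local.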

Frozen identity. $\mathcal{C}_{\mathrm{id}}=\{I_{T}\}$ . Neighboring sensors exchange time windows without temporal alignment. The sheaf Laplacian reduces to a standard weighted graph Laplacian applied independently to each temporal coordinate, recovering a graph convolutional network.
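This reduction can be checked numerically. The sketch below assembles a block sheaf Laplacian under the standard convention, which we assume here rather than take from Appendix D: node $i$ has diagonal block $\sum_{j}w_{ij}I_{T}$, and edge $(i,j)$ has off-diagonal blocks $-w_{ij}P_{j\to i}$ and $-w_{ij}P_{j\to i}^{\top}$, with $P_{i\to j}=P_{j\to i}^{\top}$. With identity transports, the matrix coincides with $L\otimes I_{T}$, where $L$ is the weighted graph Laplacian. Dense matrices are used only for readability; for METR-LA ($nT=2484$), a sparse format would be preferable. Function names are hypothetical.

```python
import numpy as np

def sheaf_laplacian(n, d, edges, weights, transports):
    """Dense (n*d, n*d) sheaf Laplacian; each undirected edge (i, j) is listed once
    with weight w_ij and an orthogonal transport P_{j->i} of shape (d, d)."""
    L = np.zeros((n * d, n * d))
    I = np.eye(d)
    for (i, j), w, P in zip(edges, weights, transports):
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += w * I
        L[bj, bj] += w * I
        L[bi, bj] -= w * P      # block -w_ij P_{j->i}
        L[bj, bi] -= w * P.T    # block -w_ij P_{i->j}, with P_{i->j} = P_{j->i}^T
    return L

n, d = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights = [1.0, 0.5, 2.0, 1.5]
A = np.zeros((n, n))
for (i, j), w in zip(edges, weights):
    A[i, j] = A[j, i] = w
L_graph = np.diag(A.sum(axis=1)) - A  # weighted graph Laplacian
L_id = sheaf_laplacian(n, d, edges, weights, [np.eye(d)] * len(edges))
rng = np.random.default_rng(0)
Ps = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in edges]  # random orthogonal transports
L_rand = sheaf_laplacian(n, d, edges, weights, Ps)
```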

Circulant. Each transport is a real orthogonal circulant matrix parameterized by $\lfloor(T{-}1)/2\rfloor$ frequency-wise phases per edge. This encodes a time-stationary prior: the transport can advance or delay oscillatory components across an edge but cannot arbitrarily mix temporal coordinates. This is a natural inductive bias for traffic data, where congestion patterns propagate through the road network with local delays and phase shifts.
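One way to realize such a transport is sketched below under our own assumptions. We place unit-modulus eigenvalues $e^{\pm i\theta_{k}}$ on the conjugate-symmetric DFT frequencies $k$ and $T-k$, for $k=1,\dots,\lfloor(T-1)/2\rfloor$, and fix the zero frequency (and, for even $T$, the Nyquist frequency) to $+1$. The resulting circulant matrix is real and orthogonal. It acts on a time window by rotating the phase of the $k$-th Fourier coefficient by $\theta_{k}$ (and of its conjugate by $-\theta_{k}$), which amounts to a frequency-dependent delay. The function name and the choice of $+1$ for the fixed frequencies are illustrative, not necessarily the parameterization of Appendix D.

```python
import numpy as np

def circulant_transport(phases, T):
    """Real orthogonal T x T circulant matrix from floor((T - 1) / 2) phases."""
    m = (T - 1) // 2
    phases = np.asarray(phases, dtype=float)
    if phases.shape != (m,):
        raise ValueError(f"expected {m} phases for T={T}")
    lam = np.ones(T, dtype=complex)            # eigenvalues = DFT of the first column
    lam[1:m + 1] = np.exp(1j * phases)         # frequencies 1..m
    lam[T - m:] = np.conj(lam[1:m + 1])[::-1]  # frequencies T-m..T-1 (conjugate symmetry)
    c = np.fft.ifft(lam).real                  # first column; imaginary part is ~0
    idx = (np.arange(T)[:, None] - np.arange(T)[None, :]) % T
    return c[idx]                              # C[i, j] = c[(i - j) mod T]

C = circulant_transport([0.3, -1.0, 2.0, 0.5, 1.2], T=12)
```

At $T=12$, this uses $\lfloor 11/2\rfloor=5$ parameters per edge, and zero phases recover the identity (the frozen-identity class).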

Free $O(T)$ . Each transport is parameterized by a product of $R{=}8$ Householder reflections, yielding an orthogonal matrix in a strict Householder-defined subset of $O(T)$ (see Appendix D). This is the most expressive transport class but uses $\sim\!10{\times}$ more parameters in total than the circulant variant (e.g., $119{,}036$ vs $11{,}656$ on METR-LA; per edge the ratio is $\sim\!20{\times}$ , since circulant uses only $\lfloor(T{-}1)/2\rfloor=5$ phases per edge at $T{=}12$ ).
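The Householder parameterization can be sketched as follows. For illustration, we assume that each of the $R$ reflections is parameterized by an unconstrained vector $\mathbf{v}_{r}\in\mathbb{R}^{T}$; the function name is hypothetical. Under this assumption, a free transport has $RT=8\cdot 12=96$ parameters per edge versus $5$ phases for the circulant class, a ratio of about $19$, which is consistent with the $\sim\!20{\times}$ figure above. The sketch also makes the strict-subset claim concrete. A product of $R$ reflections has determinant $(-1)^{R}$ (so $+1$ for $R=8$) and fixes a subspace of dimension at least $T-R$, so, e.g., $-I_{12}$ is not representable.

```python
import numpy as np

def householder_transport(V):
    """Q = H(v_1) H(v_2) ... H(v_R) with H(v) = I - 2 v v^T / (v^T v).

    V: (R, T) array whose rows are the reflection vectors."""
    Q = np.eye(V.shape[1])
    for v in V:
        # right-multiply by H(v) without forming it: Q H(v) = Q - 2 (Q v) v^T / (v^T v)
        Q = Q - 2.0 * np.outer(Q @ v, v) / (v @ v)
    return Q

rng = np.random.default_rng(0)
Q = householder_transport(rng.standard_normal((8, 12)))  # R = 8, T = 12 as in Table 5
```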

We also compare against two non-transport baselines. A fiber-only MLP processes each sensor’s time window independently, ignoring graph structure. A spatiotemporal graph baseline applies standard graph convolution to the temporally-augmented node features but does not learn sheaf transports. Finally, we include two external baselines from the literature: FC-LSTM [69] and STAEformer [71], reported as in the cited papers.

F.2.4 Detailed results

Table 2 reports MAE, RMSE, and MAPE at horizons $3$ , $6$ , and $12$ (mean $\pm$ std over five seeds for our experiments).
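For reference, the three metrics can be computed as below. As is common for these benchmarks, this sketch assumes that zero readings (which typically indicate missing data) are excluded when `mask_zeros=True`. The function name is hypothetical, and MAPE is reported in percent.

```python
import numpy as np

def forecast_metrics(y_hat, y, mask_zeros=True):
    """MAE, RMSE and MAPE (%) at a single horizon, pooled over samples and sensors."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (y != 0) if mask_zeros else np.ones(y.shape, dtype=bool)
    err = (y_hat - y)[mask]
    mae = np.abs(err).mean()
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y[mask]))  # undefined if unmasked zeros remain
    return mae, rmse, mape

# Toy example: the zero reading is masked out, leaving errors 5, -3 and 0.
mae, rmse, mape = forecast_metrics([[55.0, 10.0], [57.0, 40.0]], [[50.0, 0.0], [60.0, 40.0]])
```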

Value of graph structure.

On both datasets, the frozen-identity HilbNet improves over the fiber-only MLP baseline, confirming that sheaf diffusion over the spatial graph is beneficial even without non-trivial transports.

Value of learned transports.

Both the free and circulant variants consistently outperform frozen identity at all horizons on both datasets. This confirms that learned sheaf transports help beyond the plain graph structure.

Free vs. circulant.

On METR-LA, the free-$O(T)$ model achieves the best absolute accuracy at all horizons (e.g., MAE $3.938$ vs. $4.059$ for circulant at horizon $12$), as expected from its larger hypothesis class. On PEMS-BAY, the two variants are nearly tied: free wins MAE (by $\sim\!0.03$ mph) and MAPE at horizon $12$ (a $2$–$4\sigma$ effect over five seeds), circulant wins MAE and MAPE at horizon $3$, and RMSE is statistically indistinguishable at horizons $6$ and $12$. In both cases, the circulant model uses roughly one tenth of the total parameters of the free model (e.g., $11{,}656$ vs. $119{,}036$ on METR-LA), making it the most parameter-efficient HilbNet variant with learned transports. This supports our central practical message on transports: a structured transport class encoding a physically motivated alignment prior can recover most of the benefit of unconstrained learned transports with substantially fewer degrees of freedom.

Comparison with external baselines.

Our HilbNet variants are lightweight models designed to test the value of Hilbert-sheaf structure, not to compete with large-scale spatiotemporal transformers. The external baselines (FC-LSTM, STAEformer) are included for reference and use substantially more parameters and architectural components. Nevertheless, the circulant and free HilbNets outperform FC-LSTM at all horizons on both datasets while using far fewer parameters.

F.2.5 Hyperparameters

See Table 5. All experiments are run on a single H200 GPU. Hyperparameters were selected via a sweep. All variants are implemented in the same codebase, and the HilbNet variants differ only in their transport parametrization.

	METR-LA	PEMS-BAY
sensors $|\mathcal{X}_{n}|$	$207$	$325$
input window $T$	$12$	$12$
horizons $h$	$\{3,6,12\}$	$\{3,6,12\}$
input feature dim	$2$ (speed + time-of-day)	$2$ (speed + time-of-day)
graph	thresh. kernel $\exp(-d^{2}/\sigma^{2})$ , $\kappa{=}0.1$	thresh. kernel $\exp(-d^{2}/\sigma^{2})$ , $\kappa{=}0.1$
sheaf-conv layers	$L{=}2$ , channel widths $[2,16,32]$	$L{=}2$ , channel widths $[2,16,32]$
polynomial order $K$ per layer	$K=[2,2]$ (HilbNet), $K=[1,1]$ (MLP fiber baseline)	$K=[2,2]$ (HilbNet), $K=[1,1]$ (MLP fiber baseline)
Householder reflections	$8$	$8$
readout	per-node linear ( $T_{\mathrm{in}}\!\cdot\!F_{\mathrm{last}}\to T_{\mathrm{out}}$ )	per-node linear ( $T_{\mathrm{in}}\!\cdot\!F_{\mathrm{last}}\to T_{\mathrm{out}}$ )
epochs / patience	$150$ / $20$	$150$ / $20$
batch size	$32$	$32$
optimizer	Adam, $\beta{=}(0.9,0.999)$	Adam, $\beta{=}(0.9,0.999)$
learning rate	$5\!\cdot\!10^{-3}$ (HilbNet); $1\!\cdot\!10^{-3}$ (STGNN baseline)	$5\!\cdot\!10^{-3}$ (HilbNet); $1\!\cdot\!10^{-3}$ (STGNN baseline)
LR schedule	cosine annealing, $\eta_{\min}{=}10^{-6}$	cosine annealing, $\eta_{\min}{=}10^{-6}$
weight decay / dropout	$0$ / $0$	$0$ / $0$
gradient clipping	$\|g\|_{2}\leq 5$ (DCRNN convention)	$\|g\|_{2}\leq 5$ (DCRNN convention)
kernel regularizer $\lambda$	$0.01$	$0.01$
seeds	$5$	$5$

Table 5: Traffic forecasting hyperparameters.
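The learning-rate schedule and clipping rule in Table 5 amount to the closed forms sketched below in plain Python and NumPy. For illustration, we assume that the cosine schedule is stepped once per epoch with a period equal to the $150$-epoch budget (early stopping with patience $20$ may end training sooner). We also assume that clipping uses the global $\ell_{2}$ norm over all parameter gradients. The function names are hypothetical.

```python
import math
import numpy as np

def cosine_lr(epoch, total_epochs=150, lr_max=5e-3, lr_min=1e-6):
    """Cosine annealing from lr_max at epoch 0 to lr_min at epoch total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * epoch / total_epochs))

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their joint l2 norm is at most max_norm."""
    total = math.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

clipped = clip_by_global_norm([np.array([3.0, 4.0]), np.array([12.0])])  # joint norm 13 -> 5
```

Clipping by the joint norm rescales all gradients by the same factor, so the update direction is preserved.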

Appendix G Mathematical Background

G.1 Hilbert Bundles

In this section, we provide relevant background on the theory of Hilbert bundles. In particular, we define Banach and Hilbert manifolds and introduce the appropriate notions of connection, parallel transport, and heat flow for bundles in this setting.

G.1.1 Banach and Hilbert manifolds

To study heat kernels for smooth Hilbert bundles, we must examine manifolds modeled on generic Banach spaces. We will assume all Banach spaces and Hilbert spaces are defined over the field of real numbers $\mathbb{R}$ , unless otherwise stated.

Definition 7.

A second-countable topological space $\mathcal{M}$ is a topological Banach manifold if there is a Banach space $\mathcal{V}$ and an atlas $\{(U_{i}\>,\>\phi_{i}:U_{i}\rightarrow\mathcal{V})\}_{i\in I}$ such that the following conditions hold:

1.

each $U_{i}$ is an open subset of $\mathcal{M}$ ;
2.

each $\phi_{i}:U_{i}\rightarrow\mathcal{V}$ is a homeomorphism onto an open subset of $\mathcal{V}$ ;
3.

for all $i,j$ , $\phi_{i}(U_{i}\cap U_{j})$ is an open subset of $\mathcal{V}$ ;
4.

the transition map $\phi_{j}\phi_{i}^{-1}:\phi_{i}(U_{i}\cap U_{j})\rightarrow\phi_{j}(U_{i}\cap U_{j})$ is a homeomorphism.

When the Banach space $\mathcal{V}$ is specified, we say that $\mathcal{M}$ is a $\mathcal{V}$-manifold, or a Banach manifold modeled on $\mathcal{V}$. If each transition map $\phi_{j}\phi_{i}^{-1}$ is $k$-times continuously Fréchet differentiable, we say that $\mathcal{M}$ is a $C^{k}$-Banach manifold. If these maps are smooth, i.e., $C^{\infty}$, we say that $\mathcal{M}$ is a smooth Banach manifold.

Definition 8.

A topological (resp. $C^{k}$ / smooth) Banach manifold $\mathcal{M}$ is a topological (resp. $C^{k}$ / smooth) Hilbert manifold if it can be modeled on a Banach space $\mathcal{V}$ which admits the structure of a Hilbert space.

Remark 1.

We make a few observations about this definition.

1.

Since every $n$ -dimensional real Banach space is isomorphic to $\mathbb{R}^{n}$ , a finite dimensional Banach manifold is exactly a real manifold in the usual sense.
2.

Like ordinary manifolds, we require Banach manifolds to be second countable, and hence to have a countable dense subset. It follows that if $\mathcal{M}$ is a manifold modeled on a Banach space $\mathcal{V}$, then $\mathcal{V}$ itself must be separable. This condition could be removed, but it simplifies the exposition.
3.

The definition of a Hilbert manifold does not require the transition maps $\tau_{ij}:=\phi_{j}\circ\phi_{i}^{-1}:\phi_{i}(U_{i}\cap U_{j})\to\phi_{j}(U_{i}\cap U_{j})$ to respect the inner product structure on the modeling Hilbert space $\mathcal{V}$. Hence, it is often better to think of a Hilbert manifold as a special case of a Banach manifold, rather than as a manifold that respects the Hilbert space structure per se.

The usual differential geometric constructions on manifolds extend naturally to Hilbert and Banach manifolds. For example, tangent spaces generalize naturally: given a $C^{k}$-Banach manifold $\mathcal{M}$ with $k\geq 1$ and a point $x\in\mathcal{M}$, one may form the tangent space $T_{x}\mathcal{M}$ as the set of equivalence classes of triples $(U,\phi,v)$, where $\phi:U\to\mathcal{V}$ is a chart with $x\in U$ and $v\in\mathcal{V}$, under the relation:

(U_{1},\phi,v)\sim(U_{2},\psi,w)\iff D_{\phi(x)}(\psi\circ\phi^{-1})(v)=w.

Such equivalence classes are easily seen to form a real vector space isomorphic to $\mathcal{V}$ .

G.1.2 Smooth bundles

Definition 9 (Smooth Banach and Hilbert bundles).

Let $\mathcal{M}$ be a smooth finite–dimensional manifold and let $\mathcal{V}$ be a fixed separable Banach space. A smooth Banach bundle with model space $\mathcal{V}$ consists of a smooth Banach manifold $\mathcal{E}$ equipped with a smooth surjective submersion

\pi\colon\mathcal{E}\longrightarrow\mathcal{M}\,,

that satisfies the following conditions.

1.

Local triviality. For every $p\in\mathcal{M}$ there exists an open neighborhood $U\subset\mathcal{M}$ and a diffeomorphism

$\phi_{U}\colon\pi^{-1}(U)\;\xrightarrow{\;\cong\;}\;U\times\mathcal{V}$

satisfying $\pi=\text{proj}_{1}\!\circ\phi_{U}$ , where $\text{proj}_{1}:U\times\mathcal{V}\to U$ is the canonical projection, and such that, for each $q\in U$ , the restriction $\phi_{U}|_{\mathcal{E}_{q}}\colon\mathcal{E}_{q}\to\{q\}\times\mathcal{V}$ is a bounded linear isomorphism. We call the pair $(U,\phi_{U})$ a trivializing chart.
2.

Smooth transition functions. Whenever $(U,\phi_{U})$ and $(V,\phi_{V})$ are trivializing charts, the transition map

$\tau_{UV}(q,-)\;:=\;\phi_{V}\circ\phi_{U}^{-1}\big(q,-\big)\;\colon\;\mathcal{V}\longrightarrow\mathcal{V},\qquad q\in U\cap V,$

is a bounded isomorphism and depends smoothly on $q$ ; that is, $\tau_{UV}\colon U\cap V\to\mathrm{GL}(\mathcal{V})$ is a smooth map, where $\mathrm{GL}(\mathcal{V})$ denotes the Banach–Lie group of bounded invertible operators on $\mathcal{V}$ with the operator‐norm topology.
3.

Smooth norm. There is a smooth map $N:\mathcal{E}\to\mathbb{R}$ such that the trivializing charts $(U,\phi_{U})$ can be chosen with the additional property that for each $x\in\mathcal{E}_{p}$ ,

$N(x)=\left\|\phi_{U}\big\rvert_{\mathcal{E}_{p}}(x)\right\|_{\mathcal{V}}\,.$

4.

Smooth fiberwise operations. The fiberwise addition and scalar-multiplication maps

+\;\colon\;\mathcal{E}\times_{\mathcal{M}}\mathcal{E}\longrightarrow\mathcal{E},\qquad\cdot\;\colon\;\mathbb{R}\times\mathcal{E}\longrightarrow\mathcal{E},

are smooth Banach‐manifold maps.

When the Banach space $\mathcal{V}$ is a separable Hilbert space, we say $\pi:\mathcal{E}\to\mathcal{M}$ is a Hilbert bundle. We denote the inner product on the fiber $\mathcal{E}_{p}$ by $\langle-,-\rangle_{p}$.

Remark 2.

We make a few remarks about the definition of a Hilbert bundle above.

1.

For convenience, we restrict our attention to Hilbert bundles over closed finite-dimensional manifolds with separable fibers. None of these restrictions are essential for the general theory of Banach and Hilbert bundles. However, these restrictions are necessary for our approach to constructing heat kernels in this setting.
2.

The intuitive idea is the following: a smooth Banach bundle is a smooth vector bundle where the fibers are allowed to be infinite-dimensional and come equipped with a complete norm. The smooth norm condition enforces that the Banach space fibers are stitched together in such a way that the fiber-wise norm varies smoothly. In the case of a Hilbert bundle, the smooth norm condition also enforces that the fiber-wise inner product $\langle-,-\rangle_{p}$ varies smoothly.
3.

In light of the previous remarks, the definition presented here is not minimal. We include redundant information in our definition for clarity, with the understanding that some conditions are superfluous [75]. We also include the smooth norm condition, often called a smooth orthogonal/Hermitian metric, in the definition of the bundle itself.
4.

In the case of a finite dimensional model space $\mathcal{V}$ , this definition recovers the usual smooth vector bundle, with the additional data of a smooth orthogonal/Hermitian metric.
5.

Suppose $\pi:\mathcal{E}\to\mathcal{M}$ is a smooth Hilbert bundle modeled on a Hilbert space $\mathcal{H}$. While the transition maps $\tau_{UV}$ must respect the topological structure of the Hilbert space $\mathcal{H}$, they need not respect the inner product structure. When each transition map $\tau_{UV}$ is a unitary isomorphism, we say the bundle is a smooth unitary Hilbert bundle.

Definition 10 (Smooth sections of a Banach bundle).

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle (resp. Hilbert bundle) over a finite–dimensional manifold $\mathcal{M}$ with model Banach space (resp. Hilbert space) $\mathcal{V}$. A section of $\mathcal{E}$ is a map $S:\mathcal{M}\to\mathcal{E}$ such that $\pi\circ S=\mathrm{id}_{\mathcal{M}}$. We denote the collection of all smooth sections by $\Gamma(\mathcal{E}):=C^{\infty}(\mathcal{M},\mathcal{E})$. Note that this is a module over the commutative algebra $C^{\infty}(\mathcal{M})$ with point-wise addition and multiplication. If the section $S$ is only $k$-times continuously differentiable, we write $S\in C^{k}(\mathcal{M},\mathcal{E})$.

Definition 11 ( $L^{2}$ -Sections of a Banach bundle).

Suppose that the manifold $\mathcal{M}$ is endowed with a measure $\mu$ . We say that $S$ is an $L^{2}$ -section if $\|S\|_{2}:=\left(\int_{\mathcal{M}}||S(x)||^{2}_{\mathcal{E}_{x}}\,d\mu(x)\right)^{1/2}<\infty$ . We may similarly form a space of $L^{2}$ -sections, denoted $L^{2}(\mathcal{M},\mathcal{E};\mu)$ , or simply $L^{2}(\mathcal{M},\mathcal{E})$ when the measure is implied by context. When $\mathcal{E}$ is modeled on a separable Hilbert space $\mathcal{H}$ , the space of $L^{2}$ -sections $L^{2}(\mathcal{M},\mathcal{E};\mu)$ is a real separable Hilbert space with inner product $\langle S,S^{\prime}\rangle_{\mathcal{E}}:=\int_{\mathcal{M}}\langle S(x),S^{\prime}(x)\rangle_{\mathcal{E}_{x}}\,d\mu(x)$ .

Remark 3.

In the Riemannian setting, we have a natural candidate for $\mu$ via the (pseudo) volume form on $\mathcal{M}$ , or its normalized variant.

G.1.3 Connections

We now introduce connections on smooth Banach and Hilbert bundles. The smooth Banach manifold structure on the bundle $\mathcal{E}$ provides no way to directly compare vectors in different fibers $\mathcal{E}_{x}$ and $\mathcal{E}_{y}$ with respect to their Banach space structures. Instead, as in the finite-dimensional case, we use a connection to link the fibers through the geometry of the base manifold $\mathcal{M}$.

Definition 12.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle over a compact manifold $\mathcal{M}$ , and let $T^{*}\mathcal{M}$ denote the cotangent bundle on $\mathcal{M}$ . A connection on $\mathcal{E}$ is any of the following three equivalent structures:

1.

A connection is an $\mathbb{R}$ -linear map:

$\nabla:\Gamma(\mathcal{E})\rightarrow\Gamma(T^{*}\mathcal{M}\otimes\mathcal{E})$

such that the product rule:

$\nabla(fS)=Df\otimes S+f\nabla S$

holds for all smooth functions $f:\mathcal{M}\rightarrow\mathbb{R}$ and smooth sections $S\in\Gamma(\mathcal{E})$.
2.

A connection is a map $\nabla:\Gamma(T\mathcal{M})\times\Gamma(\mathcal{E})\to\Gamma(\mathcal{E})$ which is $C^{\infty}(\mathcal{M},\mathbb{R})$ -linear in its vector-field input, and satisfies the Leibniz rule:

$\nabla_{X}(fS)=X[f]S+f\nabla_{X}S$

where $\nabla_{X}S:=\nabla(X,S)$ and $X[f](x):=(D_{x}f)(X_{x})$ is the directional derivative of $f$ along the vector field $X$ .
3.
A connection $\nabla$ is the data of an $\mathbb{R}$-linear map $\nabla_{x}S:T_{x}\mathcal{M}\rightarrow\mathcal{E}_{x}$ for each $x\in\mathcal{M}$ and $S\in\Gamma(\mathcal{E})$ that satisfies the following conditions:
1. (a)
  
  $\nabla_{(-)}S$ depends smoothly on $x$
2. (b)
  
  $\nabla_{x}(a_{1}S_{1}+a_{2}S_{2})=a_{1}\nabla_{x}S_{1}+a_{2}\nabla_{x}S_{2}$ for all $x\in\mathcal{M}$, $S_{1},S_{2}\in\Gamma(\mathcal{E})$, and $a_{1},a_{2}\in\mathbb{R}$;
3. (c)
  
  for every smooth $f:\mathcal{M}\to\mathbb{R}$ and section $S\in\Gamma(\mathcal{E})$, let $fS\in\Gamma(\mathcal{E})$ denote the section $(fS)(x):=f(x)S(x)$. For each $x\in\mathcal{M}$, the maps $\nabla_{x}(fS)$ and $\nabla_{x}S$ are related by
  
  $\nabla_{x}(fS)(v)=D_{x}f(v)S(x)+f(x)(\nabla_{x}S)(v)$
  
  for all $v\in T_{x}\mathcal{M}$ .

The following proposition provides a standard representation theorem for a smooth connection $\nabla$ .

Proposition 2.

Let $\nabla$ be a smooth connection on a trivial Hilbert bundle $\pi:\mathcal{M}\times\mathcal{H}\to\mathcal{M}$ . There is a map

A:\Gamma(\mathcal{M}\times\mathcal{H})\to\Gamma(T^{*}\mathcal{M}\otimes(\mathcal{M}\times\mathcal{H}))

such that for every section $S\in\Gamma(\mathcal{M}\times\mathcal{H})$ , we have:

\nabla S=dS+AS

where $d$ is the Fréchet derivative. Equivalently, for each $x\in\mathcal{M}$ and $v\in T_{x}\mathcal{M}$, there is a bounded linear operator $A_{x,v}:\mathcal{H}\to\mathcal{H}$, varying smoothly in $(x,v)$, such that:

(\nabla_{x}S)(v)=(D_{x}S)(v)+(A_{x,v}S)(x)\,.

Moreover, the assignment $v\mapsto A_{x,v}$ is linear for each $x$ .

Remark 4.

While this proposition is stated for trivial bundles, it applies to any Hilbert bundle locally, through the choice of a local trivialization, or, for infinite-dimensional fibers, globally through Kuiper's theorem.

G.1.4 Parallel Transport

Connections allow us to relate the geometries of the fibers over nearby points in $\mathcal{M}$ . For example, we may define parallel transport.

Given a smooth curve $\gamma:[0,1]\to\mathcal{M}$ , say that $S:[0,1]\to\mathcal{E}$ is a section over $\gamma$ if $\pi\circ S=\gamma$ .

Definition 13.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle with model space $\mathcal{V}$, let $\nabla$ be a connection on $\mathcal{E}$, and let $\gamma:[0,1]\to\mathcal{M}$ be a smooth path. A map $S:[0,1]\to\mathcal{E}$ is a section over $\gamma$ if $\pi\circ S=\gamma$. A section over $\gamma$ is parallel if

\nabla_{\dot{\gamma}}S=0.

Proposition 3.

Let $\gamma$ be a smooth path in $\mathcal{M}$ . For every vector $v\in\mathcal{E}_{\gamma(0)}$ , there is a unique parallel section $S_{v}$ over $\gamma$ such that $S_{v}(0)=v$ . Moreover, the dependence on $v$ is smooth and linear.

Proof.

The existence of the parallel section $S_{v}$ can be restated as an initial value problem for a linear ordinary differential equation in a Banach space. Standard existence and uniqueness theorems apply. For details, see [62]. ∎

By the existence and uniqueness of parallel sections, we may define corresponding parallel transport maps. Given a path $\gamma:[0,1]\to\mathcal{M}$ , there is an induced parallel transport operator $P_{\gamma}^{\nabla}:\mathcal{E}_{\gamma(0)}\to\mathcal{E}_{\gamma(1)}$ defined by

P^{\nabla}_{\gamma}(v):=S_{v}(1).

It is straightforward to see that $P^{\nabla}_{\gamma}$ is a linear bijection, with inverse given by $(P^{\nabla}_{\gamma})^{-1}=P^{\nabla}_{\gamma^{\text{rev}}}$ , where $\gamma^{\text{rev}}$ is the path obtained by reversing $\gamma$ . By the closed graph theorem, it follows that $P^{\nabla}_{\gamma}$ is a bounded linear isomorphism.
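Proposition 3 reduces parallel transport to a linear ODE. In a local trivialization where $\nabla=d+A$ (Proposition 2), a section over $\gamma$ is parallel if and only if $\frac{d}{dt}S(t)=-A_{\gamma(t),\dot{\gamma}(t)}S(t)$. The sketch below integrates this ODE for a finite-dimensional truncation $\mathcal{H}\approx\mathbb{R}^{d}$ of the fiber, an assumption made only for illustration. It uses midpoint Cayley steps rather than the Euler scheme listed in Table 4. When every $A_{x,v}$ is skew-symmetric, which corresponds to a compatible connection in an orthonormal trivialization, each step is exactly orthogonal. Integrating along the reversed path reproduces $(P^{\nabla}_{\gamma})^{-1}=P^{\nabla}_{\gamma^{\text{rev}}}$. The connection form `A_of` and all other names are hypothetical.

```python
import numpy as np

def transport_operator(A_of, gamma, dgamma, d, steps=200):
    """Approximate P_gamma: E_gamma(0) -> E_gamma(1) by solving dS/dt = -A S on [0, 1].

    A_of(x, v) returns the (d, d) connection-form matrix A_{x,v}, linear in v.
    Each step applies the Cayley map (I + h M / 2)^{-1} (I - h M / 2) at the midpoint."""
    h = 1.0 / steps
    I = np.eye(d)
    P = np.eye(d)
    for k in range(steps):
        t = (k + 0.5) * h
        M = A_of(gamma(t), dgamma(t))
        P = np.linalg.solve(I + 0.5 * h * M, (I - 0.5 * h * M) @ P)
    return P

# Toy connection on R^2 with skew-symmetric generators G1, G2, and a closed loop gamma.
rng = np.random.default_rng(0)
d = 4
G1 = rng.standard_normal((d, d))
G1 = G1 - G1.T
G2 = rng.standard_normal((d, d))
G2 = G2 - G2.T

def A_of(x, v):
    return v[0] * x[1] * G1 + v[1] * np.cos(x[0]) * G2  # linear in the velocity v

def gamma(t):
    return np.array([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])

def dgamma(t):
    return 2 * np.pi * np.array([-np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

P = transport_operator(A_of, gamma, dgamma, d)
P_rev = transport_operator(A_of, lambda t: gamma(1 - t), lambda t: -dgamma(1 - t), d)
```

Along this closed loop, $P$ generally differs from the identity, which reflects the holonomy of the connection.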

Definition 14.

Let $\pi:\mathcal{E}\rightarrow\mathcal{M}$ be a Hilbert bundle equipped with a connection $\nabla$ . We say the connection $\nabla$ is compatible with the Hilbert bundle structure if it satisfies:

X[\langle S_{0}(x),S_{1}(x)\rangle_{x}]=\langle\nabla_{X}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{X}S_{1}(x)\rangle_{x}

for every smooth vector field $X\in\Gamma(T\mathcal{M})$ , and sections $S_{0},S_{1}\in\Gamma(\mathcal{E})$ .

Proposition 4.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a Hilbert bundle with connection $\nabla$ . The following are equivalent:

1.

$\nabla$ is compatible with the Hilbert bundle structure;
2.

Every parallel transport map $P^{\nabla}_{\gamma}$ is unitary.

Proof.

First suppose that $\nabla$ is compatible with the Hilbert bundle structure, and $\gamma$ a smooth path in $\mathcal{M}$ . Let $u,v\in\mathcal{E}_{\gamma(0)}$ . By compatibility, we may check that:

\displaystyle\frac{d}{dt}\big\langle S_{u}(t),S_{v}(t)\big\rangle_{\gamma(t)}

\displaystyle=\big\langle\nabla_{\dot{\gamma}}S_{u}(t),S_{v}(t)\big\rangle_{\gamma(t)}+\big\langle S_{u}(t),\nabla_{\dot{\gamma}}S_{v}(t)\big\rangle_{\gamma(t)}=0\,.

It immediately follows that $P^{\nabla}_{\gamma}$ is unitary.

Conversely, suppose that every parallel transport map $P^{\nabla}_{\gamma}$ is unitary. Let $X$ be a smooth vector field, and $S_{0},S_{1}$ smooth sections of $\mathcal{E}$. Let $x\in\mathcal{M}$, and let $\gamma$ be a smooth path such that $\gamma(0)=x$ and $\dot{\gamma}(0)=X_{x}$. For $j\in\{0,1\}$, let $u_{j}$ be the parallel section over $\gamma$ with $u_{j}(0):=S_{j}(x)$, and let $w_{j}(t):=S_{j}(\gamma(t))-u_{j}(t)$. Since $u_{j}$ is parallel over $\gamma$, we have $\nabla_{\dot{\gamma}}S_{j}(\gamma(t))=\nabla_{\dot{\gamma}}w_{j}(t)$. Moreover, $w_{j}(0)=0$. We may use these facts to compute:

	$\displaystyle X[\big\langle S_{0}(x),S_{1}(x)\big\rangle_{x}]$	$\displaystyle=\frac{d}{dt}\bigg\rvert_{t=0}\big\langle S_{0}(\gamma(t)),S_{1}(\gamma(t))\big\rangle_{\gamma(t)}$
		$\displaystyle=\frac{d}{dt}\bigg\rvert_{t=0}\big\langle u_{0}(t)+w_{0}(t),u_{1}(t)+w_{1}(t)\big\rangle_{\gamma(t)}$
		$\displaystyle=\frac{d}{dt}\bigg\rvert_{t=0}\langle u_{0}(t),u_{1}(t)\rangle_{\gamma(t)}+\langle\nabla_{\dot{\gamma}}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{\dot{\gamma}}S_{1}(x)\rangle_{x}$

Since parallel transport maps are unitary, it follows that the quantity $\big\langle u_{0}(t),u_{1}(t)\big\rangle_{\gamma(t)}$ is constant in $t$ . Hence

	$\displaystyle X[\big\langle S_{0}(x),S_{1}(x)\big\rangle_{x}]$	$\displaystyle=\frac{d}{dt}\bigg\rvert_{t=0}\langle u_{0}(t),u_{1}(t)\rangle_{\gamma(t)}+\langle\nabla_{\dot{\gamma}}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{\dot{\gamma}}S_{1}(x)\rangle_{x}$
		$\displaystyle=\langle\nabla_{X}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{X}S_{1}(x)\rangle_{x}$

proving that $\nabla$ is a compatible connection. ∎

Remark 5.

By Proposition 4, the assumption in the main text that parallel transport operators are unitary is equivalent to requiring the connection to be metric-compatible.

G.2 Connection Laplacian

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a Hilbert bundle on a closed Riemannian manifold $(\mathcal{M},g)$, equipped with a compatible connection $\nabla:\Gamma(\mathcal{M};\mathcal{E})\to\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})$. The bundle $T^{*}\mathcal{M}\otimes\mathcal{E}$ inherits the structure of a Hilbert bundle, with fiber-wise inner products induced from the metric $g$ and the fiber-wise inner products of $\mathcal{E}$. Since $\nabla$ is a linear differential operator, it has a formal adjoint $\nabla^{*}:\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})\to\Gamma(\mathcal{M};\mathcal{E})$, defined implicitly by the formula:

\int_{\mathcal{M}}\langle\nabla S_{0}(x),S_{1}(x)\rangle_{x}\,d\mu(x)\>=\>\int_{\mathcal{M}}\langle S_{0}(x),\nabla^{*}S_{1}(x)\rangle_{x}\,d\mu(x)

where $S_{0}\in\Gamma(\mathcal{M};\mathcal{E})$ , $S_{1}\in\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})$ , and $\mu$ is the pseudo-volume form on $\mathcal{M}$ . Using this adjoint, we may define the connection Laplacian.

Definition 15.

The connection Laplacian is the linear operator:

\Delta_{\nabla}:=\nabla^{*}\nabla:\Gamma(\mathcal{E})\to\Gamma(\mathcal{E})\,.
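On a sampled complex, the factorization $\Delta_{\nabla}=\nabla^{*}\nabla$ has a direct discrete analogue. The coboundary $(\delta\mathbf{x})_{e}=\mathbf{x}_{i}-P_{j\to i}\mathbf{x}_{j}$ on an edge $e=(i,j)$ plays the role of $\nabla$, and a diagonal edge-weight matrix $W$ plays the role of the fiber metric. Then $\delta^{\top}W\delta$ is a symmetric positive semidefinite sheaf Laplacian whose quadratic form is the discrete Dirichlet energy $\sum_{e}w_{e}\|\mathbf{x}_{i}-P_{j\to i}\mathbf{x}_{j}\|^{2}$, mirroring $\langle\Delta_{\nabla}S,S\rangle=\langle\nabla S,\nabla S\rangle$. The sketch below verifies this numerically. The random edge weights, random orthogonal transports, and function names are illustrative assumptions, and the scaling used in our convergence results is not reproduced here.

```python
import numpy as np

def coboundary(n, d, edges, transports):
    """Sheaf coboundary of shape (len(edges)*d, n*d): (delta x)_e = x_i - P_{j->i} x_j."""
    D = np.zeros((len(edges) * d, n * d))
    for e, ((i, j), P) in enumerate(zip(edges, transports)):
        D[e * d:(e + 1) * d, i * d:(i + 1) * d] = np.eye(d)
        D[e * d:(e + 1) * d, j * d:(j + 1) * d] = -P
    return D

rng = np.random.default_rng(0)
n, d = 5, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
weights = rng.uniform(0.5, 2.0, size=len(edges))
Ps = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in edges]
delta = coboundary(n, d, edges, Ps)
W = np.kron(np.diag(weights), np.eye(d))  # edge weights act as the fiber metric
L = delta.T @ W @ delta                    # discrete analogue of nabla^* nabla

x = rng.standard_normal(n * d)
energy = sum(w * np.sum((x[i * d:(i + 1) * d] - P @ x[j * d:(j + 1) * d]) ** 2)
             for (i, j), w, P in zip(edges, weights, Ps))
```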

Remark 6.

The connection $\nabla:\Gamma(\mathcal{E})\to\Gamma(T^{*}\mathcal{M}\otimes\mathcal{E})$ can be extended to a closed, densely-defined unbounded operator $\nabla_{L^{2}}:L^{2}(\mathcal{E})\to L^{2}(T^{*}\mathcal{M}\otimes\mathcal{E})$ . This extended operator has an adjoint $\nabla^{*}_{L^{2}}$ as a Hilbert space operator, from which we may define a composite $\Delta_{\nabla_{L^{2}}}:=\nabla_{L^{2}}^{*}\nabla_{L^{2}}$ . The formal adjoint $\nabla^{*}$ and connection Laplacian $\Delta_{\nabla}$ can be found by restricting the domains of $\nabla_{L^{2}}^{*}$ and $\Delta_{\nabla_{L^{2}}}$ to linear subspaces of smooth sections. From this perspective, it becomes clear that the connection Laplacian $\Delta_{\nabla}$ is well defined for all $C^{2}$ sections, as $C^{2}$ is contained in the Sobolev space $H^{2}$ .

The connection Laplacian also admits a characterization in terms of covariant derivatives. Let $\nabla^{2}_{X,Y}$ denote the second covariant derivative with respect to vector fields $X,Y$ .

Lemma 1.

(Connection Laplacian in Coordinates) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle over a closed Riemannian manifold equipped with a compatible Fréchet connection. As operators,

\Delta_{\nabla}S=-\mathrm{tr}\nabla^{2}S\,.

Moreover, with respect to a local orthonormal frame $\{e_{i}\}_{i=1}^{m}$ that is synchronous at $p\in\mathcal{M}$ , we have equality:

\Delta_{\nabla}S(p)=-\sum_{i=1}^{m}\nabla_{e_{i}}\nabla_{e_{i}}S(p)\,.

Proof.

We adapt the proof of [79] to the Hilbert bundle setting, using the synchronous frame technique of [76]. By a partition of unity argument, it suffices to show that for every pair of smooth sections $S_{1},S_{2}$ supported inside the domain of a local orthonormal frame $\{e_{i}\}_{i=1}^{m}$ , we have an equality:

\int_{\mathcal{M}}\langle\Delta_{\nabla}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=-\int_{\mathcal{M}}\sum_{i=1}^{m}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)\>.

Using the formal adjoint of $\nabla$ , the integral on the left can be rewritten as

\int_{\mathcal{M}}\langle\Delta_{\nabla}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=\sum_{i=1}^{m}\int_{\mathcal{M}}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\,d\mu(x)\>.

To analyze the right hand side, simply note that $\nabla_{e_{i},e_{i}}^{2}S_{1}=\nabla_{e_{i}}\nabla_{e_{i}}S_{1}-\nabla_{\nabla_{e_{i}}e_{i}}S_{1}$ , and by compatibility, that:

e_{i}[\langle\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}]=\langle\nabla_{e_{i}}\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}+\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\>.

Rearranging and summing over $i$ yields:

	$\displaystyle\sum_{i}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}=$	$\displaystyle-\sum_{i}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}$
		$\displaystyle+\sum_{i}\left(e_{i}[\langle\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}]-\langle\nabla_{\nabla_{e_{i}}e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\right)\>.$

The second sum on the right-hand side is the divergence of the vector field $V$ defined by $g(V,v)=\langle\nabla_{v}S_{1}(x),S_{2}(x)\rangle_{x}$, and hence integrates to zero by Stokes' theorem on the closed manifold $\mathcal{M}$. Therefore

-\int_{\mathcal{M}}\sum_{i}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=\sum_{i}\int_{\mathcal{M}}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\,d\mu(x)

as well. Therefore $\Delta_{\nabla}(S)=-\mathrm{tr}\nabla^{2}(S)$ . Under the additional hypothesis that $\{e_{i}\}_{i=1}^{m}$ is synchronous at $p$ , we have $\nabla_{e_{i}}e_{i}=0$ for each $1\leq i\leq m$ . At such a point, the trace reduces to $\mathrm{tr}\nabla^{2}S(p)=\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)$ . ∎
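As a worked instance of Lemma 1 with $m=1$, take the circle of length $2\pi$ with arclength $s$, so that $e_{1}=\partial_{s}$ is synchronous everywhere. Consider the trivial bundle with fiber $\mathbb{R}^{2}$, a finite-dimensional toy case of the Hilbert setting, with the compatible connection $\nabla_{\partial_{s}}=\partial_{s}+aJ$, where $J$ is the rotation generator. Identifying $\mathbb{R}^{2}\cong\mathbb{C}$, Lemma 1 gives $\Delta_{\nabla}=-(\partial_{s}+ia)^{2}$, with eigenfunctions $e^{iks}$ and eigenvalues $(k+a)^{2}$, $k\in\mathbb{Z}$, each of real multiplicity two. Now sample $N$ equispaced points with spacing $h=2\pi/N$ and place the exact parallel transport $P_{i+1\to i}=R(ah)$ (rotation by $ah$) on each edge. The sheaf Laplacian scaled by $h^{-2}$ then has eigenvalues $4\sin^{2}((k+a)h/2)/h^{2}\to(k+a)^{2}$. The sketch below checks this. It uses a regular grid and exact transports rather than random samples and kernel weights, so it only illustrates the kind of convergence established in the main text. Function names are hypothetical.

```python
import numpy as np

def cycle_sheaf_laplacian(N, a):
    """h^{-2}-scaled sheaf Laplacian on N equispaced points of a circle of length 2*pi,
    with edge transports P_{i+1 -> i} = rotation by a*h (parallel transport of d/ds + aJ)."""
    h = 2 * np.pi / N
    c, s = np.cos(a * h), np.sin(a * h)
    P = np.array([[c, -s], [s, c]])
    L = np.zeros((2 * N, 2 * N))
    for i in range(N):
        j = (i + 1) % N
        bi, bj = slice(2 * i, 2 * i + 2), slice(2 * j, 2 * j + 2)
        L[bi, bi] += np.eye(2)
        L[bj, bj] += np.eye(2)
        L[bi, bj] -= P
        L[bj, bi] -= P.T
    return L / h**2

a = 0.3
eigs = np.linalg.eigvalsh(cycle_sheaf_laplacian(400, a))[:6]
expected = np.array([(k + a) ** 2 for k in (0, 0, -1, -1, 1, 1)])  # 0.09, 0.09, 0.49, 0.49, 1.69, 1.69
```

For non-integer $a$, the holonomy $R(2\pi a)$ is nontrivial, and the spectrum is shifted away from the flat values $k^{2}$.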

G.3 Heat Flow on a Hilbert Bundle

In this section, we fix a closed finite-dimensional Riemannian manifold $(\mathcal{M},g)$ with canonical volume pseudo-form $\mu$ , a smooth Hilbert bundle $\pi:\mathcal{E}\to\mathcal{M}$ with fiber $\mathcal{E}_{x}\cong\mathcal{H}$ , and a connection $\nabla$ . Our goal in this section is three-fold:

1.

Demonstrate that the heat equation with respect to the connection Laplacian $\Delta_{\nabla}$ has a unique solution;
2.

Show that the heat-flow admits a Heat Kernel;
3.

Provide asymptotic estimates for the heat flow that relate to the geometry of the underlying manifold $\mathcal{M}$ .

Our approach is to adapt the methods of Berline, Getzler, and Vergne [15] for finite-rank bundles to the Hilbert bundle setting. A key subtlety in this generalization is the definition of tensor products of bundles. While the algebraic and topological tensor products of finite-dimensional Hilbert spaces agree, they need not coincide for general Hilbert spaces. This complicates, for instance, the necessary tensor-hom adjunction. To keep track of the appropriate tensor product, we adopt the following convention. Let $\pi_{0}:\mathcal{E}_{0}\to\mathcal{M}$ and $\pi_{1}:\mathcal{E}_{1}\to\mathcal{M}$ be smooth Hilbert bundles over a common manifold $\mathcal{M}$. One may form the hom-bundle $\hom(\mathcal{E}_{0},\mathcal{E}_{1})\to\mathcal{M}$, whose fiber $\hom(\mathcal{E}_{0},\mathcal{E}_{1})_{x}$ is the Banach space of bounded linear operators from $(\mathcal{E}_{0})_{x}$ to $(\mathcal{E}_{1})_{x}$. This is a Banach bundle when each fiber is topologized with the operator norm.

Definition 16.

Let $\Delta_{\nabla}$ be the connection Laplacian for a compatible connection on a smooth Hilbert bundle $\pi:\mathcal{E}\to\mathcal{M}$ over a compact orientable manifold $\mathcal{M}$ . A heat kernel for $\Delta_{\nabla}$ is a continuous section $K^{\nabla}(x,y,t)$ of the Banach bundle $\text{hom}\big(\text{pr}_{2}^{*}\mathcal{E},\text{pr}_{1}^{*}\mathcal{E}\big)$ over $\mathcal{M}\times\mathcal{M}\times(0,\infty)$ that satisfies the following conditions.

1.

$K^{\nabla}(x,y,t)$ is $C^{1}$ with respect to $t$ , and is $C^{2}$ with respect to $x$ .
2.

$K^{\nabla}(x,y,t)$ satisfies the heat equation $\partial_{t}K^{\nabla}(x,y,t)=-(\Delta_{\nabla})_{x}K^{\nabla}(x,y,t)$ , where $(\Delta_{\nabla})_{x}$ means applying the Laplacian to $x$ .
3.

$K^{\nabla}(x,y,t)$ satisfies the initial condition $\lim_{t\to 0}K_{t}S=S$ for every smooth section $S\in\Gamma(\mathcal{E})$, where $(K_{t}S)(x)=\int_{\mathcal{M}}K^{\nabla}(x,y,t)S(y)\,d\mu(y).$

Lemma 2.

(Heat Kernel for Hilbert Bundles) Let $\Delta_{\nabla}$ be the connection Laplacian associated to a Hilbert bundle, $(\mathcal{M},\mathcal{E},\nabla)$ . Let $n:=\dim(\mathcal{M})/2$ , $\psi$ be a cutoff function, and

C_{n}(x,y):=\frac{\exp\left(\frac{-d_{\mathcal{M}}(x,y)^{2}}{4t}\right)}{(4\pi t)^{n/2}}.

The following hold:

1.

The Laplacian $\Delta_{\nabla}$ admits a unique heat kernel $K^{\nabla}(x,y,t)$ .

2.

There exist smooth sections $\Phi_{i}\in\Gamma\left(\mathcal{M}\times\mathcal{M},\text{hom}\big(\text{pr}_{2}^{*}\mathcal{E},\text{pr}_{1}^{*}\mathcal{E}\big)\right)$ such that for every $N>n$ , the kernel

K^{N}(x,y,t):=C_{n}(x,y)\psi(d_{\mathcal{M}}(x,y)^{2})\sum_{i=0}^{N}t^{i}\Phi_{i}(x,y)j(x,y)^{-1/2}|dx|^{1/2}

is asymptotic to $K^{\nabla}(x,y,t)$ , in the sense that

\|\partial^{k}_{t}(K^{\nabla}(x,y,t)-K^{N}(x,y,t))\|_{\ell}=O(t^{N-\frac{n+\ell+2k}{2}}).

3.

The leading term $\Phi_{0}(x,y)$ is equal to the parallel transport $P_{y\to x}:\mathcal{E}_{y}\to\mathcal{E}_{x}$ with respect to the Fréchet connection $\nabla$ along the length-minimizing geodesic joining $y$ and $x$ , which is unique on the support of $\psi(d_{\mathcal{M}}(x,y)^{2})$ .

Proof.

(Sketch) We note that the parametrix-based approach of Berline, Getzler, and Vergne [15] extends to our setting with minor but judicious modifications. As the original parametrix argument is quite lengthy, we only indicate the necessary modifications. First, all integrals must be understood as Bochner integrals. Second, to avoid ambiguities between the algebraic and topological tensor products, the hom-bundle is used in place of tensor bundles.
Now note that the parametrix argument is fundamentally local in nature. Consider a smooth Hilbert bundle $\mathcal{E}\to\mathcal{M}$ , and recall that we require the associated Fréchet connection $\nabla$ to be metric-compatible. Then, in a local trivialization $\mathcal{E}\big\rvert_{U}\cong U\times\mathcal{H}$ , with $\mathcal{H}$ a separable Hilbert space, the connection has the form $\nabla=d+A$ with $A$ a smooth $\hom(\mathcal{H},\mathcal{H})$ -valued 1-form by Proposition 2. The connection Laplacian is then a second-order elliptic operator with scalar principal symbol and lower-order coefficients in the Banach algebra $\hom(\mathcal{H},\mathcal{H})$ . The parametrix argument proceeds by solving transport equations along geodesics and then correcting the resulting approximate kernel by a Volterra series. These steps do not rely in any essential way on the finite-dimensionality of the fiber, but only on the fact that the coefficient algebra admits the usual smooth calculus and operator-norm estimates. Thus, replacing matrix-valued coefficients by $\hom(\mathcal{H},\mathcal{H})$ -valued ones, one obtains in the same way a smooth kernel $K^{\nabla}(x,y,t)$ , and the usual energy argument gives uniqueness. ∎

Remark 7.

The details of the necessary parametrix argument may be found in Sections 2.1-2.5 of [15]. Note that while a finite-rank hypothesis is assumed throughout Chapter 2 of [15], it is not actually used until Section 2.6, where the operator $K_{t}$ is required to be Hilbert-Schmidt.

G.4 Borel Functional Calculus

Given a linear map $T:\mathbb{R}^{n}\to\mathbb{R}^{n}$ and a suitably well-behaved function $g:\mathbb{R}\to\mathbb{R}$ , one may “apply $g$ ” to $T$ to get a new linear map $g(T):\mathbb{R}^{n}\to\mathbb{R}^{n}$ . In particular, whenever $g$ is analytic with a Maclaurin expansion $g(x)=\sum_{j}a_{j}x^{j}$ converging on all of $\mathbb{R}$ , one may define $g(T)=\sum_{j}a_{j}T^{j}$ , where $T^{j}$ is interpreted as the $j$ -fold composition $T\circ\cdots\circ T$ . When $T$ is a bounded linear operator on a Hilbert space, one may similarly define $g(T)$ via the series expansion. However, when $T$ is unbounded, the series need not converge, and more care must be taken. This difficulty is pertinent for the HilbNet architecture, where the convolution filter $g$ must be applied to the unbounded connection Laplacian $\Delta_{\nabla}$ . The Borel functional calculus provides an elegant solution. While traditionally formulated through the spectral theorem and projection-valued measures, the following version (Theorem VIII.5 of [82]) will be sufficient for the purposes of the HilbNet architecture.
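For a bounded self-adjoint operator, the series definition of $g(T)$ and the spectral definition $g(T)=Vg(\Lambda)V^{\top}$ (for $T=V\Lambda V^{\top}$ ) agree. The sketch below checks this for $g=\exp$ on a small random symmetric matrix, which serves only as a finite-dimensional stand-in (our assumption); the helper names are hypothetical. Only the spectral route carries over to unbounded operators such as $\Delta_{\nabla}$ .

```python
import numpy as np

def spectral_apply(T, g):
    """g(T) = V diag(g(lam)) V^T for a real symmetric matrix T = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(T)
    return V @ np.diag(g(lam)) @ V.T

def exp_series(T, terms=40):
    """Truncated Maclaurin series sum_{j < terms} T^j / j! (exp is entire)."""
    result = np.zeros_like(T)
    term = np.eye(T.shape[0])          # current term T^j / j!, starting at j = 0
    for j in range(terms):
        result += term
        term = term @ T / (j + 1)
    return result

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
T = (A + A.T) / 4                      # small real symmetric (self-adjoint) matrix

max_gap = np.max(np.abs(spectral_apply(T, np.exp) - exp_series(T)))
print(max_gap)                         # agreement up to floating-point round-off
```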

Theorem 3.

(Spectral Theorem - Functional Calculus Form) Let $A$ be a self-adjoint operator on a Hilbert space $\mathcal{H}$ . Then there is a unique map $\hat{\phi}$ from the bounded Borel measurable functions on $\mathbb{R}$ into the space of bounded linear operators on $\mathcal{H}$ , $\mathcal{L}(\mathcal{H})$ , so that

•

$\hat{\phi}$ is an algebraic *-homomorphism.
•

$\hat{\phi}$ is norm continuous, that is, $\|\hat{\phi}(h)\|_{\mathcal{L}(\mathcal{\mathcal{H}})}\leq\|h\|_{\infty}$ .
•

Let $h_{n}(x)$ be a sequence of bounded Borel functions with $h_{n}(x)\xrightarrow[n\rightarrow\infty]{}x$ for each $x$ and $\left|h_{n}(x)\right|\leq|x|$ for all $x$ and $n$ . Then, for any $\psi\in\mathrm{dom}(A)$ , $\lim_{n\rightarrow\infty}\hat{\phi}\left(h_{n}\right)\psi=A\psi$ .
•

If $h_{n}(x)\to h(x)$ pointwise and if the sequence $\|h_{n}\|_{\infty}$ is bounded, then $\hat{\phi}(h_{n})\to\hat{\phi}(h)$ strongly.

In addition:

•

If $A\psi=\lambda\psi$ , then $\hat{\phi}(h)\psi=h(\lambda)\psi$ .
•

If $h\geq 0$ , then $\hat{\phi}(h)\geq 0$ .
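To illustrate these properties in finite dimensions, the following sketch applies a bounded but discontinuous Borel function, the indicator of $(-\infty,0]$ , to a random symmetric matrix standing in for $A$ (an assumption for illustration; helper names are hypothetical). Since $h^{2}=h$ and $h$ is real, the $*$ -homomorphism property predicts that $\hat{\phi}(h)$ is an orthogonal projection (a spectral projection of $A$ ); the code also checks the norm bound, the eigenvector property, and positivity.

```python
import numpy as np

def borel_apply(A, h):
    """phi(h) = V diag(h(lam)) V^T for a real symmetric A and a bounded Borel h."""
    lam, V = np.linalg.eigh(A)
    return V @ np.diag(h(lam)) @ V.T

def indicator(x):
    """Indicator of (-inf, 0]: bounded and Borel, but not continuous."""
    return np.where(x <= 0.0, 1.0, 0.0)

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B + B.T
lam, V = np.linalg.eigh(A)
P = borel_apply(A, indicator)

is_projection = np.allclose(P @ P, P) and np.allclose(P, P.T)   # h^2 = h and h is real
norm_ok = np.linalg.norm(P, 2) <= 1.0 + 1e-12                   # ||phi(h)|| <= ||h||_inf = 1
eig_ok = all(np.allclose(P @ V[:, i], indicator(lam[i]) * V[:, i]) for i in range(6))
positive = np.linalg.eigvalsh(P).min() >= -1e-12                # h >= 0 implies phi(h) >= 0
print(is_projection, norm_ok, eig_ok, positive)
```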

G.5 Cellular Sheaves and Sheaf Laplacians

Cellular sheaves on graphs are a data structure that generalizes weighted graphs. We take our exposition of cellular sheaves and their Laplacians primarily from [52]. See [52, 29, 53] for more details.

Definition 17 (Cellular Sheaf on a Graph).

Let $G=(V,E)$ be an undirected multigraph without self-loops and with finitely many vertices and edges. Write $v\leq e$ when the node $v$ is incident to the edge $e$ . A cellular sheaf, or equivalently a network sheaf, $\mathcal{F}$ on $G$ consists of the following data.

•

A vector space $\mathcal{F}(\sigma)$ for each $\sigma\in V\amalg E$ , called the stalk over $\sigma$ .
•

A linear map $\mathcal{F}_{v\leq e}:\mathcal{F}(v)\to\mathcal{F}(e)$ for each incident pair $v\leq e$ , called the restriction map of $v$ into $e$ .

Remark 8.

At the level of category theory, a cellular sheaf is a functor $\mathcal{F}:G\to\mathbf{Vect}$ , where the graph $G=(V,E)$ is viewed as a posetal category with objects $\mathrm{Ob}(G)=V\amalg E$ and a unique morphism $v\to e$ whenever $v\leq e$ . In this light, we adopt the notation $\mathcal{F}:G\to\mathbf{Vect}$ for a cellular sheaf $\mathcal{F}$ on a graph $G$ .

Traditionally, to add geometric content to a cellular sheaf, one passes to weighted cellular sheaves: a cellular sheaf $\mathcal{F}:G\to\mathbf{Vect}$ where each stalk $\mathcal{F}(\sigma)$ is a finite-dimensional vector space endowed with an inner product $\langle-,-\rangle_{\sigma}$ . To accommodate infinite-dimensional Hilbert space stalks, we instead follow the approach of [44].

Definition 18.

(Hilbert Cellular Sheaf on a Graph) A Hilbert cellular sheaf $\mathcal{F}$ on a finite graph $G=(V,E)$ consists of the following data.

•

A Hilbert space $\mathcal{F}(v)$ for each $v\in V$ , referred to as the node stalk over $v$ .
•

A Hilbert space $\mathcal{F}(e)$ for each $e\in E$ , referred to as the edge stalk over $e$ .
•

For each edge $e\in E$ with bounding vertices $u,v$ , a pair of bounded linear restriction maps $\mathcal{F}_{u\leq e}:\mathcal{F}(u)\to\mathcal{F}(e)$ and $\mathcal{F}_{v\leq e}:\mathcal{F}(v)\to\mathcal{F}(e)$ .

Remark 9.

A bounded Hilbert sheaf $\mathcal{F}$ , that is, a Hilbert cellular sheaf as in Definition 18, can again be viewed as a functor $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ , where $G$ is viewed as an acyclic (posetal) category as in Remark 8, and $\mathbf{Hilb}_{\mathbb{R}}$ is the category of real Hilbert spaces and bounded globally-defined linear operators.

Remark 10.

In order to better differentiate between the usual finite-rank cellular sheaf on a graph and the potentially infinite-rank Hilbert cellular sheaves considered above, we use the terminology of network sheaves when we wish to emphasize the finite-rank consideration.

Definition 19.

Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on a graph $G=(V,E)$ . The spaces of 0-cochains and 1-cochains are defined by:

C^{0}(G;\mathcal{F}):=\bigoplus_{v\in V}\mathcal{F}(v)\,,\qquad C^{1}(G;\mathcal{F}):=\bigoplus_{e\in E}\mathcal{F}(e)\,,

where $\oplus$ denotes the direct sum of Hilbert spaces. For a 0-cochain $\mathbf{x}\in C^{0}(G;\mathcal{F})$ , we denote the component of $\mathbf{x}$ in the stalk over the node $v$ by $\mathbf{x}_{v}$ , with a similar notation for components of 1-cochains.

Definition 20.

Let $G=(V,E)$ be a graph. A signed incidence relation on $G$ is a pairing $[-:-]:V\times E\to\{-1,0,1\}$ which satisfies the following conditions:

1.

$[v:e]\neq 0$ if and only if $v\leq e$ .
2.

For each $e$ , $\sum_{v\leq e}[v:e]=0$ .

Remark 11.

The data of a signed incidence relation on $G=(V,E)$ is equivalent to the choice of a source $s(e)$ and a target $t(e)$ for each edge $e$ . In particular, the incidences are in two-to-one correspondence with the edges, since each edge $e\in E$ has exactly two distinct endpoints.

Definition 21.

Let $G=(V,E)$ be a graph equipped with a signed incidence relation. Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on $G$ . The coboundary operator $\delta:C^{0}(G;\mathcal{F})\to C^{1}(G;\mathcal{F})$ is the operator whose component on each edge stalk is

\left(\delta\mathbf{x}\right)_{e}:=\sum_{\begin{subarray}{c}v\leq e\end{subarray}}[v:e]\mathcal{F}_{v\leq e}(\mathbf{x}_{v})\,.

Remark 12.

The coboundary map $\delta$ depends on the choice of the signed incidence relation. However, given two signed incidence relations $[-:-]_{1},[-:-]_{2}$ , the corresponding coboundary operators $\delta_{1},\delta_{2}$ differ on each edge stalk $\mathcal{F}(e)$ by at most a sign, $(\delta_{1}\mathbf{x})_{e}=\pm(\delta_{2}\mathbf{x})_{e}$ . In particular, $\ker(\delta)$ does not depend on the choice of signed incidence relation.
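As a small finite-dimensional illustration of Definition 21 and Remark 12, the sketch below assembles $\delta$ as a block matrix for a triangle graph with $\mathbb{R}^{2}$ stalks and rotation restriction maps (these choices and the helper names are our own assumptions). Reversing the orientation of one edge flips the sign of that edge's block row, while $\ker(\delta)$ , here the space of global sections, is unchanged.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def coboundary(num_nodes, dim, edges, maps, signs):
    """Block matrix of delta: C^0 -> C^1 with (delta x)_e = sum_{v <= e} [v:e] F_{v<=e} x_v."""
    delta = np.zeros((len(edges) * dim, num_nodes * dim))
    for k, ((u, v), (F_u, F_v), (s_u, s_v)) in enumerate(zip(edges, maps, signs)):
        rows = slice(k * dim, (k + 1) * dim)
        delta[rows, u * dim:(u + 1) * dim] += s_u * F_u
        delta[rows, v * dim:(v + 1) * dim] += s_v * F_v
    return delta

def kernel_dim(M, tol=1e-10):
    return M.shape[1] - np.linalg.matrix_rank(M, tol=tol)

# Triangle graph with R^2 stalks; F_{u<=e} = I and F_{v<=e} = R(angle_e).
edges = [(0, 1), (1, 2), (0, 2)]
maps = [(np.eye(2), rotation(0.3)), (np.eye(2), rotation(0.5)), (np.eye(2), rotation(0.8))]

signs_a = [(1, -1), (1, -1), (1, -1)]
signs_b = [(1, -1), (-1, 1), (1, -1)]          # opposite orientation on the second edge
delta_a = coboundary(3, 2, edges, maps, signs_a)
delta_b = coboundary(3, 2, edges, maps, signs_b)

print(kernel_dim(delta_a), kernel_dim(delta_b))        # same kernel dimension
print(np.allclose(delta_a[2:4], -delta_b[2:4]))        # only the flipped edge changes sign

# A global section: x_0 = R(0.8) x_2 and x_1 = R(0.5) x_2 satisfy every edge constraint.
x2 = np.array([1.0, 0.0])
section = np.concatenate([rotation(0.8) @ x2, rotation(0.5) @ x2, x2])
print(np.allclose(delta_a @ section, 0), np.allclose(delta_b @ section, 0))
```

Because $R(0.3)R(0.5)=R(0.8)$ , the three edge constraints are consistent around the cycle, which is why a two-dimensional space of global sections exists.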

Definition 22.

Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on a graph $G=(V,E)$ equipped with a signed incidence relation. Let $\delta^{*}:C^{1}(G;\mathcal{F})\to C^{0}(G;\mathcal{F})$ denote the Hilbert adjoint of the corresponding coboundary operator with respect to the direct-sum inner products on the cochain spaces. The Hilbert sheaf Laplacian is the operator $\Delta_{\mathcal{F}}:C^{0}(G;\mathcal{F})\to C^{0}(G;\mathcal{F})$ defined by the composition:

\Delta_{\mathcal{F}}=\delta^{*}\circ\delta\,.

Proposition 5.

The Hilbert sheaf Laplacian $\Delta_{\mathcal{F}}$ has the following properties.

1.

The Laplacian $\Delta_{\mathcal{F}}$ is a self-adjoint globally-defined bounded linear operator.
2.

When $C^{0}(G;\mathcal{F})\xrightarrow{\delta}C^{1}(G;\mathcal{F})$ is viewed as a Hilbert complex (in the sense of [23]) the kernel $\ker(\Delta_{\mathcal{F}})$ recovers the space of harmonic $0$ -cochains.
3.

The negative Laplacian $-\Delta_{\mathcal{F}}$ is the infinitesimal generator of a strongly continuous semigroup $e^{-t\Delta_{\mathcal{F}}}$ on $C^{0}(G;\mathcal{F})$ . For a choice of initial cochain $\mathbf{x}_{0}\in C^{0}(G;\mathcal{F})$ , the resulting flow $\mathbf{x}_{t}:=e^{-t\Delta_{\mathcal{F}}}\mathbf{x}_{0}$ is a solution to the sheaf heat equation $\frac{d}{dt}\mathbf{x}_{t}=-\Delta_{\mathcal{F}}\mathbf{x}_{t}$ . Moreover, the flow has limiting behavior $\lim_{t\to\infty}\mathbf{x}_{t}=\Pi_{\ker(\Delta_{\mathcal{F}})}\mathbf{x}_{0}$ , where $\Pi_{\ker(\Delta_{\mathcal{F}})}$ denotes the orthogonal projection onto $\ker(\Delta_{\mathcal{F}})$ .
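Continuing in finite dimensions, the following sketch uses a path graph with $\mathbb{R}^{2}$ stalks and rotation restriction maps (chosen by us purely for illustration). It forms $\Delta_{\mathcal{F}}=\delta^{*}\delta$ , where the adjoint is the transpose for real stalks, and checks the properties of Proposition 5: self-adjointness, nonnegativity, $\ker(\Delta_{\mathcal{F}})=\ker(\delta)$ , and convergence of the heat flow $e^{-t\Delta_{\mathcal{F}}}\mathbf{x}_{0}$ to the orthogonal projection of $\mathbf{x}_{0}$ onto the kernel.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Path graph 0 - 1 - 2 with R^2 stalks; edge (u, v) has F_{u<=e} = I and F_{v<=e} = R(angle).
edges = [(0, 1, 0.4), (1, 2, -1.1)]
n_nodes, d = 3, 2
delta = np.zeros((len(edges) * d, n_nodes * d))
for k, (u, v, angle) in enumerate(edges):
    delta[k * d:(k + 1) * d, u * d:(u + 1) * d] = np.eye(d)          # [u:e] = +1
    delta[k * d:(k + 1) * d, v * d:(v + 1) * d] = -rotation(angle)   # [v:e] = -1

laplacian = delta.T @ delta            # Delta_F = delta^* delta; the adjoint is the transpose here
lam, V = np.linalg.eigh(laplacian)

def heat_flow(x0, t):
    """x_t = exp(-t Delta_F) x0 via the eigendecomposition of Delta_F."""
    return V @ (np.exp(-t * lam) * (V.T @ x0))

null = V[:, lam < 1e-10]               # orthonormal basis of ker(Delta_F)
projector = null @ null.T              # orthogonal projection onto ker(Delta_F)

x0 = np.random.default_rng(2).standard_normal(n_nodes * d)
x_limit = heat_flow(x0, 1e3)           # a large t approximates the t -> infinity limit

print(np.allclose(laplacian, laplacian.T), lam.min() > -1e-10)   # self-adjoint, nonnegative
print(null.shape[1], np.allclose(delta @ null, 0))                # ker(Delta_F) = ker(delta)
print(np.allclose(x_limit, projector @ x0))                       # flow -> projection onto kernel
```

Since a path graph has no cycles, there is no holonomy obstruction and the kernel has dimension equal to the stalk dimension, here $2$ .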

Proof.

See [44]. ∎

We now recall our construction of a Hilbert cellular sheaf from a spatially discretized Hilbert bundle.

Definition (Hilbert Cellular Sheaf from a Hilbert Bundle).

•

The Hilbert space $\mathcal{F}_{n}^{t}(x_{i}):=\mathcal{E}_{x_{i}}$ for each $x_{i}\in\mathcal{X}_{n}$ , referred to as the node stalk over $x_{i}\in\mathcal{X}_{n}$ .
•

The Hilbert space $\mathcal{F}_{n}^{t}(e_{ij}):=\mathcal{E}_{m_{\gamma_{ij}}}$ for each $e_{ij}\in E$ , referred to as the edge stalk over $e_{ij}\in E$ .

•

For each edge $e_{ij}\in E$ with bounding vertices $x_{i},x_{j}$ , a pair of bounded linear restriction maps

\begin{aligned}
(\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{i})\to\mathcal{F}_{n}^{t}(e_{ij}),\\
(\mathcal{F}_{n}^{t})_{x_{j}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{j})\to\mathcal{F}_{n}^{t}(e_{ij}).
\end{aligned}

(82)
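The sketch below instantiates this construction on a hypothetical toy bundle: $\mathcal{M}=S^{1}$ with the trivial bundle $S^{1}\times\mathbb{R}^{2}$ (the plane standing in for a Hilbert fiber) and connection $\nabla=d+\alpha J\,d\theta$ , for which parallel transport along the short arc from $a$ to $b$ is the rotation by $-\alpha\,(b-a)$ . We also assume $k_{ij}^{t}=e^{-d_{\mathcal{M}}(x_{i},x_{j})^{2}/4t}$ , in line with the geodesic-distance weighting of Remark 13. The code checks that each restriction map is $\sqrt{k_{ij}^{t}}$ times an orthogonal map, and that transporting both endpoints to the midpoint $m_{\gamma_{ij}}$ composes to $P_{x_{j}\to x_{i}}$ , so that the $(i,j)$ block of the sheaf Laplacian is $-k_{ij}^{t}P_{x_{j}\to x_{i}}$ .

```python
import numpy as np

ALPHA = 0.5  # hypothetical connection strength: nabla = d + ALPHA * J dtheta on S^1 x R^2

def wrap(angle):
    """Signed shortest angular offset, in [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def transport(a, b):
    """Toy parallel transport P_{a -> b} along the shortest arc: rotation by -ALPHA * (b - a)."""
    return rotation(-ALPHA * wrap(b - a))

def restriction_maps(x_i, x_j, t):
    """Restriction maps into the edge stalk at the arc midpoint, plus the weight k_ij^t."""
    gap = wrap(x_j - x_i)
    midpoint = x_i + gap / 2
    weight = np.exp(-gap ** 2 / (4 * t))   # assumed k_ij^t = exp(-d(x_i, x_j)^2 / 4t)
    F_i = np.sqrt(weight) * transport(x_i, midpoint)
    F_j = np.sqrt(weight) * transport(x_j, midpoint)
    return F_i, F_j, weight

x_i, x_j, t = 0.4, 5.9, 0.3          # the shortest arc between these points crosses theta = 0
F_i, F_j, k_ij = restriction_maps(x_i, x_j, t)

# Each restriction map is sqrt(k_ij) times an orthogonal map.
print(np.allclose(F_i.T @ F_i, k_ij * np.eye(2)))
# Transport through the midpoint composes to P_{x_j -> x_i}, so the off-diagonal
# block -F_i^T F_j of the sheaf Laplacian equals -k_ij P_{x_j -> x_i}.
print(np.allclose(-F_i.T @ F_j, -k_ij * transport(x_j, x_i)))
```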

Remark 13.

We make a few clarifying remarks on this construction.

•

Length-minimizing geodesics exist in this setting by the Hopf-Rinow theorem [36], since $\mathcal{M}$ is compact; we fix such a geodesic $\gamma_{ij}$ between each pair of sample points.
•

For simplicity, we use the geodesic distance to weight our restriction maps. However, we could also use the Euclidean heat kernel as in [14]; this would result in a reweighted sheaf Laplacian that converges to the same connection Laplacian. In practical implementations, it is thus well justified to work with the Euclidean heat kernel rather than geodesic-distance-based weights.
•

While this particular construction is chosen to emphasize the relationship to [14] and to allow the necessary analytical arguments, there exist alternative constructions, independent of the choice of geodesics, that generate the same sheaf Laplacian and emphasize functoriality.

G.6 Empirical Laplacians

Analogously to [14], we introduce two intermediary notions of Laplacian that interpolate between the Hilbert sheaf Laplacian and the Laplacian on a Hilbert bundle. For this section, fix a Hilbert bundle with compatible Fréchet connection $(\mathcal{M},\mathcal{E},\nabla)$ over a closed manifold $\mathcal{M}$ .

Definition 23.

Consider the unique normalized volume pseudo-form on $\mathcal{M}$ (or the usual volume form if $\mathcal{M}$ is orientable), denoted $d\mu$ . Thus, $d\mu$ equips $\mathcal{M}$ with a probability measure and we may refer to the resulting distribution as the uniform distribution on $\mathcal{M}$ .

Henceforth, let $\mathcal{X}_{n}=\{x_{1},\ldots,x_{n}\}$ denote the realization of an iid random sample drawn from the uniform distribution on $\mathcal{M}$ . We then recall the following construction.

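Definition (Point-Cloud Laplacian).

For a section $S:\mathcal{M}\to\mathcal{E}$ and $x\in\mathcal{M}$ , the point-cloud Laplacian is given by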

(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x):=\frac{1}{n}\sum_{j}e^{-d_{\mathcal{M}}(x,x_{j})^{2}/4t}\big(S(x)-P_{x_{j}\to x}S(x_{j})\big)

(83)

Remark 14.

We make the following remarks about the point-cloud Laplacian.

1.

The point-cloud Laplacian is the extension of the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}:C^{0}(G;\mathcal{F}_{n}^{t})\to C^{0}(G;\mathcal{F}^{t}_{n})$ to an operator acting on sections of the Hilbert bundle $\mathcal{E}\to\mathcal{M}$ , normalized by a factor of $1/n$ . In particular, when evaluated at a sample point $x_{i}\in\mathcal{X}_{n}$ , the point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x_{i})$ is exactly the normalized $x_{i}$ component of the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}$ evaluated at the cochain $(S(x_{1}),\ldots,S(x_{n}))^{T}\in C^{0}(G;\mathcal{F}^{t}_{n})$ .
2.

The point-cloud Laplacian is well defined for any section $S:\mathcal{M}\to\mathcal{E}$ , regardless of regularity.
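Item 1 can be checked numerically on a hypothetical toy bundle: $S^{1}\times\mathbb{R}^{2}$ with connection $\nabla=d+\alpha J\,d\theta$ , weights assumed to be $k_{ij}^{t}=e^{-d_{\mathcal{M}}(x_{i},x_{j})^{2}/4t}$ , and all function names illustrative. The sketch assembles the Hilbert sheaf Laplacian on the complete graph over a random sample, applies it to the cochain $(S(x_{1}),\ldots,S(x_{n}))$ , and compares each node component divided by $n$ with the point-cloud Laplacian (83) evaluated at that sample point.

```python
import numpy as np

ALPHA, K_FREQ = 0.5, 1  # hypothetical toy bundle S^1 x R^2 with nabla = d + ALPHA J dtheta

def wrap(angle):
    """Signed shortest angular offset, in [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def transport(a, b):
    """Toy parallel transport P_{a -> b} along the shortest arc."""
    return rotation(-ALPHA * wrap(b - a))

def section(theta):
    """Smooth section S(theta) = (cos K theta, sin K theta)."""
    return np.array([np.cos(K_FREQ * theta), np.sin(K_FREQ * theta)])

def point_cloud_laplacian(x, samples, t):
    """(1/n) sum_j exp(-d(x, x_j)^2 / 4t) (S(x) - P_{x_j -> x} S(x_j)), as in (83)."""
    terms = [np.exp(-wrap(x - xj) ** 2 / (4 * t)) * (section(x) - transport(xj, x) @ section(xj))
             for xj in samples]
    return np.mean(terms, axis=0)

def sheaf_laplacian(samples, t):
    """delta^T delta on the complete graph; edge stalks at arc midpoints, [x_i:e]=+1, [x_j:e]=-1."""
    n = len(samples)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    delta = np.zeros((2 * len(edges), 2 * n))
    for k, (i, j) in enumerate(edges):
        gap = wrap(samples[j] - samples[i])
        mid, w = samples[i] + gap / 2, np.exp(-gap ** 2 / (4 * t))   # assumed k_ij^t
        delta[2 * k:2 * k + 2, 2 * i:2 * i + 2] = np.sqrt(w) * transport(samples[i], mid)
        delta[2 * k:2 * k + 2, 2 * j:2 * j + 2] = -np.sqrt(w) * transport(samples[j], mid)
    return delta.T @ delta

samples = np.random.default_rng(6).uniform(0, 2 * np.pi, size=8)
t = 0.2
cochain = np.concatenate([section(x) for x in samples])          # (S(x_1), ..., S(x_n))
applied = sheaf_laplacian(samples, t) @ cochain / len(samples)    # normalized by 1/n
print(all(np.allclose(point_cloud_laplacian(samples[i], samples, t), applied[2 * i:2 * i + 2])
          for i in range(len(samples))))
```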

Definition 24 (Functional Approximation Laplacian).

For a section $S\in C^{1}(\mathcal{M},\mathcal{E})$ , we define the functional approximation to the connection Laplacian by

(\hat{\Delta}^{t}S)(x):=\int_{\mathcal{M}}\big(S(x)-P_{y\to x}S(y)\big)\exp\left(-\frac{d_{\mathcal{M}}(x,y)^{2}}{4t}\right)\,dy

where $\int(-)\,dy$ denotes Bochner integration with respect to the canonical normalized volume pseudo-form on $\mathcal{M}$ .
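On a hypothetical toy bundle ( $\mathcal{M}=S^{1}$ with normalized measure $d\theta/2\pi$ , fiber $\mathbb{R}^{2}$ , connection $\nabla=d+\alpha J\,d\theta$ , section $S(\theta)=(\cos k\theta,\sin k\theta)$ ), the functional approximation has a closed form. Writing $u$ for the signed offset from $y$ to $x$ , one has $S(x)-P_{y\to x}S(y)=(I-R(-(k+\alpha)u))S(x)$ , and the Gaussian integrals give $(\hat{\Delta}^{t}S)(x)\approx\frac{\sqrt{4\pi t}}{2\pi}\big(1-e^{-(k+\alpha)^{2}t}\big)S(x)$ , up to a tail of order $e^{-\pi^{2}/4t}$ . The sketch compares this with a midpoint-rule quadrature of the integral; names and parameters are illustrative.

```python
import numpy as np

ALPHA, K_FREQ = 0.5, 1   # hypothetical toy bundle (S^1 x R^2) and section frequency

def section(theta):
    """S(theta) = R(K_FREQ theta) e_1 = (cos K theta, sin K theta)."""
    return np.array([np.cos(K_FREQ * theta), np.sin(K_FREQ * theta)])

def functional_laplacian(x, t, num_nodes=20_000):
    """(hat Delta^t S)(x) by the midpoint rule for the normalized measure dtheta / (2 pi)."""
    ys = (np.arange(num_nodes) + 0.5) * 2 * np.pi / num_nodes
    u = (x - ys + np.pi) % (2 * np.pi) - np.pi            # signed geodesic offset from y to x
    weights = np.exp(-u ** 2 / (4 * t))
    # P_{y -> x} S(y) = R(-ALPHA u) R(K_FREQ y) e_1 is the unit vector at angle K y - ALPHA u
    moved = np.stack([np.cos(K_FREQ * ys - ALPHA * u), np.sin(K_FREQ * ys - ALPHA * u)], axis=1)
    return (weights[:, None] * (section(x) - moved)).mean(axis=0)

x, t = 1.3, 0.05
c = K_FREQ + ALPHA
# Closed form, up to the exponentially small tail outside |u| <= pi.
closed_form = np.sqrt(4 * np.pi * t) * (1 - np.exp(-c ** 2 * t)) / (2 * np.pi) * section(x)
numeric = functional_laplacian(x, t)
print(numeric, closed_form)
```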

Remark 15.

We make the following remarks about the functional approximation Laplacian.

1.

The functional approximation Laplacian has no dependence on a sample of points from the underlying manifold. Instead, the functional approximation may be treated as the limiting operator where all points on the manifold have been sampled, and contribute uniformly via parallel transport.
2.

The geometric data of the connection $\nabla$ enter the functional approximation Laplacian through the parallel transport maps $P_{y\to x}:\mathcal{E}_{y}\to\mathcal{E}_{x}$ , which link the fibers of $\mathcal{E}$ .

Viewing the sample $\mathcal{X}_{n}=\{x_{1},\ldots,x_{n}\}\subseteq\mathcal{M}$ as having been drawn iid from the uniform probability distribution on $\mathcal{M}$ , the functional approximation Laplacian can be identified pointwise on a section $S$ as the expected value of the point cloud Laplacian. That is, for any $S\in C^{1}(\mathcal{M},\mathcal{E})$ and $x\in\mathcal{M}$ , we have:

(\hat{\Delta}^{t}S)(x)=\mathbb{E}_{\mathcal{X}}\left[(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)\right]

Appendix H Proofs of Results

H.1 Auxiliary Lemmas for Theorem 1

Lemma 3.

For $x=(x_{1},\ldots,x_{m})\in\mathbb{R}^{m}$ , $k\in\mathbb{N}$ , and $a,t>0$ , the following Gaussian identities hold:

\begin{aligned}
\frac{1}{(2\pi at)^{m/2}}\int x_{i}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=0,\qquad(84)\\
\frac{1}{(2\pi at)^{m/2}}\int x_{i}x_{j}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=at\,\delta_{ij},\qquad(85)\\
\frac{1}{(2\pi at)^{m/2}}\int\|x\|^{2k+1}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=O(t^{k+\frac{1}{2}})\quad\text{as }t\to 0,\qquad(86)
\end{aligned}

where $\delta_{ij}$ is the Kronecker delta.

Proof.

Notice that $\frac{1}{(2\pi at)^{m/2}}\exp\left(-\frac{\|x\|^{2}}{2at}\right)$ is the density function of a multivariate normal random variable $X\sim\mathcal{N}(0,atI)$ . Equations (84) and (85) are simply the coordinate means $\mathbb{E}[X_{i}]=0$ and covariances $\mathrm{Cov}(X_{i},X_{j})=at\delta_{ij}$ . Finally, we may write $X=\sqrt{at}Z$ , where $Z\sim\mathcal{N}(0,I)$ is a standard multivariate normal (zero-mean, uncorrelated, unit variance) in $m$ dimensions. Then $\mathbb{E}[\|X\|^{2k+1}]=(at)^{k+\frac{1}{2}}\cdot\mathbb{E}[\|Z\|^{2k+1}]$ , where $\mathbb{E}[\|Z\|^{2k+1}]<\infty$ depends only on $m$ and $k$ , which confirms (86). ∎
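The identities (84), (85), and (86) are easy to sanity-check by Monte Carlo. The sketch below (the values of $m$ , $a$ , $k$ and the sample size are arbitrary choices) compares the sample mean and covariance of $X\sim\mathcal{N}(0,atI)$ with $0$ and $atI$ , and shows that $\mathbb{E}\|X\|^{2k+1}/(at)^{k+1/2}$ does not depend on $t$ , matching the exact chi moment $\mathbb{E}\|Z\|^{p}=2^{p/2}\Gamma(\frac{m+p}{2})/\Gamma(\frac{m}{2})$ .

```python
import numpy as np
from math import gamma

def abs_moment(m, p):
    """Exact E||Z||^p for Z ~ N(0, I_m) (a chi moment): 2^{p/2} Gamma((m+p)/2) / Gamma(m/2)."""
    return 2 ** (p / 2) * gamma((m + p) / 2) / gamma(m / 2)

rng = np.random.default_rng(4)
m, a, k, num = 3, 2.0, 1, 400_000
results = {}
for t in (0.1, 0.01):
    X = np.sqrt(a * t) * rng.standard_normal((num, m))             # X ~ N(0, a t I)
    mean_err = np.abs(X.mean(axis=0)).max() / np.sqrt(a * t)        # (84): the mean is 0
    cov_err = np.abs(X.T @ X / num / (a * t) - np.eye(m)).max()     # (85): covariance a t I
    odd_moment = np.mean(np.linalg.norm(X, axis=1) ** (2 * k + 1))
    ratio = odd_moment / (a * t) ** (k + 0.5)                       # (86): t-independent constant
    results[t] = (mean_err, cov_err, ratio)
    print(t, mean_err, cov_err, ratio)
print("exact constant:", abs_moment(m, 2 * k + 1))
```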

Remark 16.

If instead of integrating over the entire domain $\mathbb{R}^{m}$ , we integrate over a symmetric ball $B:=B(R,0)$ centered at zero, we recover the following augmented Gaussian identities:

\begin{aligned}
\frac{1}{(2\pi at)^{m/2}}\int_{B}x_{i}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=0,\qquad(87)\\
\frac{1}{(2\pi at)^{m/2}}\int_{B}x_{i}x_{j}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=\bigg(at+O(e^{-R^{2}/2at})\bigg)\,\delta_{ij}\quad\text{as }t\to 0,\qquad(88)\\
\frac{1}{(2\pi at)^{m/2}}\int_{B}\|x\|^{2k+1}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=O(t^{k+\frac{1}{2}})\quad\text{as }t\to 0.\qquad(89)
\end{aligned}

Restricting to the ball $B$ preserves the vanishing of the first moments (87) by symmetry, and perturbs the second moments (88) only by an additive term of the form $O(e^{-c/t})$ , reflecting the exponentially small Gaussian mass far away from the origin. The bound (89) follows from (86), since the integrand is nonnegative.

Lemma 4.

(Banach Mean Value Theorem) Consider Banach spaces $\mathcal{B}_{1},\mathcal{B}_{2}$ and an open set $\mathcal{U}\subseteq\mathcal{B}_{1}$ . If $S:\mathcal{U}\rightarrow\mathcal{B}_{2}$ is Gateaux differentiable, then the mean value theorem holds in the sense that

\|S(x)-S(y)\|_{\mathcal{B}_{2}}\leq\|x-y\|_{\mathcal{B}_{1}}\sup_{0\leq t\leq 1}\|DS(tx+(1-t)y)\|

whenever the line segment $[x,y]$ lies in $\mathcal{U}$ .

Proof.

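The proof is a standard functional analysis argument, which we recall for completeness. Set $u:=S(x)-S(y)\in\mathcal{B}_{2}$ ; if $u=0$ there is nothing to prove. By the Hahn-Banach theorem (see, e.g., [3]), there is a continuous linear functional $\varphi$ on $\mathcal{B}_{2}$ with $\|\varphi\|=1$ and $\varphi(u)=\|u\|$ . Define $g(\tau):=\varphi\big(S(y+\tau(x-y))\big)$ for $\tau\in[0,1]$ , which is well defined since $[x,y]\subseteq\mathcal{U}$ . Gateaux differentiability of $S$ gives $g'(\tau)=\varphi\big(DS(y+\tau(x-y))[x-y]\big)$ , so by the scalar mean value theorem there is $\tau^{*}\in(0,1)$ with $\|S(x)-S(y)\|=g(1)-g(0)=g'(\tau^{*})\leq\|DS(y+\tau^{*}(x-y))\|\,\|x-y\|$ . Since $y+\tau^{*}(x-y)=\tau^{*}x+(1-\tau^{*})y$ , the claim follows. ∎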

Remark 17.

We recall that Fréchet differentiability implies Gateaux differentiability, which will be sufficient for our purposes.

Lemma 5.

(Banach Weak Law of Large Numbers) Let $\{X_{j}\}_{j\in\mathbb{N}}$ denote an independent identically distributed collection of random variables $X_{j}\in L^{1}(\Omega,B)$ , where $\Omega:=(\Omega,\Sigma,\mathbb{P})$ is a probability space and $B$ is a Banach space. Let $S_{n}:=\sum_{j=1}^{n}X_{j}$ denote the partial sum of the first $n$ random variables. As $n\to\infty$ , the normalized sequence $\frac{1}{n}S_{n}$ converges in probability to the mean $\mu:=\mathbb{E}[X_{j}]$ . That is for all $\epsilon>0$ ,

\lim_{n\to\infty}\mathbb{P}\left[\left\|\frac{1}{n}S_{n}-\mu\right\|_{B}>\epsilon\right]=0.

Proof.

See Pinelis [80] or Ledoux and Talagrand [64]. ∎

H.2 Key Lemmas for Theorem 1

Lemma 6.

(Taylor Series for Hilbert Signals) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a Fréchet connection. For a given signal $S\in C^{n+1}(\mathcal{M},\mathcal{E})$ , the space of $(n+1)$ -times continuously Fréchet-differentiable sections, and $p\in\mathcal{M}$ , consider any $q\in\mathcal{M}$ in a geodesic ball of $p$ and fix a length-minimizing curve $\gamma(t)$ from $p$ to $q$ . Let $S^{*}(t):=P_{\gamma(t)\to p}S(\gamma(t))$ denote the parallel transport of $S$ from $\gamma(t)$ back to $p$ along $\gamma$ . As $t\to 0$ , we have that

S^{*}(t)=\left[\sum_{j=0}^{n}\frac{t^{j}}{j!}\big(\nabla_{\dot{\gamma}}^{(j)}S\big)(p)\right]+O(t^{n+1})

Proof.

We first establish for a section $V(t)$ of $\mathcal{E}$ along $\gamma$ , that:

\frac{d}{dt}\left(P_{\gamma(t)\to p}V(t)\right)=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}V\right)(\gamma(t))

We compute directly from the definition:

\begin{aligned}
\frac{d}{dt}\left(P_{\gamma(t)\to p}V(t)\right)&=\lim_{h\to 0}\frac{P_{\gamma(t+h)\to p}V(\gamma(t+h))-P_{\gamma(t)\to p}V(\gamma(t))}{h}\qquad(90)\\
&=\lim_{h\to 0}P_{\gamma(t)\to p}\left(\frac{P_{\gamma(t+h)\to\gamma(t)}V(\gamma(t+h))-V(\gamma(t))}{h}\right)\qquad(91)\\
&=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}V\right)(\gamma(t))\qquad(92)
\end{aligned}

where in the second line we used the composition law for parallel transport, $P^{-1}_{\gamma(t)\to p}\circ P_{\gamma(t+h)\to p}=P_{\gamma(t+h)\to\gamma(t)}$ , which follows from uniqueness of solutions to the parallel transport ODE, together with the fact that $P_{\gamma(t)\to p}$ is a bounded linear operator and hence may be factored outside the limit. The third line is the characterization of $\nabla_{\dot{\gamma}}$ as the derivative of parallel-transported values along $\gamma$ .

Recall that $S^{*}(t):=P_{\gamma(t)\to p}S(\gamma(t))$ is a curve in the fixed Hilbert space $\mathcal{E}_{p}$ . Iteratively applying the previous derivative computation to $V(t)=\big(\nabla_{\dot{\gamma}}^{(n)}S\big)\big(\gamma(t)\big)$ , where $(-)^{(n)}$ denotes $n$ -fold composition, yields:

\frac{d^{n}}{dt^{n}}S^{*}(t)=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}^{(n)}S\right)(\gamma(t))

Evaluating at $t=0$ , where $P_{p\to p}=\mathrm{id}$ :

\frac{d^{n}}{dt^{n}}S^{*}\bigg|_{t=0}=\left(\nabla_{\dot{\gamma}}^{(n)}S\right)_{p}\,.

Finally, applying Taylor’s theorem for Banach spaces [25] yields the desired asymptotic statement. ∎
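Lemma 6 can be checked on a hypothetical toy bundle $S^{1}\times\mathbb{R}^{2}$ with connection $\nabla=d+\alpha J\,d\theta$ , so that $P_{\gamma(t)\to p}$ is the rotation $R(\alpha t)$ along $\gamma(t)=p+t$ , and section $S(\theta)=R(k\theta)e_{1}$ . Here $\nabla_{\dot{\gamma}}S=(k+\alpha)JS$ , hence $\nabla^{(j)}_{\dot{\gamma}}S=((k+\alpha)J)^{j}S$ . The sketch compares $S^{*}(t)$ with the order- $n$ Taylor polynomial and reports the error divided by $t^{n+1}$ , which should stay roughly constant; all names and parameter values are illustrative.

```python
import numpy as np
from math import factorial

ALPHA, K_FREQ, P0 = 0.5, 2, 0.7           # hypothetical connection strength, frequency, base point
J = np.array([[0.0, -1.0], [1.0, 0.0]])   # generator of rotations in the fiber R^2

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def section(theta):
    return rotation(K_FREQ * theta) @ np.array([1.0, 0.0])

def transported(t):
    """S*(t) = P_{gamma(t) -> p} S(gamma(t)) for the unit-speed geodesic gamma(t) = p + t."""
    return rotation(ALPHA * t) @ section(P0 + t)

def taylor(t, order):
    """Partial sum of t^j / j! (nabla^(j) S)(p), using nabla_theta S = (K_FREQ + ALPHA) J S."""
    step = (K_FREQ + ALPHA) * J
    term, total = section(P0), np.zeros(2)
    for j in range(order + 1):
        total += t ** j / factorial(j) * term
        term = step @ term
    return total

order = 3
ratios = []
for t in (0.1, 0.05, 0.025):
    err = np.linalg.norm(transported(t) - taylor(t, order))
    ratios.append(err / t ** (order + 1))   # roughly constant, consistent with O(t^{n+1})
    print(t, err, ratios[-1])
```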

Lemma 7.

Let $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$ and $\hat{\Delta}^{t}$ denote the point-cloud and functional Laplacian operators with bandwidth $t$ . We have the concentration inequality:

\mathbb{P}\left[\frac{1}{t(4\pi t)^{m/2}}\left\|\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)-\hat{\Delta}^{t}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{t^{2}(4\pi t)^{m}\epsilon^{2}n}{2K^{2}}\right)

for some $K>0$ which depends only on the choice of section $S$ . Consequently, for fixed $t$ , we have the following limit in probability as $n\to\infty$ :

\frac{1}{t(4\pi t)^{m/2}}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\xrightarrow{\mathbb{P}}\frac{1}{t(4\pi t)^{m/2}}\hat{\Delta}^{t}S(x)\,.

Proof.

The point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)$ may be viewed as the sample average of $n$ iid Hilbert-space-valued random variables:

X_{i}:=\exp\left(-\frac{d_{\mathcal{M}}(x,x_{i})^{2}}{4t}\right)S(x)-\exp\left(-\frac{d_{\mathcal{M}}(x,x_{i})^{2}}{4t}\right)P_{x_{i}\to x}S(x_{i})

Moreover, the functional approximation $\hat{\Delta}^{t}$ may be viewed as the expectation $\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}$ with respect to the uniform probability measure on $\mathcal{M}$ . The bundle $\mathcal{E}\to\mathcal{M}$ has separable fibers, so the results of [80] apply, and we obtain a Hoeffding-type inequality:

\mathbb{P}\left[\frac{1}{\delta}\left\|\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)-\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{(\delta\epsilon)^{2}n}{2K^{2}}\right)

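where $K$ bounds the norms of the centered summands $X_{i}-\mathbb{E}X_{i}$ and depends only on $\max_{x\in\mathcal{M}}\|S(x)\|$ over the compact manifold $\mathcal{M}$ ; indeed, each $X_{i}$ has norm at most $2\max_{x\in\mathcal{M}}\|S(x)\|$ , since the Gaussian weights are at most $1$ and parallel transport is unitary. Setting $\delta=t(4\pi t)^{m/2}$ and identifying $\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)=\hat{\Delta}^{t}S(x)$ yields the desired concentration inequality. Convergence in probability follows immediately. ∎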
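The Hoeffding-type bound used above concerns bounded, independent, Hilbert-space-valued summands. As a rough illustration that is not tied to the point-cloud Laplacian itself (the dimension, sample size, and threshold are arbitrary choices), the sketch below draws mean-zero vectors uniformly from the unit sphere of $\mathbb{R}^{3}$ , so each summand has norm $c=1$ , and compares the empirical frequency of $\|\frac{1}{n}S_{n}\|>\epsilon$ with a Pinelis-type bound $2\exp(-n\epsilon^{2}/(2c^{2}))$ .

```python
import numpy as np

rng = np.random.default_rng(5)
dim, n, eps, trials = 3, 100, 0.25, 5_000

# i.i.d. mean-zero summands with ||X_i|| <= c = 1: uniform directions on the unit sphere of R^dim.
Z = rng.standard_normal((trials, n, dim))
X = Z / np.linalg.norm(Z, axis=2, keepdims=True)
deviation = np.linalg.norm(X.mean(axis=1), axis=1)     # ||(1/n) S_n - E X|| for each trial

empirical = np.mean(deviation > eps)
bound = 2 * np.exp(-n * eps ** 2 / (2 * 1.0 ** 2))     # 2 exp(-n eps^2 / (2 c^2)) with c = 1
print(empirical, bound)
```

The bound is typically far from tight, which is consistent with its use above only to establish convergence in probability for fixed $t$ .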

Lemma 8.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle on a closed Riemannian manifold equipped with a Fréchet connection. Let $B\subseteq\mathcal{M}$ be open, and $p\in B$ . Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$ . For each $x\in\mathcal{M}$ , let $F(x):=P_{x\to p}S(x)$ denote the parallel transport of $S(x)$ along the designated geodesic connecting $x$ to $p$ . For any real $a>0$ , the following asymptotic bound holds as $t\to 0$ :

\left\|\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}F(y)\,d\mu(y)-\int_{B}e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}F(y)\,d\mu(y)\right\|=o(t^{a}).

Proof.

We note this is a modified version of Lemma 4.1 of [14]. Let $d:=\inf_{x\not\in B}d_{\mathcal{M}}(p,x)$ , $K:=\sup_{x\in\mathcal{M}}\|S(x)\|$ , and $M:=\mu(\mathcal{M}\setminus B)$ , where $\mu$ is the canonical measure with respect to the volume pseudo-form. Since $\{p\}$ is compact, $\mathcal{M}\setminus B$ is closed, and $p\notin\mathcal{M}\setminus B$ , the infimum distance satisfies $d>0$ . Recalling that $P_{x\to p}:\mathcal{E}_{x}\to\mathcal{E}_{p}$ is unitary, and hence $\|F(x)\|=\|S(x)\|$ for all $x\in\mathcal{M}$ , we may bound:

\begin{aligned}
\left\|\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}F(y)\,d\mu(y)-\int_{B}e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}F(y)\,d\mu(y)\right\|&\leq\int_{\mathcal{M}\setminus B}\left\|e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}F(y)\right\|\,d\mu(y)\\
&\leq\int_{\mathcal{M}\setminus B}e^{-\frac{d_{\mathcal{M}}(y,p)^{2}}{4t}}\left\|S(y)\right\|\,d\mu(y)\\
&\leq MK\exp\left(-\frac{d^{2}}{4t}\right)\\
&=o(t^{a}).
\end{aligned}

∎

Lemma 9.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$ , and functional approximation $\hat{\Delta}^{t}$ with bandwidth $t$ . Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$ . For any $x\in\mathcal{M}$ as the bandwidth $t\to 0$ , we have pointwise convergence:

\lim_{t\rightarrow 0}\frac{1}{t\left(4\pi t\right)^{\frac{m}{2}}}\hat{\Delta}^{t}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\,.

Proof.

Let $\gamma_{t}:=\frac{1}{t}\frac{1}{(4\pi t)^{m/2}}$ , and consider the scaled functional approximation $\gamma_{t}\hat{\Delta}^{t}$ which acts on a section $S$ at point $p$ by:

\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)=\gamma_{t}\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t}}(S(p)-P_{x\to p}S(x))\,d\mu(x)\,.

Let $B\subseteq\mathcal{M}$ denote a sufficiently small geodesic ball centered at $p$ . By Lemma 8,

\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\gamma_{t}\int_{B}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t}}(S(p)-P_{x\to p}S(x))\,d\mu(x).

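Parameterize $B$ via geodesic (normal) coordinates centered at $p$ , so that $p$ corresponds to $0$ . Let $F(x):=P_{x\to p}S(x)$ , and let $\tilde{F}:\tilde{B}\to\mathcal{E}_{p}$ and $\tilde{B}\subseteq\mathbb{R}^{m}$ denote $F$ and the ball $B$ in these coordinates, where $m=\dim(\mathcal{M})$ . Since $d\mu$ is the normalized volume pseudo-form, $d\mu=\mathrm{vol}(\mathcal{M})^{-1}\sqrt{\det(g_{ij})}\,dx$ in coordinates, and we may write: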

\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}e^{-\frac{d_{\mathcal{M}}(\exp_{p}(x),p)^{2}}{4t}}(\tilde{F}(0)-\tilde{F}(x))\sqrt{\det(g_{ij})}\,dx\,.

In geodesic coordinates, the determinant of the metric tensor has the following asymptotic expansion (see, e.g., [36]), with a uniform constant since the closed manifold $\mathcal{M}$ has bounded Ricci curvature:

\det(g_{ij})=1+O(\|x\|^{2})\,.

This approximation and the identification $d_{\mathcal{M}}(\exp_{p}(x),p)=\|x\|$ in coordinates allow us to express:

\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}e^{-\frac{\|x\|^{2}}{4t}}(\tilde{F}(0)-\tilde{F}(x))\big(1+O(\|x\|^{2})\big)\,dx\,.

Let $\mathbb{L}_{t}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\big(1+O(\|x\|^{2})\big)\,dx$ . This expression splits as $\mathbb{L}_{t}=A_{t}+B_{t}$ , where:

\begin{aligned}
A_{t}&:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\,dx\,,\\
B_{t}&:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\left[O(\|x\|^{2})\right]\,dx\,.
\end{aligned}

We first analyze the limiting behavior of $A_{t}$ as $t\to 0$ . Within our geodesic coordinates $x=(x_{1},\dots,x_{m})$ centered at $p$ , we may further work with a local synchronous frame of $\mathcal{E}$ , that is, a frame that is parallel along all radial geodesics from $p$ ; write $e_{i}:=\partial_{i}\big|_{p}$ for the coordinate directions at $p$ . Consequently, within this frame, ordinary derivatives of $\tilde{F}$ coincide with covariant derivatives of $S$ at the basepoint:

\begin{aligned}
\partial_{i}\tilde{F}(p)&=\nabla_{e_{i}}S(p)\\
\partial_{i}\partial_{j}\tilde{F}(p)&=\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\,,
\end{aligned}

where $\mathsf{curv}_{ij}:=-\frac{1}{2}R^{\mathcal{E}}(e_{i},e_{j})S(p)$ is half the bundle curvature of $\mathcal{E}$ arising from the connection $\nabla$ . Hence by Lemma 6, the Taylor expansion of $\tilde{F}$ at $p$ is

\tilde{F}(x)=\tilde{F}(p)+\sum_{i}x_{i}\nabla_{e_{i}}S(p)+\frac{1}{2}\sum_{i,j}x_{i}x_{j}\big(\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\big)+O(\|x\|^{3}),

with $\mathsf{curv}_{ij}=-\mathsf{curv}_{ji}$ .

Using the augmented Gaussian identities (87), (88), and (89), we may compute:

\begin{aligned}
I_{t}&:=\frac{1}{(4\pi t)^{m/2}}\int_{\tilde{B}}(\tilde{F}(p)-\tilde{F}(x))\exp\left(-\frac{\|x\|^{2}}{4t}\right)\,dx\\
&=-\frac{1}{2}\sum_{i,j}\left[\big(\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\big)\left(2t+O\left(e^{-c/t}\right)\right)\delta_{ij}\right]+O\left(t^{3/2}\right)\\
&=-t\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)-t\sum_{i}\mathsf{curv}_{ii}+O\left(t^{3/2}\right)
\end{aligned}

where $\delta_{ij}$ is the Kronecker delta. Since $\mathsf{curv}_{ij}$ is antisymmetric, $\mathsf{curv}_{ii}=0$ , hence

I_{t}=-t\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)+O(t^{3/2}).

Inserting into the definition of $A_{t}$ yields

\lim_{t\to 0}A_{t}=-\frac{1}{\operatorname{vol}(\mathcal{M})}\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p).

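By Lemma 1, $-\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)=\Delta_{\nabla}S(p)$ , so $\lim_{t\to 0}A_{t}=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(p)$ and we recover the connection Laplacian.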

To analyze the quantity $B_{t}$ , we first observe that inside $\tilde{B}$ the parallel transport map $P_{\exp_{p}(x)\to p}$ varies smoothly in $x$ , with bounded derivatives. Since the section $S$ is $C^{3}$ , the mean value theorem (Lemma 4) ensures there is a $K>0$ such that $\|\tilde{F}(x)-\tilde{F}(0)\|\leq K\|x\|$ for all $x\in\tilde{B}$ . Using this Lipschitz bound (absorbing the constant of the $O(\|x\|^{2})$ term into $K$ ) and the augmented Gaussian identities, we may compute:

\begin{aligned}
\|B_{t}\|&\leq\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)\|\tilde{F}(0)-\tilde{F}(x)\|\left[O(\|x\|^{2})\right]\,dx\\
&\leq\frac{K/\mathrm{vol}(\mathcal{M})}{t}\frac{1}{(4\pi t)^{m/2}}\int_{\tilde{B}}\|x\|^{3}\exp\left(-\frac{\|x\|^{2}}{4t}\right)\,dx\\
&=\frac{1}{t}O\left(t^{3/2}\right)\\
&=O\left(\sqrt{t}\right)\,.
\end{aligned}

Hence $B_{t}\to 0$ as $t\to 0$ . Combining with the analysis of $A_{t}$ yields:

\begin{aligned}
\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}_{\mathcal{F}_{n}}S\right)(p)\right] &= \lim_{t\to 0}[A_{t}+B_{t}]\\
&= \frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(p)\,.
\end{aligned}

∎
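The rate $O(\sqrt{t})$ for $B_{t}$ comes from the third absolute moment of the kernel, which has a closed form because the normalized kernel is the $\mathcal{N}(0,2t\,I_{m})$ density. A minimal sketch, assuming integration over all of $\mathbb{R}^{m}$ instead of $\tilde{B}$ (the difference is exponentially small in $1/t$); `b_t_scale` is a name introduced here for illustration:

```python
import math


def b_t_scale(t, m):
    """Closed form of (1/t) * (4*pi*t)^(-m/2) * int_{R^m} |x|^3 exp(-|x|^2/(4t)) dx.

    The normalized kernel is the N(0, 2t I_m) density, and for X ~ N(0, s^2 I_m)
    one has E|X|^k = s^k 2^(k/2) Gamma((m + k)/2) / Gamma(m/2).
    """
    s = math.sqrt(2.0 * t)
    third_abs_moment = s**3 * 2.0**1.5 * math.gamma((m + 3) / 2) / math.gamma(m / 2)
    return third_abs_moment / t


# The ratio b_t_scale(t, m) / sqrt(t) = 8 Gamma((m+3)/2) / Gamma(m/2) does not depend on t.
for t in (1e-2, 1e-3, 1e-4):
    print(t, b_t_scale(t, m=2) / math.sqrt(t))
```

The printed ratio is the same for every $t$, which is exactly the statement $\frac{1}{t}O(t^{3/2})=O(\sqrt{t})$ with an explicit, dimension-dependent constant.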

Lemma 10 (Variance asymptotics).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume pseudo-form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. For each bandwidth $t$, let $\gamma:=(t(4\pi t)^{m/2})^{-1}$, where $m:=\dim(\mathcal{M})$. Define an error term:

\mathcal{R}_{n}(S):=\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.

There is a constant $C_{\mathrm{var}}>0$ , which depends on the section $S$ , such that the following asymptotic estimate holds as $t\to 0^{+}$ :

\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\right]\leq\frac{C_{\mathrm{var}}}{nt^{2+\frac{m}{2}}}\,.

Proof.

Let $Y_{j}$ be the random section given by

Y_{j}(x):=\gamma(S(x)-P_{x_{j}\to x}S(x_{j}))\exp\left(-\frac{d_{\mathcal{M}}(x,x_{j})^{2}}{4t}\right).

Note that $Y_{j}$ is stochastic only through the sample point $x_{j}$, and that $Y_{i},Y_{j}$ are independent when $i\neq j$. It is straightforward to verify by Fubini’s theorem that

\begin{aligned}
\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|^{2}_{L^{2}}\right] &= \frac{1}{n}\left(\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]-\|\mathbb{E}_{x_{1}}Y_{1}\|^{2}_{L^{2}}\right)\\
&\leq \frac{1}{n}\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]\,.
\end{aligned}

Set $K:=\max_{x\in\mathcal{M}}\|S(x)\|_{\mathcal{E}_{x}}$ . By Fubini, we may exchange the order of integration and find

\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]=\int_{\mathcal{M}}\mathbb{E}_{x_{1}}\left[\left\|Y_{1}(x)\right\|_{\mathcal{E}_{x}}^{2}\right]dx\,.

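Since parallel transport for a compatible connection is a fiber-wise isometry, $\|S(x)-P_{x_{1}\to x}S(x_{1})\|_{\mathcal{E}_{x}}\leq 2K$, and we may compute: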

\begin{aligned}
\mathbb{E}_{x_{1}}\left[\|Y_{1}(x)\|^{2}_{\mathcal{E}_{x}}\right] &= \frac{1}{\mathrm{vol}(\mathcal{M})}\int_{\mathcal{M}}\left\|\gamma\big(S(x)-P_{x_{1}\to x}S(x_{1})\big)\exp\left(-\frac{d_{\mathcal{M}}(x,x_{1})^{2}}{4t}\right)\right\|_{\mathcal{E}_{x}}^{2}dx_{1}\\
&\leq \frac{4\gamma^{2}K^{2}}{\mathrm{vol}(\mathcal{M})}\int_{\mathcal{M}}\exp\left(-\frac{d_{\mathcal{M}}(x,x_{1})^{2}}{2t}\right)dx_{1}\,.
\end{aligned}

By standard Gaussian identities, the remaining Gaussian integral is $O(t^{m/2})$ as $t\to 0^{+}$ , with constant independent of $x$ . Recalling the definition of $\gamma$ in terms of $t$ , we recover that

\mathbb{E}_{x_{1}}\left[\|Y_{1}(x)\|^{2}_{\mathcal{E}_{x}}\right]=O(t^{-(2+m)})\cdot O(t^{m/2})=O(t^{-(2+m/2)}).

Since the constants are all independent of $x$ , we may integrate over $\mathcal{M}$ and find that

\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]=O(t^{-(2+m/2)})

as well. The result immediately follows. ∎
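To see the order $t^{-(2+m/2)}$ in a concrete case, the sketch below takes the simplest possible setting, assumed only for illustration: the unit circle ($m=1$, $\mathrm{vol}(\mathcal{M})=2\pi$) with the trivial line bundle, so that parallel transport is the identity, and $S=\cos$ (so $K=1$). It evaluates $\mathbb{E}_{x_{1}}[\|Y_{1}(x)\|^{2}]$ and the bound from the proof by quadrature; `variance_terms` is a hypothetical helper.

```python
import numpy as np


def variance_terms(t, x=0.3, num=200_001):
    """Scalar instance of Lemma 10 on the unit circle M = S^1 (m = 1, vol = 2*pi).

    Assumes the trivial line bundle, so parallel transport is the identity, and
    S = cos (hence K = max |S| = 1). Returns the pair
      exact = E_{x_1}[ |Y_1(x)|^2 ]  (quadrature over the uniform sample point x_1)
      bound = 4 gamma^2 K^2 / vol * int exp(-d(x, x_1)^2 / (2t)) dx_1
    """
    m, vol, K = 1, 2.0 * np.pi, 1.0
    gamma = 1.0 / (t * (4.0 * np.pi * t) ** (m / 2))
    theta = np.linspace(-np.pi, np.pi, num)   # x_1 = x + theta, |theta| = geodesic distance
    h = theta[1] - theta[0]
    kernel = np.exp(-theta**2 / (4.0 * t))
    y = gamma * (np.cos(x) - np.cos(x + theta)) * kernel
    exact = np.sum(y**2) * h / vol
    bound = 4.0 * gamma**2 * K**2 / vol * np.sum(kernel**2) * h
    return exact, bound


for t in (1e-2, 5e-3, 2.5e-3):
    exact, bound = variance_terms(t)
    print(t, exact, bound, bound * t**2.5)  # last column is (nearly) constant in t
```

The rescaled bound is constant in $t$, matching $t^{-(2+m/2)}=t^{-5/2}$. The exact second moment is of smaller order here, because $|S(x)-S(x_{1})|$ is of order $d_{\mathcal{M}}(x,x_{1})$ rather than merely bounded by $2K$; the lemma only needs the cruder rate.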

Lemma 11 (Bias asymptotics).

Under the hypotheses and notation of Lemma 10, consider the error term

\mathcal{R}_{n}(S):=\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.

There is a constant $C_{\mathrm{bias}}>0$ , which depends on the section $S$ , such that the following asymptotic estimate holds as $t\to 0^{+}$ :

\left\|\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\leq\big(C_{\mathrm{bias}}t\big)^{2}\,.

Proof.

This follows from essentially repeating the analysis in the proof of Lemma 9, using the fourth-order Taylor expansion of $F(y)=P_{y\to x}S(y)$ in terms of the covariant derivative. In particular, after accounting for the fourth-order Taylor remainder, we find that:

\mathbb{E}_{\mathcal{X}_{n}}\big[\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\big]=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)+O(t)\,.

The result follows. ∎
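The $O(t)$ bias can be observed in a toy setting (unit circle, trivial line bundle, $S=\cos$; all assumptions of this illustration only). The sketch evaluates the expectation of the scaled estimator exactly by quadrature, using the neighbor-minus-center difference $S(y)-S(p)$, for which the $t\to 0$ limit is $S''(p)/\mathrm{vol}(\mathcal{M})$; how this matches the sign of $\Delta_{\nabla}$ depends on the convention of Lemma 1. Since $\int_{\mathbb{R}}e^{-\theta^{2}/4t}\cos(p+\theta)\,d\theta=\sqrt{4\pi t}\,e^{-t}\cos p$, the bias divided by $t$ should approach $\cos(p)/(4\pi)$. The helper name `functional_bias` is hypothetical.

```python
import numpy as np


def functional_bias(t, p=0.3, num=400_001):
    """Bias of the scaled functional approximation
        gamma * (1/vol) * int_M exp(-d(p, y)^2 / (4t)) (S(y) - S(p)) dy
    with respect to its t -> 0 limit S''(p) / vol, on the unit circle
    (m = 1, vol = 2*pi, trivial line bundle, S = cos, so S'' = -cos)."""
    m, vol = 1, 2.0 * np.pi
    gamma = 1.0 / (t * (4.0 * np.pi * t) ** (m / 2))
    theta = np.linspace(-np.pi, np.pi, num)   # y = p + theta
    h = theta[1] - theta[0]
    integrand = np.exp(-theta**2 / (4.0 * t)) * (np.cos(p + theta) - np.cos(p))
    approx = gamma * np.sum(integrand) * h / vol
    return approx - (-np.cos(p)) / vol


for t in (1e-1, 1e-2, 1e-3):
    print(t, functional_bias(t) / t)  # approaches cos(p) / (4*pi) as t -> 0
```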

H.3 Proof of Theorem 1

H.3.1 Proof of Theorem 1 A

Theorem.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. Let $\mathcal{F}^{t}_{\mathcal{X}_{n}}$ be the associated Hilbert cellular sheaf with bandwidth $t$ and associated point cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$. Then, in probability, for any $x\in\mathcal{M}$,

\lim_{n\rightarrow\infty}\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ .

Proof.

Let $\gamma_{n}:=\frac{1}{t_{n}}\frac{1}{(4\pi t_{n})^{m/2}}$ , and consider the scaled functional approximation $\gamma_{n}\hat{\Delta}^{t_{n}}$ which acts on a section $S$ at point $p$ by:

\left(\gamma_{n}\hat{\Delta}^{t_{n}}S\right)(p)=\gamma_{n}\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t_{n}}}(P_{x\to p}S(x)-S(p))\,d\mu(x)\,.

We may bound:

\begin{aligned}
\mathbb{P}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|>2\epsilon\right] \leq\ &\mathbb{P}\left[\gamma_{n}\left\|\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\hat{\Delta}^{t_{n}}S(x)\right\|>\epsilon\right]\\
&+\mathbb{P}\left[\left\|\gamma_{n}\hat{\Delta}^{t_{n}}S(x)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|>\epsilon\right]
\end{aligned}

As $n\to\infty$, we have $t_{n}\to 0$. Hence the second quantity on the right-hand side goes to zero by Lemma 9. On the other hand, the first quantity on the right-hand side can be bounded by the concentration inequality of Lemma 7, yielding:

\mathbb{P}\left[\gamma_{n}\left\|\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\hat{\Delta}^{t_{n}}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{n\,t_{n}^{2}(4\pi t_{n})^{m}\epsilon^{2}}{2K^{2}}\right)\,,

where $K$ is a constant depending on the section $S$. Since $nt_{n}^{2+m}=n^{\alpha/(m+2+\alpha)}\to\infty$ as $n\to\infty$, the concentration upper bound goes to zero as $n\to\infty$ as well. This completes the proof of the main theorem. ∎
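A Monte Carlo sketch of Theorem 1 A in a scalar toy case (unit circle, trivial line bundle, $S=\cos$, uniform samples, $\alpha=1$); all of these choices, the assumed $1/n$ normalization of the point cloud Laplacian, and the helper name are assumptions of the illustration. It evaluates the scaled estimator at a single point with the same neighbor-minus-center difference as the functional approximation $\hat{\Delta}^{t_{n}}$ above, so the target in this scalar case is $S''(p)/\mathrm{vol}(\mathcal{M})=-\cos(p)/2\pi$.

```python
import numpy as np


def scaled_point_cloud_laplacian(p, n, alpha=1.0, seed=0):
    """gamma_n * (1/n) * sum_j exp(-d(p, x_j)^2 / (4 t_n)) (S(x_j) - S(p)) at the point p.

    Scalar instance on the unit circle (m = 1, vol = 2*pi): trivial line bundle, so
    parallel transport is the identity, S = cos, uniform iid samples x_j, and the
    bandwidth schedule t_n = n^(-1/(m+2+alpha)).
    """
    m = 1
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=n)
    t = n ** (-1.0 / (m + 2 + alpha))
    gamma = 1.0 / (t * (4.0 * np.pi * t) ** (m / 2))
    d = np.angle(np.exp(1j * (x - p)))     # signed geodesic offset in (-pi, pi]
    weights = np.exp(-d**2 / (4.0 * t))
    return gamma * np.mean(weights * (np.cos(x) - np.cos(p)))


p = 0.3
target = -np.cos(p) / (2.0 * np.pi)        # S''(p) / vol(M) for S = cos
for n in (10**3, 10**4, 10**5):
    print(n, scaled_point_cloud_laplacian(p, n), target)
```

The remaining discrepancy combines an $O(t_{n})$ bias and Monte Carlo fluctuations; both shrink along the schedule $t_{n}$, as the theorem predicts.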

H.3.2 Proof of Theorem 1 B

Theorem.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. Let $\mathcal{F}^{t}_{\mathcal{X}_{n}}$ be the associated Hilbert cellular sheaf with bandwidth $t$ and associated point cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$. Then, we have the following convergence in expectation:

\lim_{n\rightarrow\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]=0

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ .

Proof.

Let $\gamma_{n}:=(t_{n}(4\pi t_{n})^{m/2})^{-1}$ . Define an error term:

\mathcal{R}_{n}(S):=\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.

We may decompose the error into a bias and variance term as:

\mathbb{E}_{\mathcal{X}_{n}}\left[\|\mathcal{R}_{n}\|_{L^{2}}^{2}\right]=\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\right]+\left\|\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\,.

By the asymptotic results of Lemmas 10 and 11, there are positive constants $C_{\mathrm{var}}$ and $C_{\mathrm{bias}}$ such that

\mathbb{E}_{\mathcal{X}_{n}}\left[\|\mathcal{R}_{n}\|_{L^{2}}^{2}\right]\leq\frac{C_{\mathrm{var}}}{nt_{n}^{2+\frac{m}{2}}}+C_{\mathrm{bias}}^{2}t_{n}^{2}

Since $t_{n}=n^{-1/(m+2+\alpha)}$, we have $t_{n}^{2}\to 0$ and $nt_{n}^{2+\frac{m}{2}}=n^{(m/2+\alpha)/(m+2+\alpha)}\to\infty$ as $n\to\infty$. Hence both terms vanish in the limit, proving the desired convergence. ∎
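The exponents can be made explicit: with $t_{n}=n^{-1/(m+2+\alpha)}$, the variance term scales as $n^{-(m/2+\alpha)/(m+2+\alpha)}$, the squared bias as $n^{-2/(m+2+\alpha)}$, and the concentration exponent $nt_{n}^{2+m}$ of Theorem 1 A grows as $n^{\alpha/(m+2+\alpha)}$. A short Python sketch, with all constants set to $1$ (an assumption; $C_{\mathrm{var}}$, $C_{\mathrm{bias}}$ and $K$ depend on $S$) and hypothetical helper names:

```python
def bandwidth(n, m, alpha):
    """Bandwidth schedule t_n = n^(-1/(m + 2 + alpha)) from Theorems 1 A and 1 B."""
    return n ** (-1.0 / (m + 2 + alpha))


def rate_terms(n, m, alpha):
    """Orders (all constants set to 1) of: the variance term 1/(n t^(2+m/2)),
    the squared bias t^2, and the concentration exponent n t^(2+m)."""
    t = bandwidth(n, m, alpha)
    return 1.0 / (n * t ** (2 + m / 2)), t**2, n * t ** (2 + m)


for n in (10**2, 10**4, 10**6):
    variance, bias_sq, exponent = rate_terms(n, m=2, alpha=0.5)
    print(n, variance, bias_sq, exponent)
```

Any $\alpha>0$ makes all three behave correctly; a small $\alpha$ keeps the bias small but slows the growth of the concentration exponent.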

H.4 Key Lemmas for Theorem 2

Definition 25.

Let $\mathcal{E}$ be a smooth Hilbert bundle over a manifold $\mathcal{M}$ . A finite rank approximating sequence for $\mathcal{E}$ is a sequence of smooth sub-bundles $\{\mathcal{E}_{d}\}_{d\geq 1}$ with the following properties:

1.

For each $d$ , $\mathcal{E}_{d}$ has finite rank;
2.

For each $d$ , the bundle $\mathcal{E}_{d}$ is a sub-bundle of $\mathcal{E}_{d+1}$ ;
3.

For each $x\in\mathcal{M}$ , we have that $\mathrm{cl}\left(\mathrm{span}\left(\bigcup_{d}(\mathcal{E}_{d})_{x}\right)\right)=\mathcal{E}_{x}$ .

Lemma 12.

Let $\mathcal{E}$ be a smooth Hilbert bundle with infinite-dimensional fibers over a compact manifold $\mathcal{M}$ . A finite rank approximating sequence $\{\mathcal{E}_{d}\}_{d}$ exists.

Proof.

By Kuiper’s Theorem [61], the unitary group of the typical fiber $\mathcal{H}$ is contractible, implying that there exists an isomorphism of $\mathcal{E}$ with the trivial bundle $\mathcal{M}\times\mathcal{H}$ at the level of purely topological bundles. Now note that every trivial Hilbert bundle $\mathcal{M}\times\mathcal{H}$ admits a finite rank approximating sequence: take an orthonormal (Hilbert) basis $\{e_{1},e_{2},\ldots\}$ for $\mathcal{H}$ and define $\mathcal{H}_{d}:=\mathrm{span}(e_{1},\ldots,e_{d})$. The sequence $\{\mathcal{M}\times\mathcal{H}_{d}\}_{d}$ is then a finite rank approximating sequence. Furthermore, because the base space is a finite-dimensional manifold, this topological trivialization can be upgraded to a smooth global trivialization [75]. Let $\Phi:\mathcal{E}\to\mathcal{M}\times\mathcal{H}$ be such a smooth isomorphism. Then the finite rank approximating sequence $\{\mathcal{M}\times\mathcal{H}_{d}\}_{d}$ pulls back to a finite rank approximating sequence on $\mathcal{E}$ via $\mathcal{E}_{d}:=\Phi^{-1}(\mathcal{M}\times\mathcal{H}_{d})$, as desired. ∎
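In the trivialization, the approximating sub-bundles are just truncations in the chosen basis. The sketch below works in coordinates with respect to an orthonormal basis and replaces $\mathcal{H}$ by a long finite array (an assumption of the illustration); `truncation_projection` is a hypothetical helper. It shows that $\Pi_{d}v\to v$ for each fixed $v$, i.e. strong convergence, while $\|\Pi_{d}-\mathrm{id}\|_{\mathrm{op}}=1$ for every $d$, which is why later arguments use $\Pi_{d}\to\mathrm{id}$ only in the strong operator topology.

```python
import numpy as np


def truncation_projection(v, d):
    """Orthogonal projection onto H_d = span(e_1, ..., e_d), written in coordinates
    with respect to an orthonormal basis: keep the first d coefficients, zero the rest."""
    out = np.zeros_like(v)
    out[:d] = v[:d]
    return out


# Finite surrogate of an element of l^2: v_k = 1/k, truncated at k = 10^5.
v = 1.0 / np.arange(1, 100_001)
tails = [np.linalg.norm(v - truncation_projection(v, d)) for d in (1, 10, 100, 1000)]
print(tails)  # decreasing towards 0: strong convergence Pi_d v -> v

# ...but not uniform convergence: the next basis vector is never approximated.
e_next = np.zeros(20)
e_next[10] = 1.0  # e_11 in 1-based indexing
print(np.linalg.norm(e_next - truncation_projection(e_next, 10)))  # 1.0
```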

Lemma 13.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth infinite-dimensional Hilbert bundle over a compact manifold $\mathcal{M}$ equipped with a compatible connection $\nabla$ . Let $\{\mathcal{E}_{d}\}_{d\geq 1}$ be a finite rank approximating sequence for $\mathcal{E}$ . The data of $\nabla$ induces a compatible connection $\nabla_{d}:=\Pi_{d}\nabla$ on $\mathcal{E}_{d}$ , where $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denotes the fiber-wise orthogonal projection onto $\mathcal{E}_{d}$ .

Proof.

This follows immediately from the fact that $\nabla$ is compatible and orthogonal projections are self-adjoint. ∎
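Spelled out, for smooth sections $s,s'$ of $\mathcal{E}_{d}$, a smooth function $f$ and a vector field $X$ (symbols introduced only for this derivation), and assuming, as is implicit in the lemma, that $\Pi_{d}$ is a smooth bundle map, the facts $\Pi_{d}s=s$, $\Pi_{d}s'=s'$ and $\Pi_{d}^{*}=\Pi_{d}$ give the Leibniz rule and the metric compatibility of $\nabla_{d}$:

\begin{aligned}
(\nabla_{d})_{X}(fs) &= \Pi_{d}\big(X(f)\,s+f\,\nabla_{X}s\big)=X(f)\,s+f\,(\nabla_{d})_{X}s\,,\\
X\langle s,s'\rangle &= \langle\nabla_{X}s,\Pi_{d}s'\rangle+\langle\Pi_{d}s,\nabla_{X}s'\rangle\\
&= \langle\Pi_{d}\nabla_{X}s,s'\rangle+\langle s,\Pi_{d}\nabla_{X}s'\rangle=\langle(\nabla_{d})_{X}s,s'\rangle+\langle s,(\nabla_{d})_{X}s'\rangle\,.
\end{aligned}

The first line shows that $\nabla_{d}$ is a connection on $\mathcal{E}_{d}$; the second uses the compatibility of $\nabla$ and then moves $\Pi_{d}$ across the inner product by self-adjointness.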

Remark 18.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be an infinite-dimensional Hilbert bundle equipped with a compatible connection. By the previous lemmas, we may always find a finite rank approximating sequence $(\mathcal{E}_{d},\mathcal{M},\nabla_{d})$ , each with compatible connection. These compatible connections induce connection Laplacians $\Delta_{\nabla_{d}}$ on each sub-bundle. Moreover, Theorem 1 B applies to each Laplacian $\Delta_{\nabla_{d}}$ .

We restate and prove Proposition 1.

Proposition.

\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}\qquad(93)

Proof.

Once we fix a basis $\mathcal{B}$, the conclusion follows from the proof of Lemma 12 and an application of Lemma 13. ∎

Definition 26.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a smooth Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$, connection Laplacians $\Delta_{\nabla_{d}}$, and bandwidth $t$ point cloud Laplacians $\hat{\Delta}_{n,d}^{t}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. The discretization error $\mathsf{D}(n,d)$ and the continuous geometry error $\mathsf{E}(d)$ are the quantities:

\begin{aligned}
\mathsf{D}(n,d) &:= \left\|\gamma_{n}\hat{\Delta}^{t_{n}}_{n,d}(\Pi_{d}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla_{d}}(\Pi_{d}S)\right\|_{L^{2}}^{2}\\
\mathsf{E}(d) &:= \left\|\frac{1}{\mathrm{vol}(\mathcal{M})}\left(\Delta_{\nabla_{d}}(\Pi_{d}S)-\Delta_{\nabla}S\right)\right\|_{L^{2}}^{2}\,.
\end{aligned}

Remark 19.

The discretization error $\mathsf{D}(n,d)$ captures the error introduced by approximating the connection Laplacian $\Delta_{\nabla_{d}}$ by a point-cloud Laplacian on $n$ points. The continuous geometry error $\mathsf{E}(d)$ is a deterministic quantity that captures the error introduced by moving to the sub-bundle $\mathcal{E}_{d}$ .

Lemma 14.

The continuous geometry error $\mathsf{E}(d)$ converges to zero as $d\to\infty$ .

Proof.

By pushing through the smooth global trivialization constructed in the proof of Lemma 12 (via Kuiper’s theorem), we may assume without loss of generality that $\mathcal{E}=\mathcal{M}\times\mathcal{H}$, with $\mathcal{E}_{d}=\mathcal{M}\times\mathcal{H}_{d}$ as in that proof. First, note that $\Pi_{d}$ then acts on every fiber as the same orthogonal projection onto $\mathcal{H}_{d}$, so it commutes with the Fréchet derivative $D$ on sections, in the sense that:

D(\Pi_{d}S)=\Pi_{d}(DS).

By Proposition 2, for any vector field $X$ , we may write:

\nabla_{X}S=DS(X)+A(X)S

where $A(X)$ is a globally bounded linear operator acting on the fiber $\mathcal{H}$ . We may now compute:

\begin{aligned}
(\nabla_{d})_{X}(\Pi_{d}S) &= \Pi_{d}\nabla_{X}(\Pi_{d}S)\\
&= \Pi_{d}\big(D(\Pi_{d}S)(X)+A(X)\Pi_{d}S\big)\\
&= \Pi_{d}DS(X)+\Pi_{d}A(X)\Pi_{d}S\,.
\end{aligned}

Since $\Pi_{d}\to\text{id}$ in the strong operator topology and $DS(X)\in\mathcal{H}$, we have that $\Pi_{d}DS(X)\to DS(X)$ as $d\to\infty$. Similarly, since $A(X):\mathcal{H}\to\mathcal{H}$ is bounded and $\|\Pi_{d}\|_{\text{op}}\leq 1$, for each $x\in\mathcal{M}$ we have $\|\Pi_{d}A(X)\Pi_{d}S(x)-A(X)S(x)\|\leq\|A(X)\|_{\text{op}}\|\Pi_{d}S(x)-S(x)\|+\|(\Pi_{d}-\text{id})A(X)S(x)\|\to 0$. Therefore $\nabla_{d}(\Pi_{d}S)(x)\to\nabla S(x)$. By a similar argument, we may conclude that for a pair of vector fields $X,Y$, $(\nabla_{d})_{X}(\nabla_{d})_{Y}(\Pi_{d}S)(x)\to\nabla_{X}\nabla_{Y}S(x)$. Hence, using the coordinate form of the connection Laplacian of Lemma 1, we recover that:

\lim_{d\to\infty}\left[\Delta_{\nabla_{d}}(\Pi_{d}S)(x)\right]=\Delta_{\nabla}S(x)

for all $x\in\mathcal{M}$ .

We now upgrade this statement to $L^{2}$ convergence by the dominated convergence theorem. If there is a global bound $K$ such that $\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)-\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K$ for all $x\in\mathcal{M}$ and $d\geq 1$, then the integrand of $\mathsf{E}(d)$ is dominated by the constant $K^{2}/\mathrm{vol}(\mathcal{M})^{2}$, which is integrable on the compact manifold $\mathcal{M}$, and the dominated convergence theorem gives $\mathsf{E}(d)\to 0$. To find such a $K$, we first observe that $\Delta_{\nabla}S$ is a continuous section, and hence $\|\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K_{1}$ on the compact manifold $\mathcal{M}$. Next, for a pair of vector fields $X,Y$, we may use the fact that $\|\Pi_{d}\|_{\text{op}}=1$ and the representation $\nabla=D+A$ to find a $d$-independent bound on $\left\|\big((\nabla_{d})_{X}(\nabla_{d})_{Y}(\Pi_{d}S)\big)(x)\right\|_{\mathcal{H}}$. The local coordinate form of the connection Laplacian $\Delta_{\nabla_{d}}$ again allows us to conclude that there is a bound $\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)\|_{\mathcal{H}}\leq K_{2}$ for all $x\in\mathcal{M}$ and $d\geq 1$. The triangle inequality finally allows us to bound:

\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)-\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K_{1}+K_{2}\,.

This completes the proof. ∎
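The pointwise step $\Pi_{d}A(X)\Pi_{d}S(x)\to A(X)S(x)$ rests on the triangle-inequality bound $\|\Pi_{d}A\Pi_{d}s-As\|\leq\|A\|_{\mathrm{op}}\|\Pi_{d}s-s\|+\|(\Pi_{d}-\mathrm{id})As\|$. The sketch below checks it on a finite surrogate: $\mathcal{H}$ is replaced by $\mathbb{R}^{N}$, $\Pi_{d}$ by coordinate truncation, and $A$ by a skew-symmetric difference-of-shifts matrix chosen only as an example of a bounded operator; none of these choices, nor the helper name `compression_check`, come from the paper.

```python
import numpy as np


def compression_check(A, s, d):
    """Return (lhs, rhs) with lhs = ||Pi_d A Pi_d s - A s|| and
    rhs = ||A||_op ||Pi_d s - s|| + ||(Pi_d - I) A s||."""
    P = np.zeros(len(s))
    P[:d] = 1.0                       # diagonal of Pi_d in the truncation basis
    As = A @ s
    lhs = np.linalg.norm(P * (A @ (P * s)) - As)
    rhs = np.linalg.norm(A, 2) * np.linalg.norm(P * s - s) + np.linalg.norm(P * As - As)
    return lhs, rhs


N = 400
shift = np.eye(N, k=1)
A = shift - shift.T                   # hypothetical bounded (skew-symmetric) operator
s = 1.0 / np.arange(1, N + 1)         # finite surrogate of a fiber vector S(x)
for d in (10, 50, 200, N):
    lhs, rhs = compression_check(A, s, d)
    print(d, lhs, rhs)                # lhs <= rhs, and both vanish at d = N
```

Both terms of the bound go to zero under strong convergence alone, so no norm convergence of $\Pi_{d}$ is needed.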

H.5 Proof of Theorem 2

Theorem.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be an infinite-dimensional Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$ . Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$ . Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$ , connection Laplacians $\Delta_{\nabla_{d}}$ , and bandwidth $t$ point cloud Laplacians $\hat{\Delta}_{\mathcal{F}^{t}_{n,d}}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$ . Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$ . There exists a deterministic increasing sequence $d_{n}$ , depending on the section $S$ , such that

\lim_{n\to\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}_{n,d_{n}}^{t_{n}}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]=0

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ .

Proof.

Let $\gamma_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}$. Using $\|a+b\|^{2}\leq 2\|a\|^{2}+2\|b\|^{2}$, we may bound the expected global error in terms of the continuous geometry error and the expected discretization error:

\mathbb{E}_{\mathcal{X}}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}_{n,d_{n}}}^{t_{n}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]\leq 2\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,d_{n})]+2\mathsf{E}(d_{n})\,.

By applying Theorem 1 B to the finite-rank bundle $\mathcal{E}_{d}$ with connection $\nabla_{d}$ and section $\Pi_{d}S$ (cf. Remark 18), the expected discretization error $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,d)]\to 0$ as $n\to\infty$ for each fixed $d$. On the other hand, Lemma 14 ensures that $\mathsf{E}(d)\to 0$ as $d\to\infty$.

To construct the diagonal sequence, first choose an increasing sequence $p_{i}$ such that $\mathsf{E}(d)<\frac{1}{i}$ for all $d\geq p_{i}$. Next, choose an increasing sequence $N_{i}$ such that $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,p_{i})]<\frac{1}{i}$ for each $n\geq N_{i}$; this is possible because $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,p_{i})]\to 0$ as $n\to\infty$ for each fixed $i$. For each $n\geq N_{1}$, set $\phi(n):=\max\{i\>\mid\>N_{i}\leq n\}$. Since every $N_{i}$ is finite, $\phi$ is nondecreasing and $\phi(n)\to\infty$ as $n\to\infty$. Observe that $\mathsf{E}(p_{\phi(n)})<\frac{1}{\phi(n)}$ by the choice of $p_{\phi(n)}$. On the other hand, $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,p_{\phi(n)})]<\frac{1}{\phi(n)}$ since $n\geq N_{\phi(n)}$. Therefore setting $d_{n}:=p_{\phi(n)}$ for $n\geq N_{1}$ (and $d_{n}:=p_{1}$ for the finitely many $n<N_{1}$) yields a nondecreasing diagonal sequence with the property that, for all $n\geq N_{1}$,

\mathbb{E}_{\mathcal{X}}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]<\frac{4}{\phi(n)}\,,

which tends to zero as $n\to\infty$. This diagonal sequence is deterministic, as both the continuous geometry error and the expected discretization error are deterministic. ∎
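The diagonal construction is mechanical once the two error functions are known. The Python sketch below runs it on hypothetical error models $\mathsf{E}(d)=1/d$ and $\mathbb{E}[\mathsf{D}(n,d)]=d/\sqrt{n}$, assumed monotone so that the "for all $d\geq p_{i}$" and "for all $n\geq N_{i}$" requirements hold; in the theorem these functions are not available in closed form, and the finite cap `i_max` (an illustration parameter) limits $\phi(n)$.

```python
import bisect
import math


def E(d):
    """Hypothetical continuous geometry error, decreasing in d."""
    return 1.0 / d


def D(n, d):
    """Hypothetical expected discretization error, decreasing in n for each fixed d."""
    return d / math.sqrt(n)


def build_indices(i_max):
    """p_i with E(d) < 1/i for d >= p_i and N_i with D(n, p_i) < 1/i for n >= N_i.
    The 'for all d >= p_i' / 'for all n >= N_i' parts rely on the monotonicity above."""
    p, N = [], []
    d, n = 1, 1
    for i in range(1, i_max + 1):
        while E(d) >= 1.0 / i:
            d += 1
        p.append(d)
        while D(n, d) >= 1.0 / i:
            n += 1
        N.append(n)
    return p, N


def diagonal_index(n, p, N):
    """Return (d_n, phi(n)) with phi(n) = max{i : N_i <= n}, for n >= N_1."""
    phi = bisect.bisect_right(N, n)   # number of i with N_i <= n (N is nondecreasing)
    if phi == 0:
        raise ValueError("n < N_1: phi(n) is undefined")
    return p[phi - 1], phi


p, N = build_indices(i_max=20)
for n in (N[0], N[4], N[19], 10**7):
    d_n, phi = diagonal_index(n, p, N)
    print(n, d_n, phi, 2 * D(n, d_n) + 2 * E(d_n), "<", 4 / phi)
```

The printed total error stays below $4/\phi(n)$, and $\phi(n)$ grows without bound only if more indices are allowed; the sequence $d_{n}$ grows slowly enough that the discretization error at $d_{n}$ keeps shrinking.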

H.6 Key Lemmas for Corollary 2.1

Lemma 15.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth infinite-dimensional Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$ . Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$ . Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$ , connection Laplacians $\Delta_{\nabla_{d}}$ , and bandwidth $t$ point cloud Laplacians $\hat{\Delta}_{\mathcal{F}^{t}_{n,d}}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$ . Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$ . Let $\{d_{n}\}$ be the diagonal sequence induced by Theorem 2. Let $\tilde{\Delta}_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}\Pi_{d_{n}}$ with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$ , $\alpha>0$ . Similarly, let $\tilde{\Delta}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}$ . Finally, let $g:\mathbb{R}\to\mathbb{R}$ be a bounded continuous function.

Under the Borel functional calculus (Appendix G.4), we have MSE convergence:

\mathbb{E}\left[\left\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\right\|_{L^{2}}^{2}\right]\to 0\,.

Proof.

Observe that $\tilde{\Delta}$ and each $\tilde{\Delta}_{n}$ are self-adjoint unbounded operators on $L^{2}(\mathcal{E};\mathcal{M})$ . We begin by showing there is a common core $\mathcal{D}\subseteq L^{2}(\mathcal{E};\mathcal{M})$ and a subsequence $n_{i}$ for which $\tilde{\Delta}_{n_{i}}S\to\tilde{\Delta}S$ in $L^{2}(\mathcal{E};\mathcal{M})$ .

Take $\mathcal{D}$ to be any countable dense subset of $C^{\infty}(\mathcal{M},\mathcal{E})$ . Since $\mathcal{E}$ has separable fibers, such a countable dense subset necessarily exists. Moreover, $\tilde{\Delta}$ and $\tilde{\Delta}_{n}$ are all defined on $\mathcal{D}$ . This $\mathcal{D}$ shall be our common core.

Treat $\tilde{\Delta}_{n}S$ as an $L^{2}(\mathcal{M},\mathcal{E})$ -valued random variable. For each $S\in\mathcal{D}$ , Theorem 2 ensures that $\tilde{\Delta}_{n}S\to\tilde{\Delta}S$ in mean square error. It immediately follows that $\tilde{\Delta}_{n}S\xrightarrow{\mathbb{P}_{\mathcal{X}}}\tilde{\Delta}S$ in probability with respect to the measure from which the sampling $\mathcal{X}$ is drawn.

Enumerate $\mathcal{D}=\{S_{1},S_{2},\ldots\}$ . Since convergence in probability implies almost sure convergence on a subsequence, we may inductively construct a doubly-indexed sequence of indices $N^{a}_{b}$ such that the following properties hold:

1.

For each $a$ , the sequence $\{N_{b}^{a+1}\}_{b}$ is a subsequence of $\{N_{b}^{a}\}_{b}$ ;
2.

Along the sequence $\{N^{a}_{b}\}_{b}$ , we have almost sure convergence $\tilde{\Delta}_{N^{a}_{b}}S_{a}\xrightarrow{\mathrm{a.s.}}\tilde{\Delta}S_{a}$ as $b\to\infty$ .

Take the diagonal sequence $n_{i}:=N^{i}_{i}$ . Along $n_{i}$ , we have that $\tilde{\Delta}_{n_{i}}S\xrightarrow{\mathrm{a.s.}}\tilde{\Delta}S$ as $i\to\infty$ for all $S\in\mathcal{D}$ . Now applying Theorems VIII.25(a) and VIII.20(b) of [82], we may conclude that:

\|g(\tilde{\Delta}_{n_{i}})S-g(\tilde{\Delta})S\|_{L^{2}}\xrightarrow{\mathrm{a.s.}}0

for each $S\in L^{2}(\mathcal{E};\mathcal{M})$.

Notice that the previous argument applies not only to the sequence $n=\{1,2,3,\ldots\}$, but also to any subsequence of $\{n\}_{n}$. Since every subsequence of $\{n\}_{n}$ therefore has an almost surely convergent sub-subsequence, we may conclude that along the original sequence $n$, for each section $S$ (not necessarily in $\mathcal{D}$), we have convergence in probability:

\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\|_{L^{2}}\xrightarrow{\mathbb{P}_{\mathcal{X}}}0

in probability with respect to the sampling $\mathcal{X}$ as $n\to\infty$ .

Finally, by the spectral calculus, since $g:\mathbb{R}\to\mathbb{R}$ is bounded by some $B$, we may bound $\|g(\tilde{\Delta}_{n})S\|_{L^{2}}\leq B\|S\|_{L^{2}}$. Similarly, $\|g(\tilde{\Delta})S\|_{L^{2}}\leq B\|S\|_{L^{2}}$. It follows that for each $n$,

\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\|_{L^{2}}^{2}\leq 4B^{2}\|S\|_{L^{2}}^{2}\,.

Hence the dominated convergence theorem upgrades convergence in probability to the desired MSE convergence:

\mathbb{E}\left[\left\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\right\|_{L^{2}}^{2}\right]\to 0\,.

∎
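In finite dimensions, the spectral calculus used here reduces to applying $g$ to the eigenvalues. The sketch below, in which a random symmetric matrix stands in for a self-adjoint Laplacian (an assumption for illustration only), computes $g(L)s=V\,\mathrm{diag}(g(w))\,V^{\top}s$ for the bounded filter $g(\lambda)=1/(1+\lambda^{2})$, for which $g(L)=(I+L^{2})^{-1}$, and checks the bound $\|g(L)s\|\leq\sup|g|\,\|s\|$ used above. The names `spectral_filter` and `g` are hypothetical.

```python
import numpy as np


def spectral_filter(L, g, s):
    """Apply g(L) to s for a real symmetric matrix L via L = V diag(w) V^T,
    i.e. g(L) = V diag(g(w)) V^T (finite-dimensional spectral calculus)."""
    w, V = np.linalg.eigh(L)
    return V @ (g(w) * (V.T @ s))


def g(lam):
    """A bounded continuous filter on R with sup |g| = 1 (hypothetical choice)."""
    return 1.0 / (1.0 + lam**2)


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))
L = (X + X.T) / 2                     # stand-in for a self-adjoint Laplacian
s = rng.standard_normal(6)
y = spectral_filter(L, g, s)
print(np.linalg.norm(y), np.linalg.norm(s))   # first value is at most the second
```

For this particular $g$ one also has $\|g(L)-g(L')\|_{\mathrm{op}}\leq(\|L\|_{\mathrm{op}}+\|L'\|_{\mathrm{op}})\|L-L'\|_{\mathrm{op}}$, a finite-dimensional analogue of the continuity in the operator that the proof obtains from [82].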

H.7 Proof of Corollary 2.1

Corollary.

Under the hypotheses of Theorem 2, let $\{d_{n}\}_{n}$ be the constructed deterministic diagonal sequence, and $L\in\mathbb{N}$. Let $\sigma$ be a fiber-wise nonlinearity that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms. For bounded continuous filters $g_{0},\ldots,g_{L-1}\in\mathcal{W}$, consider the continuous and sampled architectures:

\begin{aligned}
S^{\ell+1} &:= \sigma\left(g_{\ell}\left(\tilde{\Delta}\right)S^{\ell}\right)\\
S_{n}^{\ell+1} &:= \sigma\left(g_{\ell}\left(\tilde{\Delta}_{n}\right)S_{n}^{\ell}\right)\,,
\end{aligned}

with initializations $S^{0}:=S$ and $S_{n}^{0}:=\Pi_{d_{n}}S$ , where $\tilde{\Delta}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}$ and $\tilde{\Delta}_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}$ .

Then, the output of the discrete architecture converges in mean square to the output of the continuous architecture:

\lim_{n\to\infty}\mathbb{E}\left[\left\|S_{n}^{L}-S^{L}\right\|_{L^{2}(\mathcal{M};\mathcal{E})}^{2}\right]=0\,.

Proof.

We proceed by induction on the layers. Let $e_{n,\ell}$ and $\delta_{n,\ell}$ denote the signal error and spectral filter error at layer $\ell$ respectively:

\begin{aligned}
e_{n,\ell} &:= \left\|S_{n}^{\ell}-S^{\ell}\right\|_{L^{2}}\\
\delta_{n,\ell} &:= \left\|g_{\ell}(\tilde{\Delta}_{n})(\Pi_{d_{n}}S^{\ell})-g_{\ell}(\tilde{\Delta})S^{\ell}\right\|_{L^{2}}\,.
\end{aligned}

Let $M_{\ell}:=\sup_{x\in\mathbb{R}}|g_{\ell}(x)|$. By the Lipschitz continuity of the nonlinearity $\sigma$, the triangle inequality yields the pathwise recursive bound:

e_{n,\ell+1}\leq C_{\sigma}(M_{\ell}e_{n,\ell}+\delta_{n,\ell})\,.

Iterating this inequality over $L$ layers expands to:

e_{n,L}\leq\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)e_{n,0}+\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}\delta_{n,q}\,.

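Since $d_{n}\to\infty$, $\Pi_{d}\to\mathrm{id}$ strongly on each fiber, and $\|\Pi_{d_{n}}S(x)-S(x)\|\leq\|S(x)\|$, the dominated convergence theorem shows that the initialized signal error $e_{n,0}=\|\Pi_{d_{n}}S-S\|_{L^{2}}$ tends to zero; it is deterministic, so in particular $e_{n,0}\to 0$ in mean square. Furthermore, by Lemma 15, $\delta_{n,q}\to 0$ in mean square as well for each $q$.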

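Squaring this bound, applying the Cauchy-Schwarz inequality $\big(\sum_{k=1}^{L+1}a_{k}\big)^{2}\leq(L+1)\sum_{k=1}^{L+1}a_{k}^{2}$ to its $L+1$ terms, and taking expectations bounds $\mathbb{E}[e_{n,L}^{2}]$ by a fixed ($n$-independent) linear combination of $\mathbb{E}[e_{n,0}^{2}]$ and the $\mathbb{E}[\delta_{n,q}^{2}]$, each of which tends to zero. Hence, as $n\to\infty$, the total error satisfies: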

\lim_{n\to\infty}\mathbb{E}[e_{n,L}^{2}]=0\,.

In particular, as the sampling density goes to infinity, we have in MSE that

\Omega(\mathcal{F}_{n,d_{n}},\hat{\Delta}_{\mathcal{F}_{n,d_{n}}},\mathcal{W},\sigma)\to\Omega(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma)\,.

∎
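The unrolled bound is a direct consequence of iterating the one-layer recursion with equality. A small Python check, with hypothetical values of $e_{n,0}$, $\delta_{n,q}$, $M_{r}$ and $C_{\sigma}$ and hypothetical helper names:

```python
import math


def iterate_errors(e0, deltas, M, C_sigma):
    """Run the recursion e_{l+1} = C_sigma (M_l e_l + delta_l) with equality."""
    e = e0
    for M_l, delta_l in zip(M, deltas):
        e = C_sigma * (M_l * e + delta_l)
    return e


def unrolled_bound(e0, deltas, M, C_sigma):
    """Closed form: (prod_r C_sigma M_r) e0 + sum_q (prod_{r > q} C_sigma M_r) C_sigma delta_q."""
    L = len(M)
    total = math.prod(C_sigma * M[r] for r in range(L)) * e0
    for q in range(L):
        total += math.prod(C_sigma * M[r] for r in range(q + 1, L)) * C_sigma * deltas[q]
    return total


M, deltas = [1.5, 0.8, 2.0], [0.1, 0.05, 0.2]
print(iterate_errors(0.3, deltas, M, C_sigma=1.0), unrolled_bound(0.3, deltas, M, C_sigma=1.0))
```

Because the right-hand side is a fixed linear combination of $e_{n,0}$ and the $\delta_{n,q}$ with coefficients independent of $n$, mean-square convergence of each term transfers to $e_{n,L}$.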

H.8 Proof of Corollary 2.2

Corollary.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$ and connection Laplacians $\Delta_{\nabla_{d}}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. Let $\mathcal{X}=\{x_{1},x_{2},\ldots\}$ and $\mathcal{Y}=\{y_{1},y_{2},\ldots\}$ be a pair of independent iid samplings of points on the manifold. Denote the bandwidth $t$ point cloud Laplacians associated to these distinct samplings by $\hat{\Delta}_{\mathcal{F}^{t}_{\mathcal{X}_{n},d}}$ and $\hat{\Delta}_{\mathcal{F}^{t}_{\mathcal{Y}_{n},d}}$ respectively. Let $\{d_{n}\}$ be a diagonal sequence such that the conclusion of Theorem 2 holds for both samplings. Let $\tilde{\Delta}_{n}^{\mathcal{X}}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}}\Pi_{d_{n}}$ with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$, and similarly for $\tilde{\Delta}_{n}^{\mathcal{Y}}$.

Let $L\in\mathbb{N}$ be a network depth, and $\sigma$ be a fiber-wise nonlinearity that is $C_{\sigma}$ -Lipschitz in the corresponding fiber norms. For bounded continuous filters $g_{0},\ldots,g_{L-1}\in\mathcal{W}$ , consider the continuous and sampled architectures:

\begin{aligned}
S^{\ell+1} &:= \sigma\left(g_{\ell}\left(\tilde{\Delta}\right)S^{\ell}\right)\\
S_{n}^{\mathcal{X},\ell+1} &:= \sigma\left(g_{\ell}\left(\tilde{\Delta}^{\mathcal{X}}_{n}\right)S_{n}^{\mathcal{X},\ell}\right)\\
S_{n}^{\mathcal{Y},\ell+1} &:= \sigma\left(g_{\ell}\left(\tilde{\Delta}_{n}^{\mathcal{Y}}\right)S_{n}^{\mathcal{Y},\ell}\right)\,,
\end{aligned}

with initializations $S^{0}:=S$ and $S_{n}^{\mathcal{X},0}=S_{n}^{\mathcal{Y},0}:=\Pi_{d_{n}}S$. Under these hypotheses and notation, one obtains an MSE convergence result:

\lim_{n\to\infty}\mathbb{E}\left[\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}^{2}\right]=0\,.

Further, one may derive a quantitative bound for the $L^{2}$ disagreement $\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\|_{L^{2}}$ in terms of sample-independent quantities.

Proof.

One may bound:

\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}^{2}\leq 2\left\|S_{n}^{\mathcal{X},L}-S^{L}\right\|_{L^{2}}^{2}+2\left\|S_{n}^{\mathcal{Y},L}-S^{L}\right\|_{L^{2}}^{2}\,.

Applying Corollary 2.1 to each sampling separately yields the MSE convergence. In particular, we have that

\lim_{n\to\infty}\mathbb{E}_{\mathcal{X},\mathcal{Y}}\left[\left\|\Omega(\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}},\mathcal{W},\sigma)-\Omega(\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}}},\mathcal{W},\sigma)\right\|_{L^{2}}^{2}\right]=0\,.

To derive a quantitative bound, introduce, for each sampling $(-)\in\{\mathcal{X},\mathcal{Y}\}$, the signal error and spectral filter error at level $\ell$:

\begin{aligned}
e^{(-)}_{n,\ell} &:= \left\|S_{n}^{(-),\ell}-S^{\ell}\right\|_{L^{2}}\\
\delta^{(-)}_{n,\ell} &:= \left\|g_{\ell}(\tilde{\Delta}^{(-)}_{n})(\Pi_{d_{n}}S^{\ell})-g_{\ell}(\tilde{\Delta})S^{\ell}\right\|_{L^{2}}\,.
\end{aligned}

Apply the triangle inequality and the layer-wise recursive bounds of the proof of Corollary 2.1 to establish:

\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}\leq\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)(e^{\mathcal{X}}_{n,0}+e^{\mathcal{Y}}_{n,0})+\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}(\delta^{\mathcal{X}}_{n,q}+\delta^{\mathcal{Y}}_{n,q})\,.

The level-zero signal error is sampling independent, and bounded above by $\|S\|_{L^{2}}$ . On the other hand, we may further apply the Borel functional calculus to bound each spectral filter error by:

\delta_{n,q}^{(-)}\leq 2M_{q}\|S^{q}\|_{L^{2}}\,.

We hence conclude a sample-independent bound:

\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}\leq 2\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)\|S\|_{L^{2}}+4\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}M_{q}\|S^{q}\|_{L^{2}}\,.

∎




