arXiv:2605.06395 · cs.LG · uncurated · rendered via ar5iv

Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves



Kartik Tandon, Julian Gould, Tanishq Bhatia, Francesca Dominici, Alejandro Ribeiro, Claudio Battiloro
University of Pennsylvania, Sakana AI, Northeastern University, Harvard University
Equal contribution
Corresponding authors: ktandon@sas.upenn.edu, cbattiloro@hsph.harvard.edu

Abstract

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed as HilbNets. We make HilbNets and, more generally, the convolution operation, implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result is a generalization to the infinite-dimensional bundle setting of the Belkin & Niyogi [14] convergence result for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.

1 Introduction

Over the past few years, advances in deep learning have delivered state-of-the-art performance across many areas, driven by increasingly expressive architectures and corresponding gains in both theory and practice. A major contributor to this success, though not the only one, has been the rise of Convolutional Neural Networks (CNNs) [63]. CNNs have shown outstanding results in settings ranging from image recognition [60] to speech processing [1]. At their core, CNNs rely on filters leveraging the regular (often metric) organization of common signal types, such as spatial grids. In contrast, many modern datasets live on irregular, non-Euclidean domains, including social networks for detection and recommendation [2] or point clouds for shape segmentation [107], to name only a few. Such structured data can be represented by richer mathematical objects, among which networks and manifolds are prominent. Motivated by this, the intuition behind CNNs has been generalized to graph convolutional neural networks (GCNs) [86, 40, 59] and extended to many other settings, e.g. simplicial complexes [10, 17, 6], cell complexes [43, 16, 77], order lattices [84], and manifolds [102, 34, 87, 11, 27]. Nevertheless, existing works do not address convolutional filtering of infinite-dimensional signals on manifolds, despite such data being ubiquitous in practice, from time series and spatiotemporal fields arising in sensing, robotics, and climate science to distributional and measure-valued representations common in modern learning systems [54].

To address this gap, we adopt a bundle viewpoint. Informally, a bundle $\mathcal{E}$ over a base manifold $\mathcal{M}$ is a consistent assignment to each point $x\in\mathcal{M}$ of a space $\mathcal{E}_{x}$, called a fiber. A section is a map that picks an element $S(x)\in\mathcal{E}_{x}$ at every point. In other words, signals supported on manifolds can be seen as sections, e.g., scalar manifold signals correspond to $\mathcal{E}_{x}\simeq\mathbb{R}$ [102], and tangent bundle signals correspond to $\mathcal{E}_{x}\simeq T_{x}\mathcal{M}$ [11]. In this work, we develop a convolutional learning framework operating over Hilbert bundles, i.e., bundles whose fibers are (possibly) infinite-dimensional Hilbert spaces. We design a bundle-theoretic convolutional learning framework and, to make it implementable, we draw the first rigorous connection between Hilbert bundles and Hilbert Cellular Sheaves, generalized graph structures whose nodes and edges carry infinite-dimensional signals along with consistency rules.

Figure 1: Overview of the HilbNets framework. A HilbNet is a convolutional neural network processing infinite-dimensional signals supported on $\mathcal{M}$ (e.g., time series or distributions over curved domains). The convolutional operator is the connection Laplacian $\Delta_{\nabla}$. To make HilbNets implementable, we first sample $n$ points $\mathcal{X}_{n}$ from $\mathcal{M}$ to obtain a Hilbert Cellular Sheaf $\mathcal{F}_{n}$ with associated Hilbert Sheaf Laplacian $\Delta_{\mathcal{F}_{n}}$. We then take $d$ samples of the signals to get vectors $\mathbf{s}_{n,d}\in\mathbb{R}^{nd}$ living on a network sheaf $\mathcal{F}_{\mathcal{X}_{n,d}}$, i.e., a generalized matrix-weighted graph, with associated network Sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}}\in\mathbb{R}^{nd\times nd}$. Discretized HilbNets are then Sheaf Neural Networks that take as inputs $\mathbf{S}_{n,d}$ and $\Delta_{\mathcal{F}_{n,d}}$. We prove that Discretized HilbNets converge to the underlying HilbNet as the number of manifold and signal samples goes to infinity.

Related Works. The connection between continuous domains, such as manifolds and bundles, and discrete structures, such as graphs and cellular sheaves, first emerged in pioneering investigations on the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional Riemannian manifolds [38]. The manifold hypothesis underpins several modern spectral graph methods, e.g., nonlinear dimensionality-reduction, clustering, interpretability, and learning algorithms that exploit latent geometric structures. The renowned work of Belkin and Niyogi [14] proved that, assuming access to a finite point cloud sampled from the underlying manifold, it is possible to build a weighted undirected graph whose Laplacian converges to the Laplace-Beltrami operator of the underlying manifold in probability as the number of samples goes to infinity.

The work in [14], and related results, e.g., [92, 93], have been used, directly or indirectly, to design learning systems over manifolds and networks [103, 67, 20, 74, 11, 56]. Despite the diversity of such systems, these models all assume finite-dimensional fibers and therefore do not directly address learning with infinite-dimensional manifold signals. The main technical reason behind this gap is the lack of an extension of the convergence result in [14] to bundles with infinite-dimensional fibers.

A line of related works of interest comes from cellular sheaf theory. Cellular sheaves are combinatorial instances of sheaves introduced in [90] and later rediscovered in [29]. In [18, 50, 7, 39, 37, 78], neural networks operating on finite-dimensional cellular sheaves over graphs, referred to as network sheaves, are presented, generalizing graph neural networks by, intuitively, replacing scalar edge weights with learned or structured matrix weights. Recently, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on network sheaves built from manifold samples. For an extended treatment of related work, see Appendix A.

Contribution. In this work, we first define a convolution operation over a Hilbert bundle through its associated connection Laplacian. This convolution extends Laplacian-based convolutions on tangent bundles [11], manifolds [103], and graphs [91, 41], as well as standard time convolutions. Using the Borel functional calculus, we then define Hilbert bundle convolutional filters for infinite-dimensional manifold signals. These filters are general and expressive, and can be instantiated through suitable spectral responses. We then introduce HilbNets, deep convolutional architectures whose layers stack Hilbert bundle filters and pointwise nonlinearities. HilbNets are continuous models and are therefore not directly implementable. To address this, we provide a principled discretization of the manifold domain by sampling points and showing that the induced structure is a Hilbert cellular sheaf over an undirected graph. The corresponding sheaf Laplacian combines scalar edge weights, obtained from the sampled base manifold, with parallel transport maps associated with the bundle geometry or learned from data. We prove that this sheaf Laplacian converges in probability to the connection Laplacian as the sampling density increases, yielding the first extension of the classical convergence result of [14] to the infinite-dimensional bundle setting. We then discretize the signals themselves to obtain an implementable architecture, show that discretized HilbNets are novel instances of network sheaf neural networks, and prove that they converge to the corresponding continuous architectures as both the manifold and signal sampling densities increase. Moreover, we show that discretized HilbNets are transferable across different samplings of the same underlying bundle, providing resolution consistency guarantees for learning. 
Finally, we validate HilbNets on a synthetic transport recovery task and on real-world traffic forecasting tasks, comparing them against baselines with different inductive biases in order to isolate the benefits of the bundle formulation. The potential impact of this work extends well beyond the definition of the HilbNet architecture. See Appendix B for a detailed discussion of broader impact and future directions, and Fig. 1 for an overview.

2 Preliminaries

Signals on Manifolds. Given a manifold $\mathcal{M}$, a vector-valued signal is a square-integrable function $S\in L^{2}(\mathcal{M},\mathbb{R}^{n})$. Certain vector-valued signals on $\mathcal{M}$ may possess the richer structure of a vector field, i.e., they are sections of the tangent bundle $T\mathcal{M}$ of $\mathcal{M}$ and thus elements of $L^{2}(\mathcal{M},T\mathcal{M})$. More generally, we may consider signals that are $L^{2}$-sections of an arbitrary bundle $\mathcal{E}$. A bundle is called trivial when, for a generic fiber $\mathcal{V}$, it can be written as a product $\mathcal{E}=\mathcal{M}\times\mathcal{V}$. In this setting, $L^{2}(\mathcal{M},\mathbb{R}^{n})$ may be understood as the space of sections of the trivial bundle $\mathcal{E}=\mathcal{M}\times\mathbb{R}^{n}$.

Figure 2: Visualization of the effect of the choice of underlying connection for generating heat flows of vector fields on the sphere $\mathbb{S}^{2}$. Left: Generated by considering the standard Levi-Civita connection on $T\mathbb{S}^{2}$, and the corresponding Laplacian $\Delta_{\nabla}$. Right: Generated by allowing a more general connection $\nabla$ that allows for torsion anisotropy and considering the corresponding connection Laplacian $\Delta_{\nabla}$.

Consider now the case where the signal is ‘infinite-dimensional’, for instance, representing a time series recorded at each point $x\in\mathcal{M}$. While this is usually considered as a function $S:\mathcal{M}\times\mathbb{R}\to\mathbb{R}^{n}$, it may instead be more richly understood as a section of a Hilbert bundle, i.e., a bundle whose fibers are Hilbert spaces. As we will see, Hilbert bundles provide a principled and versatile approach to incorporating structural properties of infinite-dimensional data.

Example 1.

In physics, Hilbert bundles often arise naturally when considering global geometric properties of quantum mechanical systems [4].

Example 2.

In information geometry, the key objects of study are manifolds $\mathcal{M}$ given by the underlying parameters of some family of data distributions. This manifold is then equipped with a Riemannian structure by either the Otto-Wasserstein or Fisher-Rao metric, the latter of which locally recovers KL divergence. The proper analogue of the tangent bundle in this setting is a Hilbert bundle [72].

Convolution, Heat Equation, and Connection Laplacian. Geometric signal processing and deep learning [66, 21] traditionally aim to develop convolutional filters and neural networks designed to respect the underlying geometry of the signals of interest. The relevant convolutional operators can usually be realized as a connection Laplacian $\Delta_{\nabla}:L^{2}(\mathcal{M},\mathcal{E})\to L^{2}(\mathcal{M},\mathcal{E})$ built from a connection $\nabla$. For instance, for the tangent bundle over the circle $T\mathbb{S}^{1}$, the eigenfunctions of $\Delta_{\nabla}$, with $\nabla$ the Levi-Civita connection, recover the usual Fourier basis. Similarly, the eigenfunctions of $\Delta_{\nabla}$ for the tangent bundle of the sphere $T\mathbb{S}^{2}$ recover spherical harmonics. Thus, convolutions with the connection Laplacian may be understood as generalized Fourier transforms in the spectral domain. In the spatial domain, they can be seen as performing a geometry-aware ‘local averaging’ of a signal over fibers. Formally, the connection Laplacian is the generator of the heat equation in $L^{2}(\mathcal{M},\mathcal{E})$,

$$\frac{\partial\mathbf{U}(x,t)}{\partial t}=-\Delta_{\nabla}\mathbf{U}(x,t), \tag{1}$$

where $\mathbf{U}(x,t)$ is the distribution of heat at $\mathcal{E}_{x}$ for $x\in\mathcal{M}$ at time $t\in\mathbb{R}_{+}$. A subtlety of note is that for non-Euclidean spaces, there is typically no canonical identification between fibers $\mathcal{E}_{x}$ and $\mathcal{E}_{y}$. Intuitively, the connection $\nabla$ precisely encodes a globally coherent notion of transport between fibers. That is, it induces parallel transport maps $P_{\gamma}:\mathcal{E}_{\gamma(0)}\to\mathcal{E}_{\gamma(1)}$ for a path $\gamma\subset\mathcal{M}$, allowing us to compare elements across fibers along this path. More formally, the connection is used to define a first-order ODE whose solution is given by parallel transport (see Appendix G.1.3). The connection Laplacian is then the self-adjoint operator $\Delta_{\nabla}:=\nabla^{*}\nabla$, now more clearly understandable as a ‘local weighted average’ over fibers with ‘weights’ corresponding to our choice of parallel transport. A more rigorous introduction to the relevant mathematical background is provided in Appendix G.
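As a rough illustration of the heat equation (1), the sketch below treats the simplest case: the trivial scalar bundle, where the connection Laplacian reduces to an ordinary Laplacian. The cycle-graph Laplacian used here is an assumed discrete stand-in for the Laplacian on the circle, and the helper names `cycle_laplacian` and `heat_flow` are hypothetical; none of this is the paper's construction.

```python
import numpy as np

def cycle_laplacian(n):
    """Graph Laplacian of an n-cycle (assumed stand-in for the circle Laplacian)."""
    L = 2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = -1.0
    L[idx, (idx - 1) % n] = -1.0
    return L

def heat_flow(L, u0, t):
    """Solve dU/dt = -L U with U(0) = u0 via the eigendecomposition of symmetric L."""
    lam, V = np.linalg.eigh(L)
    return V @ (np.exp(-t * lam) * (V.T @ u0))

n = 32
L = cycle_laplacian(n)
u0 = np.zeros(n)
u0[0] = 1.0                      # one unit of heat placed at a single node
for t in (0.0, 1.0, 10.0, 1000.0):
    u = heat_flow(L, u0, t)
    print(t, u.sum(), u.max())   # total heat stays constant while the peak decays
```

In this toy setting the flow is a local averaging: the total heat is preserved and the profile flattens toward the constant signal as $t$ grows.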

Example 3.

As seen in Fig. 2, the choice of connection can be used to emphasize aspects of the geometry that may be relevant for a particular task. For instance, most PDE-based approaches to color-image regularization can be realized as heat equations for a suitable choice of connection [8].

3 Hilbert Bundle Filters and Neural Networks

In this section, we develop a convolutional learning framework for infinite-dimensional data, such as time series or probability distributions, indexed by a manifold $\mathcal{M}$. The core objects are Hilbert bundles.

Hilbert Bundles. Given a closed Riemannian manifold $\mathcal{M}$, a Hilbert bundle $\mathcal{E}$ over $\mathcal{M}$ is a bundle whose potentially infinite-dimensional fibers are separable Hilbert spaces over $\mathbb{R}$. The assumption of real, instead of complex, Hilbert spaces is not essential to our analysis, and is made only for the sake of exposition. As mentioned in Section 2, a Hilbert bundle signal is then an $L^{2}$-section $S:\mathcal{M}\to\mathcal{E}$. Integration of sections in this setting should be understood in the Bochner integral sense, a generalized notion of integration for functions whose values lie in a Hilbert space rather than in $\mathbb{R}^{n}$. In finite dimensions, it reduces to the standard Lebesgue integral. Given fibers $\mathcal{H}_{x}$ and $\mathcal{H}_{y}$ of the Hilbert bundle $\mathcal{E}$, we consider unitary parallel transport operators $P_{x\to y}:\mathcal{H}_{x}\to\mathcal{H}_{y}$. As before, a globally compatible collection of such transport operators determines a connection $\nabla$, with the subtlety that derivatives of sections must now be understood in the Fréchet sense. Intuitively, Fréchet differentiability is the infinite-dimensional analogue of ordinary differentiability: it asks that a section admit a best linear approximation under small perturbations, but where the linear approximation acts between Hilbert spaces. We therefore refer to this construction as a Fréchet connection, which recovers the usual notion of connection and covariant derivative when restricted to finite-dimensional bundles. As before, we obtain a self-adjoint operator $\Delta_{\nabla}:=\nabla^{*}\nabla$ on $L^{2}(\mathcal{M},\mathcal{E})$. Unlike the finite-dimensional case, however, this operator need not be compact and thus need not possess a discrete spectrum. As such, care must be taken when adapting classical arguments that involve spectral properties of the Laplacian to the Hilbert-bundle setting.
Formal definitions of Hilbert bundles and Fréchet connections are provided in Appendix G. For a triple $(\mathcal{M},\mathcal{E},\nabla)$, where $\nabla$ is a choice of Fréchet connection on the Hilbert bundle $\mathcal{E}$ over the manifold $\mathcal{M}$, we now wish to construct a general notion of a ‘filtering’ operation using the connection Laplacian $\Delta_{\nabla}$. In finite-dimensional or compact settings, filters are often defined by applying a function directly to the eigenvalues of the Laplacian. In our setting, the appropriate analogue of eigenvalue-by-eigenvalue filtering is furnished by the Borel functional calculus, which allows us to apply a filter to a self-adjoint operator by instead integrating over its spectral measure. See Appendix G.4 for details.

Definition 1 (Hilbert bundle convolutional filter).

A convolutional filter is specified by a bounded compactly supported Borel function $g\in L_{c}^{\infty}(\mathbb{R})$. The filtering of a signal $S\in L^{2}(\mathcal{M},\mathcal{E})$ is then its convolution $\star_{\Delta_{\nabla}}$ with $g$, defined as $g\star_{\Delta_{\nabla}}S:=g(\Delta_{\nabla})S$, where $g(\Delta_{\nabla}):L^{2}(\mathcal{M},\mathcal{E})\to L^{2}(\mathcal{M},\mathcal{E})$ is the bounded linear operator obtained by applying $g$ to $\Delta_{\nabla}$ through the Borel functional calculus.

In this sense, $g$ is the learnable frequency response, as in spectral graph neural filters [41], except we now use the spectral measure of the connection Laplacian acting on Hilbert bundle-valued signals.
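For intuition, when the operator is a finite real symmetric matrix, the functional calculus of Definition 1 reduces to applying $g$ to the eigenvalues. The sketch below shows only this finite-dimensional special case; the path-graph Laplacian and the helper name `spectral_filter` are illustrative assumptions, not objects from the paper.

```python
import numpy as np

def spectral_filter(L, g):
    """Return g(L) = V diag(g(lam)) V^T for a real symmetric matrix L."""
    lam, V = np.linalg.eigh(L)
    return (V * g(lam)) @ V.T          # scales column k of V by g(lam_k)

# Hypothetical 4-node path-graph Laplacian standing in for the connection Laplacian.
L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])

# Bounded response supported on the compact interval [-1, 1].
def low_pass(lam):
    return (np.abs(lam) <= 1.0).astype(float)

G = spectral_filter(L, low_pass)
S = np.array([1.0, -2.0, 0.5, 3.0])
print(G @ S)                           # filtered signal g(L) S
```

Because this particular $g$ is an indicator function, the resulting $g(L)$ is an orthogonal projection onto the low-frequency eigenspaces; a learnable filter would instead parameterize $g$.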

Definition 2 (Hilbert bundle convolutional neural network).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle. A Hilbert bundle convolutional neural network, or HilbNet, is specified by a filter bank $\mathcal{W}=\{g^{\ell}_{u,q}\}_{\ell,u,q}$ with $g^{\ell}_{u,q}\in L_{c}^{\infty}(\mathbb{R})$, and a Lipschitz continuous nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$. Given input signals $S_{1},\dots,S_{F_{0}}\in L^{2}(\mathcal{M},\mathcal{E})$, the $L$-layer network output is obtained by the recursion

$$S^{\ell+1}_{u}=\sigma\left(\sum_{q=1}^{F_{\ell}}g^{\ell}_{u,q}(\Delta_{\nabla})S^{\ell}_{q}\right),\qquad\ell=0,\dots,L-1,\qquad S^{0}_{q}=S_{q}, \tag{2}$$

where $\sigma$ is applied pointwise in each fiber.

We concisely denote a HilbNet with $\Omega(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma)$. Similarly to the finite-dimensional case, a nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$ with $\sigma(0)=0$ extends to an operator on $L^{2}$ sections by simply picking a basis and then applying $\sigma$ to each coordinate with respect to the chosen basis. It is straightforward to check that, for each $\ell\in\{0,\ldots,L\}$, the layer signal $S^{\ell}$ remains an $L^{2}$ section.

4 Discretized HilbNets via Cellular Sheaves

HilbNets are continuous architectures that cannot be implemented directly in practice. Moreover, we typically do not have access to the true bundle and connection structure, but only to a point cloud or graph sampled from the underlying manifold $\mathcal{M}$, together with samples of the signal at each point or node. In this section, we first analyze the Hilbert cellular sheaf induced by spatial, i.e., manifold-level, sampling. We then further discretize the fibers, i.e., the signal domain itself, obtaining a finite-rank network sheaf. This two-stage sampling is the basis of our consistency theory presented in Section 5. It allows us to prove that the fully discrete (thus, implementable) Laplacian and HilbNet converge to their infinite-dimensional counterparts, and hence that learning is consistent across scales.

Manifold Sampling. A generalized viewpoint on the theory of bundles is given by the language of sheaves, mathematical structures initially introduced by Jean Leray while a prisoner of war [65]. The functoriality of sheaves makes them particularly well suited to the type of principled discretization of geometric structures that we are interested in. In particular, we consider a Hilbert-space valued version of cellular sheaves on graphs, as introduced in [44]. Intuitively, they can be understood as generalized graph structures with signals valued in Hilbert spaces along with edge-wise coupling rules. For a more thorough introduction to cellular sheaves, see Appendix G.5.

In this work, our primary interest is in Hilbert sheaves that represent discretizations of the structure of a Hilbert bundle over a manifold. In particular, we desire a spatial discretization such that we can recover an appropriate discrete analogue of the Hilbert bundle’s connection Laplacian. Formally, given an iid random sample $\mathcal{X}_{n}\subset\mathcal{M}$ from the uniform distribution (see Def. 23), we have the following.

Definition 3 (Hilbert Cellular Sheaf from a Hilbert Bundle).

For a given Hilbert bundle $(\mathcal{M},\mathcal{E},\nabla)$ with sampled points $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}\subset\mathcal{M}$, fix a geodesic $\gamma_{ij}$ between $x_{i}$ and $x_{j}$, for all $i<j$. Further, let $m_{\gamma_{ij}}$ denote the midpoint of this geodesic. Consider the graph $G_{n}=(\mathcal{X}_{n},E)$ with an undirected edge $e_{ij}$ between $x_{i}$ and $x_{j}$, for each $i<j$. The associated Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$ on $G_{n}$ with bandwidth parameter $t$ is given by the following assignments:

  • The Hilbert space $\mathcal{F}_{n}^{t}(x_{i}):=\mathcal{E}_{x_{i}}$ for each $x_{i}\in\mathcal{X}_{n}$, referred to as the node stalk over $x_{i}$.

  • The Hilbert space $\mathcal{F}_{n}^{t}(e_{ij}):=\mathcal{E}_{m_{\gamma_{ij}}}$ for each $e_{ij}\in E$, referred to as the edge stalk over $e_{ij}$.

  • For each edge $e_{ij}\in E$ with bounding vertices $x_{i},x_{j}$, a pair of bounded linear restriction maps

    $$\begin{aligned}
    (\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{i})\to\mathcal{F}_{n}^{t}(e_{ij}),\\
    (\mathcal{F}_{n}^{t})_{x_{j}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{j})\to\mathcal{F}_{n}^{t}(e_{ij}),
    \end{aligned} \tag{3}$$

    where $k_{ij}^{t}=e^{-d_{\mathcal{M}}(x_{i},x_{j})^{2}/4t}$, with $d_{\mathcal{M}}$ the geodesic distance on $\mathcal{M}$, and $P_{x_{i}\to m_{\gamma_{ij}}}$ denotes the unitary parallel transport map on $\mathcal{E}$ between $x_{i}$ and $m_{\gamma_{ij}}$.

For the sake of exposition, the choice of sample and corresponding geodesic paths will often be suppressed, so our parallel transports will be denoted as $P_{x_{i}\to m_{ij}}$. Also, note that for $n<m$ and $\mathcal{X}_{n}\subset\mathcal{X}_{m}$, we assume each additional point is again sampled iid from the uniform distribution on $\mathcal{M}$. For the categorically-minded reader, we remark that our sheaf is constructed such that refining our sample then leads to a subfunctor $\mathcal{F}^{t}_{n}\subset\mathcal{F}^{t}_{m}$. For the Hilbert sheaf $\mathcal{F}_{n}^{t}$ on the graph $G_{n}=(\mathcal{X}_{n},E)$, a signal is an element of the Hilbert space

$$C^{0}(\mathcal{F}_{n}^{t};G_{n}):=\bigoplus_{x_{i}\in\mathcal{X}_{n}}\mathcal{F}_{n}^{t}(x_{i}). \tag{4}$$

Example 4.

If $\mathcal{F}_{n}^{t}$ encodes univariate spatiotemporal data, each node stalk can be chosen as $\mathcal{F}_{n}^{t}(x_{i})=L^{2}(\mathbb{R})$. Then $C^{0}(\mathcal{F}_{n}^{t};G_{n})=\bigoplus_{x_{i}\in\mathcal{X}_{n}}L^{2}(\mathbb{R})$, so a signal assigns a full time series to every node, recovering the usual notion of a node-time graph signal.

Example 5.

In one dimension, a probability distribution $\mu\in\mathcal{P}_{2}(\mathbb{R})$ can be represented by its quantile function $Q_{\mu}\in L^{2}([0,1])$, and the 2-Wasserstein distance becomes the $L^{2}$ distance between quantile functions [99]. Thus, by choosing node stalks $\mathcal{F}_{n}^{t}(x_{i})=L^{2}([0,1])$, a signal assigns a full probability distribution to each graph node, recovering the distributional graph-signal setting of [57, 112].
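The quantile representation in Example 5 can be checked numerically on empirical measures. The sketch below is a minimal illustration under the assumption that both distributions are given by the same number of equally weighted samples, so that sorting yields the empirical quantile functions on a common grid; the helper name `w2_from_samples` is hypothetical.

```python
import numpy as np

def w2_from_samples(x, y):
    """2-Wasserstein distance between two 1-D empirical measures with equally many atoms.

    Sorting evaluates each empirical quantile function on the same uniform grid
    of levels, so W2 is the root-mean-square gap between the sorted samples.
    """
    qx, qy = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((qx - qy) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(w2_from_samples(x, x + 3.0))   # a pure shift by 3 gives a distance close to 3
```

In a distribution-valued graph signal, each node would store such a sampled quantile vector as its stalk element.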

Finally, analogous to the construction of the connection Laplacian $\Delta_{\nabla}$, we may construct the Hilbert sheaf Laplacian. Further details, such as self-adjointness, are discussed in Appendix G.5.

Definition 4 (Hilbert Sheaf Laplacian).

Let $\mathcal{F}_{n}^{t}$ be the Hilbert sheaf on the graph $G_{n}=(\mathcal{X}_{n},E)$ induced by Def. 3. Fix an orientation for each edge $e\in E$. The Hilbert sheaf Laplacian is the bounded linear operator

$$\Delta_{\mathcal{F}_{n}^{t}}:C^{0}(\mathcal{F}_{n}^{t};G_{n})\to C^{0}(\mathcal{F}_{n}^{t};G_{n}) \tag{5}$$

defined, for a signal $S\in C^{0}(\mathcal{F}_{n}^{t};G_{n})$ and at a node $x_{i}\in\mathcal{X}_{n}$, by

$$(\Delta_{\mathcal{F}_{n}^{t}}S)_{x_{i}}=\sum_{\begin{subarray}{c}e\in E:\\ e=\{x_{i},x_{j}\}\end{subarray}}(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}\left((\mathcal{F}_{n}^{t})_{x_{i}\leq e}S_{x_{i}}-(\mathcal{F}_{n}^{t})_{x_{j}\leq e}S_{x_{j}}\right), \tag{6}$$

where $x_{j}$ denotes the other endpoint of $e$, and $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}^{*}$ is the adjoint of the restriction map $(\mathcal{F}_{n}^{t})_{x_{i}\leq e}$.

Intuitively, $\Delta_{\mathcal{F}_{n}^{t}}$ measures how much a signal fails to be locally consistent across edges: before comparing $S_{x_{i}}$ and $S_{x_{j}}$, both values are mapped into the common edge stalk $\mathcal{F}_{n}^{t}(e)$ by the restriction maps. Thus, it is a broad generalization of a graph Laplacian, with restriction maps replacing scalar edge weights. The Hilbert sheaf Laplacian $\Delta_{\mathcal{F}_{n}^{t}}$ is a self-adjoint bounded linear operator. Once the manifold is sampled and the induced sheaf Laplacian is computed, space-discretized HilbNets, which are still not implementable due to the infinite-dimensional signals, are simply given by Def. 2 with the connection Laplacian of $\mathcal{E}$ replaced by the sheaf Laplacian of $\mathcal{F}_{n}^{t}$, i.e., by $\Omega(\mathcal{F}_{n}^{t},\Delta_{\mathcal{F}_{n}^{t}},\mathcal{W},\sigma)$.

Signal Sampling. Hilbert cellular sheaves are the structures that arise when we sample our base manifold but faithfully record the potentially infinite-dimensional signal in each fiber. In practice, we typically only have access to a sampled or compressed version of the signal as well. For instance, when considering a time series $S\in L^{2}(\mathbb{R})$, we may use the orthogonal Fourier basis $\overline{\{e^{ik\theta}\}}_{k\in\mathbb{Z}}=L^{2}(\mathbb{R})$ and then record a compressed representation with respect to this basis, i.e., $[\langle S,e^{-id\theta}\rangle,\dots,\langle S,e^{id\theta}\rangle]$ for some $d$. In the Hilbert bundle setting, fiber-wise orthogonal projections with respect to any chosen basis provide a principled approach to discretizing Hilbert bundle signals.
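The truncated Fourier representation described above can be sketched numerically as follows. This is an illustration only: it assumes the signal is $2\pi$-periodic and observed on a uniform grid of $[0,2\pi)$, with inner products approximated by grid averages; the helper names `fourier_coeffs` and `reconstruct` are hypothetical.

```python
import numpy as np

def fourier_coeffs(s, d):
    """Approximate <S, e^{ik theta}> for k = -d..d from samples on a uniform grid of [0, 2*pi)."""
    m = len(s)
    theta = 2 * np.pi * np.arange(m) / m
    k = np.arange(-d, d + 1)
    basis = np.exp(1j * np.outer(k, theta))      # row k holds samples of e^{ik theta}
    return basis.conj() @ s / m                  # grid-average inner products

def reconstruct(c, m):
    """Evaluate sum_k c_k e^{ik theta} on the m-point grid (real part, for real signals)."""
    d = (len(c) - 1) // 2
    theta = 2 * np.pi * np.arange(m) / m
    k = np.arange(-d, d + 1)
    return (c @ np.exp(1j * np.outer(k, theta))).real

m = 64
theta = 2 * np.pi * np.arange(m) / m
s = 1.0 + 2.0 * np.cos(3 * theta) - np.sin(5 * theta)

for d in (2, 5):
    err = np.max(np.abs(reconstruct(fourier_coeffs(s, d), m) - s))
    print(d, err)    # the error vanishes once d covers every frequency present in s
```

Each node then stores the $2d+1$ coefficients instead of the full signal, which is the fiber-wise projection used in the signal-sampling stage.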

Proposition 1.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle, with strictly infinite-dimensional generic Hilbert-space fiber $\mathcal{H}$. Fix an orthogonal basis $\mathcal{B}=\{e_{1},e_{2},\dots\}$ of $\mathcal{H}$ and let $\mathcal{H}_{d}:=\mathrm{span}(e_{1},e_{2},\dots,e_{d})$. Then there exists a smooth map of bundles

$$\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d} \tag{7}$$

where $\mathcal{E}_{d}$ is a $d$-dimensional vector bundle with generic fiber $\mathcal{H}_{d}$ and, at each $x\in\mathcal{M}$, $\left.\Pi_{d}\right|_{\mathcal{E}_{x}}:\mathcal{E}_{x}\to\mathcal{E}_{d,x}$ recovers the usual orthogonal projection map. See Appendix H.4 for details.

Applying Proposition 1 to $(\mathcal{M},\mathcal{E},\nabla)$, we obtain $(\mathcal{M},\mathcal{E}_{d},\nabla_{d})$, to which we may apply the spatial discretization of Def. 3 to construct the cellular sheaf $\mathcal{F}_{n,d}^{t}$ with $d$-dimensional stalks. We refer to $\mathcal{F}_{n,d}^{t}$ as a network sheaf. The signals on this sheaf are then sampled Hilbert bundle signals, i.e., we can discretize $S\in L^{2}(\mathcal{M},\mathcal{E})$ as an $nd$-dimensional vector $\mathbf{s}_{n,d}:=\Pi_{d}S\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n})\subseteq\mathbb{R}^{nd}$, stacking the $d$-dimensional orthogonal projections over the $n$ sampled locations, with respect to the chosen basis $\mathcal{B}$. In this case, the restriction maps can be written as matrices, thus the sheaf Laplacian becomes a block matrix $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ whose $(i,j)$-block maps the discretized stalk over $x_{j}$ to the discretized stalk over $x_{i}$ and is given by

$$(\Delta_{\mathcal{F}_{n,d}^{t}})_{ij}=\begin{cases}\displaystyle\sum_{r:\,e_{ir}\in E}k_{ir}^{t}I_{d},&i=j,\\[12pt] \displaystyle-P_{x_{j}\to x_{i}}^{(d,e_{ij})},&i\neq j\text{ and }e_{ij}\in E,\\[5pt] 0,&\text{otherwise,}\end{cases} \tag{8}$$

where $P_{x_{j}\to x_{i}}^{(d,e_{ij})}:=k_{ij}^{t}\left(P_{x_{i}\to m_{ij}}^{(d)}\right)^{*}P_{x_{j}\to m_{ij}}^{(d)}$, and $P_{x_{j}\to m_{ij}}^{(d)}$ denotes the restriction of the parallel transport map from $\mathcal{E}_{x_{j}}$ to $\mathcal{E}_{m_{ij}}$ to the corresponding $d$-dimensional subbundles in the image of $\Pi_{d}$. We may thus use this Laplacian to build an implementable sheaf convolutional architecture as follows.
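To make the block structure of Eq. (8) concrete, the following sketch assembles the matrix for a toy configuration. Everything specific here is an assumption made for illustration: the base manifold is the unit circle, stalks are 2-dimensional, and parallel transport along a signed arc of length $a$ is taken to be a rotation by `twist * a`, where `twist` is a hypothetical connection-strength parameter. The function names are likewise hypothetical.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def sheaf_laplacian(theta, t, twist=1.0):
    """Block matrix of Eq. (8) for points at angles theta on the unit circle.

    theta : (n,) angles of the sampled points
    t     : heat-kernel bandwidth
    twist : assumed connection strength (transport over arc a rotates by twist * a)
    """
    n, d = len(theta), 2
    L = np.zeros((n * d, n * d))
    for i in range(n):
        for j in range(i + 1, n):
            # signed shortest arc from x_i to x_j, wrapped into [-pi, pi)
            arc = (theta[j] - theta[i] + np.pi) % (2 * np.pi) - np.pi
            k = np.exp(-arc**2 / (4 * t))              # kernel weight k_ij^t
            P_i_to_m = rotation(twist * arc / 2)       # transport x_i -> midpoint
            P_j_to_m = rotation(-twist * arc / 2)      # transport x_j -> midpoint
            off = -k * P_i_to_m.T @ P_j_to_m           # (i, j) block: stalk j -> stalk i
            L[i*d:(i+1)*d, j*d:(j+1)*d] = off
            L[j*d:(j+1)*d, i*d:(i+1)*d] = off.T
            L[i*d:(i+1)*d, i*d:(i+1)*d] += k * np.eye(d)
            L[j*d:(j+1)*d, j*d:(j+1)*d] += k * np.eye(d)
    return L

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=6)
L = sheaf_laplacian(theta, t=0.5)
print(L.shape)                                  # (12, 12)
print(np.linalg.eigvalsh(L).min())              # non-negative up to round-off
```

With `twist=0.0` all transports are identities and the matrix collapses to the Kronecker product of the kernel-weighted graph Laplacian with $I_{d}$, which is a convenient sanity check on the construction.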

Definition 5 ($(n,d)$-Hilbert bundle convolutional neural network).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle, with generic Hilbert-space fiber $\mathcal{H}$ and corresponding basis $\mathcal{B}$. An $(n,d)$-Hilbert bundle convolutional neural network, or $(n,d)$-HilbNet, is specified by a filter bank $\mathcal{W}=\{g^{\ell}_{u,q}\}_{\ell,u,q}$ with $g^{\ell}_{u,q}\in L_{c}^{\infty}(\mathbb{R})$, a Lipschitz continuous nonlinear activation $\sigma:\mathbb{R}\to\mathbb{R}$, the choice of a $d$-dimensional subbasis of $\mathcal{B}$, and a sample $\mathcal{X}_{n}\subset\mathcal{M}$. Given input sampled signals $\mathbf{s}_{n,d,1},\dots,\mathbf{s}_{n,d,F_{0}}\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n})$, the $L$-layer network output is obtained by the recursion

$$\mathbf{s}_{n,d,u}^{\ell+1}=\sigma\left(\sum_{q=1}^{F_{\ell}}g^{\ell}_{u,q}(\Delta_{\mathcal{F}_{n,d}^{t}})\,\mathbf{s}_{n,d,q}^{\ell}\right),\qquad\ell=0,\dots,L-1,\qquad\mathbf{s}_{n,d,q}^{0}=\mathbf{s}_{n,d,q}. \tag{9}$$

Discretized HilbNets are fully implementable and can be compactly written using Def. 2 with the connection Laplacian of $\mathcal{E}$ replaced by the sheaf Laplacian of $\mathcal{F}_{n,d}^{t}$, i.e., by $\Omega(\mathcal{F}_{n,d}^{t},\Delta_{\mathcal{F}_{n,d}^{t}},\mathcal{W},\sigma)$.
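Since the discretized Laplacian is a finite matrix, a filter $g(\Delta)$ of the kind appearing in (9) can be evaluated through an eigendecomposition whenever that matrix is real and symmetric. The sketch below illustrates this under that assumption; the helper name `spectral_filter`, the toy operator, and the truncated heat-kernel response `g` are hypothetical choices made only for illustration.

```python
import numpy as np

def spectral_filter(L, g):
    """Return g(L) for a real symmetric matrix L via L = U diag(lam) U^T."""
    evals, evecs = np.linalg.eigh(L)
    # Scale each eigenvector column by g(lam), then map back to the original basis.
    return (evecs * g(evals)) @ evecs.T

# Toy symmetric operator standing in for the sheaf Laplacian:
# a 3-node path graph with identity transports and d = 2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L_graph = np.diag(A.sum(axis=1)) - A
L = np.kron(L_graph, np.eye(2))

def g(lam, cutoff=2.5):
    # Bounded, compactly supported response: a heat kernel truncated at `cutoff`.
    return np.exp(-lam) * (np.abs(lam) <= cutoff)

H = spectral_filter(L, g)
s = np.arange(6, dtype=float)   # one sampled signal with n = 3, d = 2
filtered = H @ s
```

For polynomial responses this construction agrees with evaluating the polynomial directly in the matrix, which is typically cheaper for large sparse Laplacians because it avoids the full eigendecomposition.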

Example 6.

In the case that our filter bank $\mathcal{W}$ consists of order-$K$ polynomials, the $(n,d)$-HilbNet can be written as a novel variant of sheaf neural networks [50, 11] given by

$$\mathbf{S}_{n,d}^{\ell+1}=\sigma\left(\sum_{k=0}^{K-1}(\Delta_{\mathcal{F}_{n,d}^{t}})^{k}\,\mathbf{S}^{\ell}_{n,d}\mathbf{W}_{\ell,k}\right)\in\mathbb{R}^{nd\times F_{\ell+1}}, \tag{10}$$

where the matrices $\mathbf{S}^{\ell}_{n,d}\in\mathbb{R}^{nd\times F_{\ell}}$ and $\{\mathbf{W}_{\ell,k}\}_{k}$, with $\mathbf{W}_{\ell,k}\in\mathbb{R}^{F_{\ell}\times F_{\ell+1}}$, collect the sampled signals and the learnable filter weights at each layer, respectively.
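A single layer of the polynomial form (10) can be sketched in a few lines of NumPy. This is a hedged illustration, not the training code behind the reported results: the function name `poly_hilbnet_layer`, the choice of `tanh` as activation, and the random stand-in for the Laplacian are assumptions.

```python
import numpy as np

def poly_hilbnet_layer(L, S, W, sigma=np.tanh):
    """One layer of Eq. (10): sigma(sum_k L^k S W[k]).

    L : (nd, nd) sheaf Laplacian
    S : (nd, F_in) stacked sampled signals
    W : (K, F_in, F_out) learnable filter weights
    """
    out = np.zeros((S.shape[0], W.shape[2]))
    Z = S.copy()                  # Z holds L^k S, starting at k = 0
    for k in range(W.shape[0]):
        out += Z @ W[k]
        Z = L @ Z                 # advance to the next power without forming L^k
    return sigma(out)

# Toy dimensions and random data (hypothetical).
rng = np.random.default_rng(0)
n, d, F_in, F_out, K = 5, 2, 3, 4, 3
M = rng.standard_normal((n * d, n * d))
L = M @ M.T / (n * d)             # symmetric PSD stand-in for the sheaf Laplacian
S = rng.standard_normal((n * d, F_in))
W = 0.1 * rng.standard_normal((K, F_in, F_out))
S_next = poly_hilbnet_layer(L, S, W)
```

The recursion `Z = L @ Z` only needs matrix-vector products with the Laplacian, so the layer cost scales with the number of nonzero blocks of the sheaf Laplacian rather than with $(nd)^{2}$ when a sparse representation is used.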

5 Theoretical Convergence Guarantees

Our main result may be understood as a far-reaching generalization of the convergence result of Belkin and Niyogi [14]. Consider a random sample $\mathcal{X}_{n}\subset\mathcal{M}$ and the corresponding geometric graph $G_{n}=(\mathcal{X}_{n},E)$. It is established in [14] that, as the sampling density increases, the weighted graph Laplacian converges in probability to the manifold Laplace-Beltrami operator. We analogously show that the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}_{n}}$ over $G_{n}$ converges to $\Delta_{\nabla}$, thus recovering the results of [14] as the special case $\mathcal{E}=\mathcal{M}\times\mathbb{R}$. Our proof, presented in Appendix H, is inspired by the strategy of [14], with the non-trivial modifications needed to accommodate the simultaneous generalization from graphs to cellular sheaves and to infinite-dimensional Hilbert spaces. In order to state our results, we require the following intermediary operator.

Definition 6.

(Point-Cloud Extension of Sheaf Laplacian) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle and consider a sample $\mathcal{X}_{n}\subset\mathcal{M}$. Then the corresponding Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}$ may be extended to the point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$, an operator on $L^{2}(\mathcal{M},\mathcal{E})$, via

$$(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)=\frac{1}{n}\sum_{j}e^{-d_{\mathcal{M}}(x,x_{j})^{2}/4t}\big(S(x)-P_{x_{j}\to x}S(x_{j})\big). \tag{11}$$

As such, we are able to consider the sheaf-level and bundle-level Laplacians as operators on the same space through this extension. In this setting, we then have the following convergence result.

Theorem 1.

(Convergence of Hilbert Sheaf Laplacian) Let $\mathcal{M}$ be an $m$-dimensional closed Riemannian manifold. Further, let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle with associated connection Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$. Consider a random sample $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. Let $\mathcal{F}^{t}_{n}$ be the induced Hilbert cellular sheaf with bandwidth $t$. Then we have, for any $x\in\mathcal{M}$,

$$\lim_{n\rightarrow\infty}\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\quad\text{in probability,} \tag{A}$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$. Further, if $S\in C^{4}(\mathcal{M},\mathcal{E})$, we have

$$\lim_{n\rightarrow\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$-norm}. \tag{B}$$

Our framework may be seen as concurrently generalizing the convergence results for the weighted graph Laplacian of [14] as well as the graph connection Laplacian of [93] to allow for arbitrary bundles, with potentially infinite-dimensional fibers, equipped with an arbitrary choice of connection. Such convergence results have previously served as the basis of transferability and robustness results in geometric deep learning [101, 104, 67], as well as to justify the development of numerous Laplacian-based manifold learning techniques [13, 28, 108, 97, 85]. We may likewise develop generalizations of these results for implementable sheaf Laplacians by discretizing in the signal-domain.
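The special case $\mathcal{E}=\mathcal{M}\times\mathbb{R}$ can be checked numerically in a few lines. The sketch below evaluates the rescaled point-cloud operator of (11) on the unit circle ($m=1$, $\operatorname{vol}(\mathcal{M})=2\pi$) for $S(\theta)=\cos\theta$ at $x=0$, using the bandwidth schedule of Theorem 1 with $\alpha=1$. Several choices here are assumptions made for illustration: the samples are drawn uniformly, parallel transport is the identity, and the Laplacian is taken with the positive sign convention $-d^{2}/d\theta^{2}$, so that the limiting value is $\cos(0)/(2\pi)$.

```python
import numpy as np

def scaled_point_cloud_laplacian(x, samples, S, t, m=1):
    """Eq. (11) for the trivial line bundle on the unit circle, rescaled as in Theorem 1."""
    diff = np.abs(x - samples)
    dist = np.minimum(diff, 2.0 * np.pi - diff)         # geodesic distance on S^1
    kernel = np.exp(-dist**2 / (4.0 * t))
    raw = np.mean(kernel * (S(x) - S(samples)))         # transport is the identity here
    return raw / (t * (4.0 * np.pi * t) ** (m / 2))

rng = np.random.default_rng(0)
m, alpha, x = 1, 1.0, 0.0
target = 1.0 / (2.0 * np.pi)        # (1 / vol(S^1)) * cos(0) under the stated convention
estimates = {}
for n in (1_000, 10_000, 100_000):
    t_n = n ** (-1.0 / (m + 2 + alpha))                 # bandwidth schedule of Theorem 1
    samples = rng.uniform(0.0, 2.0 * np.pi, size=n)
    estimates[n] = scaled_point_cloud_laplacian(x, samples, np.cos, t_n)
    print(f"n={n:>7d}  t_n={t_n:.4f}  estimate={estimates[n]:.4f}  target={target:.4f}")
```

In this toy setting the estimate approaches the target slowly, since the leading bias is of order $t_{n}$ and the bandwidth decays only polynomially in $n$; the example is meant to illustrate the scaling in Theorem 1, not to quantify rates.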

Finite Rank Convergence. Consider a signal $S$ that has been sampled to $\mathbf{s}_{n,d}$ as per Proposition 1. We then have the following theoretical guarantee, which formalizes the intuitive notion that the fully discretized sheaf Laplacian converges to the true connection Laplacian as we take an increasingly refined sample of both the underlying manifold and the signal.

Theorem 2 (Finite-Rank Approximation).

Consider the setting of Theorem 1 with a section $S\in C^{4}(\mathcal{M},\mathcal{E})$, for $\mathcal{E}$ a strictly infinite-dimensional Hilbert bundle. Then there exists a sequence of finite rank approximating sheaves $\mathcal{F}^{t_{n}}_{n,d_{n}}$ such that

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}\mathbf{s}_{n,d_{n}}-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$-norm}, \tag{12}$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$.

While this result, as described in Appendix B, can pave the way for new developments of Laplacian-based manifold learning, we here restrict our focus to its consequences for $(n,d)$-HilbNets.

Corollary 2.1 (Convergence in Architecture).

Under the hypotheses of Theorem 2, let $\{d_{n}\}_{n}$ be the required sequence. Fix a fiber-wise nonlinearity $\sigma$ that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms and a filter bank $\mathcal{W}$. Then, the output of the discrete $(n,d)$-HilbNet converges to the output of the continuous HilbNet architecture in the sense that

$$\Omega(\mathcal{F}^{t_{n}}_{n,d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}},\mathcal{W},\sigma)\to\Omega(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma)\quad\text{in mean squared error}, \tag{13}$$

as the sampling density $n,d_{n}\to\infty$.

Corollary 2.2.

(Transferability) Let $\{\mathcal{X}_{n}\}_{n=1}^{\infty}$ and $\{\mathcal{Y}_{n}\}_{n=1}^{\infty}$ be independent sequences of random samples of $\mathcal{M}$. Let $\{d_{n}\}^{\infty}_{n=1}$ be a sequence such that the conclusion of Theorem 2 holds for both samplings. For any fiber-wise nonlinearity $\sigma$ that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms and any filter bank $\mathcal{W}$, we then have that

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X},\mathcal{Y}}\left[\left\|\Omega(\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}},\mathcal{W},\sigma)-\Omega(\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}}},\mathcal{W},\sigma)\right\|_{L^{2}}^{2}\right]=0\quad\text{in $L^{2}$ norm.} \tag{14}$$

Further, one may derive a sample-independent quantitative bound for the $L^{2}$ disagreement $\|\Omega(\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}},\mathcal{W},\sigma)-\Omega(\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}}},\mathcal{W},\sigma)\|_{L^{2}}$. See Appendix H for details.

By these results, we may understand the $(n,d)$-HilbNets as the principled discretization of continuous HilbNets. These may also be understood as robustness results for $(n,d)$-HilbNets, as they establish that the architecture is scale-consistent and is stable against resampling of the base manifold. Notably, by the generality of HilbNets, most existing geometric convolutional architectures can be understood as instances of HilbNets for a particular choice of bundle and connection. As such, our results may also be seen as extending the transferability guarantees of [101, 104, 67] to a larger class of architectures and data modalities. See Appendix C for further discussion.

| $n$ | Free $O(m)$: Empirical | Free $O(m)$: Theory | Circulant: Empirical | Circulant: Theory | Frozen identity (GCN): Empirical | Frozen identity (GCN): Theory |
|---|---|---|---|---|---|---|
| $16$ | $(1.42{\pm}0.21){\times}10^{-7}$ | $0$ | $(1.79{\pm}0.28){\times}10^{-2}$ | $1.84{\times}10^{-2}$ | $(2.31{\pm}0.27){\times}10^{-2}$ | $2.30{\times}10^{-2}$ |
| $32$ | $(1.23{\pm}0.23){\times}10^{-7}$ | $0$ | $(1.16{\pm}0.14){\times}10^{-2}$ | $1.18{\times}10^{-2}$ | $(1.46{\pm}0.18){\times}10^{-2}$ | $1.46{\times}10^{-2}$ |
| $64$ | $(1.82{\pm}0.77){\times}10^{-7}$ | $0$ | $(1.02{\pm}0.14){\times}10^{-2}$ | $1.03{\times}10^{-2}$ | $(1.29{\pm}0.17){\times}10^{-2}$ | $1.29{\times}10^{-2}$ |
| $128$ | $(1.63{\pm}0.19){\times}10^{-7}$ | $0$ | $(8.85{\pm}0.71){\times}10^{-3}$ | $8.93{\times}10^{-3}$ | $(1.11{\pm}0.09){\times}10^{-2}$ | $1.11{\times}10^{-2}$ |
| $256$ | $(1.59{\pm}0.28){\times}10^{-7}$ | $0$ | $(7.74{\pm}0.13){\times}10^{-3}$ | $7.79{\times}10^{-3}$ | $(9.67{\pm}0.14){\times}10^{-3}$ | $9.67{\times}10^{-3}$ |

Table 1: Synthetic transport recovery. Empirical: best edge-MSE achieved by each variant. Theory: analytical squared Frobenius projection distance of $P^{LC}_{x_{i}\to m_{ij}}$ onto the variant's hypothesis class.

6 Experimental Results

A key practical advantage of the $(n,d)$-HilbNet architecture in comparison to existing approaches for processing graph signals is its use of parallel transport, which in practice can be known or learned. For instance, these transport operators allow us to incorporate principled signal-level geometric priors in concert with the spatial priors of existing spatiotemporal GCNs. This is well-aligned with the thesis of geometric deep learning that the principled incorporation of geometric priors improves performance, particularly in the low-data or small-model regimes. For further discussion on strategies for either hand-crafting or learning parallel transport operators, see Appendix D. Here, we first validate our setup on a synthetic dataset obtained by discretizing a known Hilbert bundle from information geometry, and then consider performance on real-world spatiotemporal graph benchmarks based on traffic forecasting. In all the experiments, we use the polynomial $(n,d)$-HilbNet from (10) and learned transport maps.

METR-LA:

| Model | Params | H3 MAE | H3 RMSE | H3 MAPE | H6 MAE | H6 RMSE | H6 MAPE | H12 MAE | H12 RMSE | H12 MAPE |
|---|---|---|---|---|---|---|---|---|---|---|
| FC-LSTM [69] | ${\sim}150\text{K}$ | $3.44$ | $6.30$ | $9.6$ | $3.77$ | $7.23$ | $10.9$ | $4.37$ | $8.69$ | $13.2$ |
| STAEformer [71] | ${\sim}4.7\text{M}$ | $2.65$ | $5.11$ | $6.85$ | $2.97$ | $6.00$ | $8.13$ | $3.34$ | $7.02$ | $9.70$ |
| MLP fiber baseline | $5{,}212$ | $3.131{\pm}0.004$ | $6.074{\pm}0.007$ | $8.271{\pm}0.021$ | $3.775{\pm}0.005$ | $7.496{\pm}0.013$ | $10.626{\pm}0.046$ | $4.690{\pm}0.011$ | $9.184{\pm}0.014$ | $14.341{\pm}0.097$ |
| Spatiotemporal graph baseline | $8{,}908$ | $3.453{\pm}0.080$ | $6.709{\pm}0.220$ | $9.241{\pm}0.279$ | $4.160{\pm}0.117$ | $8.158{\pm}0.184$ | $11.916{\pm}0.491$ | $5.277{\pm}0.093$ | $10.102{\pm}0.128$ | $16.006{\pm}0.662$ |
| HilbNet, frozen identity (GCN) | $5{,}756$ | $3.092{\pm}0.007$ | $5.920{\pm}0.013$ | $8.218{\pm}0.065$ | $3.713{\pm}0.010$ | $7.312{\pm}0.020$ | $10.520{\pm}0.061$ | $4.608{\pm}0.034$ | $8.991{\pm}0.043$ | $14.166{\pm}0.240$ |
| HilbNet, circulant | $11{,}656$ | $2.939{\pm}0.021$ | $5.630{\pm}0.061$ | $7.908{\pm}0.067$ | $3.409{\pm}0.032$ | $6.765{\pm}0.092$ | $9.844{\pm}0.125$ | $4.059{\pm}0.049$ | $8.149{\pm}0.114$ | $12.471{\pm}0.195$ |
| HilbNet, free $O(T)$ | $119{,}036$ | $\mathbf{2.923{\pm}0.013}$ | $\mathbf{5.586{\pm}0.048}$ | $\mathbf{7.808{\pm}0.083}$ | $\mathbf{3.372{\pm}0.023}$ | $\mathbf{6.732{\pm}0.066}$ | $\mathbf{9.507{\pm}0.096}$ | $\mathbf{3.938{\pm}0.030}$ | $\mathbf{8.042{\pm}0.101}$ | $\mathbf{11.642{\pm}0.136}$ |

PEMS-BAY:

| Model | Params | H3 MAE | H3 RMSE | H3 MAPE | H6 MAE | H6 RMSE | H6 MAPE | H12 MAE | H12 RMSE | H12 MAPE |
|---|---|---|---|---|---|---|---|---|---|---|
| FC-LSTM [69] | ${\sim}150\text{K}$ | $2.05$ | $4.19$ | $4.8$ | $2.20$ | $4.55$ | $5.2$ | $2.37$ | $4.96$ | $5.7$ |
| STAEformer [71] | ${\sim}4.7\text{M}$ | $1.31$ | $2.78$ | $2.76$ | $1.62$ | $3.68$ | $3.62$ | $1.88$ | $4.34$ | $4.41$ |
| MLP fiber baseline | $5{,}212$ | $1.459{\pm}0.003$ | $3.145{\pm}0.017$ | $3.026{\pm}0.020$ | $1.942{\pm}0.004$ | $4.378{\pm}0.017$ | $4.385{\pm}0.045$ | $2.513{\pm}0.004$ | $5.658{\pm}0.015$ | $6.209{\pm}0.038$ |
| Spatiotemporal graph baseline | $8{,}908$ | $\mathbf{1.400{\pm}0.002}$ | $2.980{\pm}0.016$ | $\mathbf{2.924{\pm}0.003}$ | $1.850{\pm}0.002$ | $4.162{\pm}0.009$ | $4.222{\pm}0.009$ | $2.388{\pm}0.004$ | $5.395{\pm}0.023$ | $5.924{\pm}0.032$ |
| HilbNet, frozen identity (GCN) | $5{,}756$ | $1.439{\pm}0.003$ | $3.077{\pm}0.022$ | $2.985{\pm}0.007$ | $1.901{\pm}0.006$ | $4.264{\pm}0.021$ | $4.319{\pm}0.015$ | $2.446{\pm}0.007$ | $5.495{\pm}0.019$ | $6.020{\pm}0.032$ |
| HilbNet, circulant | $15{,}366$ | $1.413{\pm}0.002$ | $2.971{\pm}0.018$ | $2.982{\pm}0.015$ | $1.806{\pm}0.003$ | $\mathbf{3.950{\pm}0.015}$ | $4.162{\pm}0.037$ | $2.211{\pm}0.014$ | $\mathbf{4.866{\pm}0.048}$ | $5.386{\pm}0.047$ |
| HilbNet, free $O(T)$ | $190{,}268$ | $1.417{\pm}0.002$ | $\mathbf{2.969{\pm}0.015}$ | $3.058{\pm}0.013$ | $\mathbf{1.793{\pm}0.005}$ | $3.958{\pm}0.026$ | $\mathbf{4.127{\pm}0.037}$ | $\mathbf{2.181{\pm}0.003}$ | $4.873{\pm}0.019$ | $\mathbf{5.214{\pm}0.045}$ |

Table 2: METR-LA (top) and PEMS-BAY (bottom) traffic forecasting results; H3, H6, and H12 denote forecasting horizons 3, 6, and 12. Bottom block of each table: our experiments (mean $\pm$ standard deviation over five seeds). Top block of each table: external baselines reported as in the cited papers. MAPE is in percent. Lower is better for all metrics.

Synthetic Experiments. We first consider a task where, for a known Hilbert bundle and connection, we train a discretized HilbNet to predict the true parallel transport operators. Following [72], the base manifold is $\mathcal{M}=\mathrm{Sym}^{++}(p)$ equipped with the Otto-Wasserstein metric, each $\Sigma\in\mathcal{M}$ parameterizing a density $\mathcal{N}(0,\Sigma)$. The ambient fiber $\mathcal{H}_{\Sigma}=L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ is genuinely infinite-dimensional; the computational fiber is the Otto-velocity image of covariance perturbations, a sub-bundle whose fibers are already finite-dimensional with $d=p(p+1)/2$, and on which the Levi-Civita transports $P^{LC}_{x_{i}\to m_{ij}}$ admit a closed form. We sample $n$ points, build a $k$NN graph $G_{n}$ ($k=8$) under $W_{2}$, and assemble the network sheaf $\mathcal{F}_{n,d}^{t}$ and its Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ from Def. 3. We consider three transport parametrizations from Appendix D (averaged over 3 seeds): free $O(d)$ (Householder), circulant, and frozen identity (a usual GCN [59]), to recover the Levi-Civita transports in Cholesky-rescaled coordinates. As shown in Table 1, the free class recovers $P^{LC}_{x_{i}\to m_{ij}}$ to numerical precision (${\approx}1.6\cdot 10^{-7}$), while each restricted class converges to its analytical Frobenius-projection plateau to within $1\%$. This confirms that the transport hypothesis class constrains the per-edge restriction maps of $\Delta_{\mathcal{F}_{n,d}^{t}}$ in a quantitatively predictable way. More experiments and details are in Appendix F.1.
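The two structural ingredients mentioned above, a Householder parametrization of orthogonal matrices and a Frobenius projection onto a restricted transport class, can be illustrated with a short NumPy sketch. It rests on several assumptions that are not taken from the paper: the function names are hypothetical, the circulant class is modeled as the linear space of all circulant matrices (the actual hypothesis class in Appendix D may be further constrained, for example to orthogonal circulants), and the normalization of the edge-MSE in Table 1 is not reproduced.

```python
import numpy as np

def householder_orthogonal(V):
    """Product of Householder reflections I - 2 v v^T / (v^T v), one per row of V."""
    d = V.shape[1]
    Q = np.eye(d)
    for v in V:
        Q = Q @ (np.eye(d) - 2.0 * np.outer(v, v) / (v @ v))
    return Q

def project_circulant(P):
    """Frobenius-orthogonal projection of P onto the linear space of circulant matrices."""
    d = P.shape[0]
    # idx[i, j] = (i - j) mod d labels the wrapped diagonal containing entry (i, j).
    idx = (np.arange(d)[:, None] - np.arange(d)[None, :]) % d
    # Averaging each wrapped diagonal gives the closest circulant matrix.
    c = np.array([P[idx == r].mean() for r in range(d)])
    return c[idx]

# A random orthogonal matrix stands in for a true transport (hypothetical data).
rng = np.random.default_rng(0)
d = 6
Q = householder_orthogonal(rng.standard_normal((d, d)))
C = project_circulant(Q)

dist_free = 0.0                              # Q itself lies in the free orthogonal class
dist_circulant = np.sum((Q - C) ** 2)        # squared Frobenius distance to circulants
dist_identity = np.sum((Q - np.eye(d)) ** 2) # squared Frobenius distance to the identity
```

Because the identity is itself circulant, the circulant distance in this sketch can never exceed the identity distance, which matches the ordering of the two restricted columns in Table 1.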

Traffic Forecasting. We evaluate HilbNets on real-world spatiotemporal traffic-speed forecasting, where each node of a road-network graph carries a time-series fiber with $d=T$ observed time steps. This is a natural instance of the Hilbert-bundle framework: the base graph encodes spatial proximity among sensors, while the edgewise transports $P_{j\to i}^{e_{ij}}$ from Appendix D control how temporal fibers are aligned before filtering. We test on two standard benchmarks, METR-LA [69] and PEMS-BAY [69], predicting future speeds at horizons $3$, $6$, and $12$. We compare the same three HilbNet transport classes, frozen identity (a GCN), circulant, and free $O(T)$, against a fiber-only MLP and a spatiotemporal graph baseline obtained by stacking GCN layers with one-dimensional convolutional layers processing the temporal dimension, all sharing the same polynomial sheaf filter order, readout, and forecasting loss. Full experimental details are given in Appendix F.2. Table 2 reports MAE, RMSE, and MAPE (mean $\pm$ std over five seeds). On both datasets, learning non-trivial transports improves over frozen identity on nearly every metric and horizon (the one exception in Table 2 is the free $O(T)$ class on PEMS-BAY horizon-3 MAPE), indicating that the sheaf structure helps beyond the usual graph structure. The free $O(T)$ class achieves the best overall accuracy, but the circulant variant is competitive with roughly one tenth of the parameters, supporting the use of structured, physically motivated transport priors for spatiotemporal data. These results support the view that geometric inductive biases can lead to better performance in low-data regimes, or comparable performance in normal regimes with substantially fewer parameters.
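For reference, the three reported metrics can be computed as in the following sketch. It uses the plain textbook definitions; whether the experiments additionally mask missing sensor readings, as is common for these benchmarks, is not stated here, so no masking is applied. The function name `forecast_metrics`, the `eps` guard, and the toy speed values are hypothetical.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, eps=1e-8):
    """Return (MAE, RMSE, MAPE in percent) over all entries of the arrays."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    # eps guards against division by zero for vanishing ground-truth values.
    mape = 100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))
    return mae, rmse, mape

# Toy speeds for three sensors at a single horizon (hypothetical values).
y_true = np.array([50.0, 60.0, 40.0])
y_pred = np.array([52.0, 57.0, 41.0])
mae, rmse, mape = forecast_metrics(y_true, y_pred)
```

In a multi-horizon setting the same function would be applied separately to the slice of predictions corresponding to each horizon.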

7 Conclusions

We introduced a novel convolutional learning framework for infinite-dimensional signals over a manifold using Hilbert bundles, a setting that concurrently unifies and generalizes existing approaches. It allows us to consider arbitrary connection Laplacians and a more general class of filters via the Borel calculus, and thus to apply the resulting filters to potentially infinite-dimensional signals. We defined HilbNets as stacks of Hilbert bundle filters and pointwise nonlinearities. We then introduced a practically implementable $(n,d)$-HilbNet via the theory of Hilbert cellular sheaves, and proved that this discretized architecture converges to the continuous architecture in the limit. Notably, our convergence in architecture is derived from a novel extension of the Laplacian convergence result of [14] to the setting of Hilbert sheaves and Hilbert bundles, and we believe this result will be of independent interest to the broader machine learning community. Lastly, we verified the benefits of integrating domain-specific geometric priors through experiments with discretized HilbNets on synthetic and real-world data. Overall, we envision the prospective impact of our contributions as two-fold: the HilbNet framework allows for the principled development of domain-specific architectures through appropriate choices of connection, filter bank, and manifold- and signal-alignment measures, while our Hilbert Laplacian convergence theorem lays the theoretical groundwork for the development of Laplacian-based manifold-theoretic techniques in the setting of infinite-dimensional signals, spanning from mechanistic interpretability to self-supervised learning methods. A more detailed discussion on broader impact and limitations is presented in Appendix B.

References

  • [1] O. Abdel-Hamid et al. (2012) Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In 2012 International Conference on Acoustics, Speech and Signal Processing (ICASSP), External Links: Document Cited by: §1.
  • [2] M. Aggarwal and M. N. Murty (2020) Machine learning in social networks: embedding nodes, edges, communities, and graphs. Springer Nature. Cited by: §1.
  • [3] A. Ambrosetti and G. Prodi (1995) A primer of nonlinear analysis. Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, UK. External Links: ISBN 9780521454057 Cited by: §H.1.
  • [4] S. Axelrod, S. della Pietra, and E. Witten (1991-05) Geometric quantization of chern–simons gauge theory. Journal of Differential Geometry 33 (3), pp. 787–902. External Links: Document Cited by: Example 1.
  • [5] J. Bamberger, F. Barbero, X. Dong, and M. M. Bronstein (2025) Bundle neural network for message diffusion on graphs. In The Thirteenth International Conference on Learning Representations, External Links: Link Cited by: Appendix A, §C.1.1.
  • [6] S. Barbarossa and S. Sardellitti (2020) Topological signal processing over simplicial complexes. IEEE Trans. on Signal Processing 68, pp. 2992–3007. Cited by: §1.
  • [7] F. Barbero et al. (2022) Sheaf neural networks with connection laplacians. arXiv. External Links: Document, Link Cited by: Appendix A, §C.1.1, §1.
  • [8] T. Batard (2011) Heat equations on vector bundles—application to color image regularization. Journal of Mathematical Imaging and Vision 41 (1-2), pp. 59–85. External Links: Document Cited by: Example 3.
  • [9] C. Battiloro et al. (2023) Tangent bundle filters and neural networks: from manifolds to cellular sheaves and back. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. Cited by: Appendix A, §C.1.1, §1.
  • [10] C. Battiloro, L. Testa, L. Giusti, S. Sardellitti, P. Di Lorenzo, and S. Barbarossa (2024) Generalized simplicial attention neural networks. IEEE Transactions on Signal and Information Processing over Networks 10, pp. 833–850. Cited by: §1.
  • [11] C. Battiloro, Z. Wang, H. Riess, P. Di Lorenzo, and A. Ribeiro (2024) Tangent bundle convolutional learning: from manifolds to cellular sheaves and back. IEEE Transactions on Signal Processing. Cited by: Appendix A, Appendix A, Appendix D, §1, §1, §1, §1, §1, Example 6.
  • [12] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision 61 (2), pp. 139–157. External Links: Document Cited by: Appendix D.
  • [13] M. Belkin and P. Niyogi (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in neural information processing systems 14. Cited by: Appendix B, Appendix B, §5.
  • [14] M. Belkin and P. Niyogi (2008) Towards a theoretical foundation for laplacian-based manifold methods. Journal of Computer and System Sciences 74 (8), pp. 1289–1308. Note: Learning Theory 2005 External Links: ISSN 0022-0000, Document, Link Cited by: Appendix A, Appendix A, Appendix B, Appendix B, Appendix B, §C.1.1, 2nd item, 3rd item, §G.6, §H.2, §1, §1, §1, §5, §5, §7.
  • [15] N. Berline, E. Getzler, and M. Vergne (1992) Heat kernels and dirac operators. Springer Berlin, Heidelberg. External Links: Document Cited by: §G.3, §G.3, Remark 7.
  • [16] C. Bodnar et al. (2021) Weisfeiler and lehman go cellular: cw networks. In Advances in Neural Information Processing Systems, Vol. 34, pp. 2625–2640. Cited by: §1.
  • [17] C. Bodnar et al. (2021) Weisfeiler and Lehman go topological: message passing simplicial networks. In ICLR 2021 Workshop on Geometrical and Topological Representation Learning, Cited by: §1.
  • [18] C. Bodnar et al. (2022) Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in gnns. arXiv. External Links: Document Cited by: Appendix A, §C.1.1, Appendix D, §1.
  • [19] B. Bonev, T. Kurth, C. Hundt, J. Pathak, M. Baust, K. Kashinath, and A. Anandkumar (2023) Spherical fourier neural operators: learning stable dynamics on the sphere. In International conference on machine learning, pp. 2806–2823. Cited by: Appendix A.
  • [20] V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth (2020) Matérn gaussian processes on Riemannian manifolds. In Advances in Neural Information Processing Systems, Vol. 33. External Links: Link, 2006.10160 Cited by: Appendix A, §1.
  • [21] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković (2021) Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478. Cited by: §2.
  • [22] M. M. Bronstein et al. (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: Appendix A.
  • [23] J. Brüning and M. Lesch (1992) Hilbert complexes. Journal of Functional Analysis 108, pp. 88–132. Cited by: item 2.
  • [24] A. Caponera and D. Marinucci (2021) Asymptotics for spherical functional autoregressions. The Annals of Statistics 49 (1), pp. 346–369. External Links: Document Cited by: Appendix A.
  • [25] É. Cartan (1967) Differential calculus. Princeton University Press, Princeton, NJ. Cited by: §H.2.
  • [26] G. Chen, X. Liu, Q. Meng, L. Chen, C. Liu, and Y. Li (2024) Learning neural operators on riemannian manifolds. National Science Open 3 (6), pp. 20240001. Cited by: Appendix A.
  • [27] T. S. Cohen, M. Geiger, and M. Weiler (2019) A general theory of equivariant cnns on homogeneous spaces. Advances in neural information processing systems 32. Cited by: §1.
  • [28] R. R. Coifman and S. Lafon (2006) Diffusion maps. Applied and Computational Harmonic Analysis 21 (1), pp. 5–30. External Links: Document, Link Cited by: Appendix B, Appendix B, §5.
  • [29] J. M. Curry (2014) Sheaves, cosheaves and applications. University of Pennsylvania. Cited by: Appendix A, §G.5, §1.
  • [30] G. D’Acunto and C. Battiloro (2025) The relativity of causal knowledge. In The 41st Conference on Uncertainty in Artificial Intelligence, External Links: Link Cited by: Appendix A.
  • [31] X. Dai and H. Müller (2018) Principal component analysis for functional data on Riemannian manifolds and spheres. The Annals of Statistics 46 (6B), pp. 3309–3338. External Links: Document, Link Cited by: Appendix A.
  • [32] R. Dangovski, L. Jing, C. Loh, S. Han, A. Srivastava, B. Cheung, P. Agrawal, and M. Soljačić (2021) Equivariant contrastive learning. arXiv preprint arXiv:2111.00899. Cited by: Appendix B.
  • [33] V. De Bortoli, E. Mathieu, M. Hutchinson, J. Thornton, Y. W. Teh, and A. Doucet (2022) Riemannian score-based generative modelling. Advances in neural information processing systems 35, pp. 2406–2422. Cited by: Appendix A.
  • [34] P. De Haan et al. (2020) Gauge equivariant mesh cnns: anisotropic convolutions on geometric graphs. arXiv preprint arXiv:2003.05425. Cited by: §1.
  • [35] L. Di Nino, G. D’Acunto, S. Barbarossa, and P. Di Lorenzo (2025) Learning the structure of connection graphs. arXiv preprint arXiv:2510.11245. Cited by: Appendix A.
  • [36] M. P. do Carmo (1992) Riemannian geometry. Mathematics: Theory & Applications, Birkhäuser, Boston. External Links: ISBN 978-0817634902 Cited by: 1st item, §H.2.
  • [37] I. Duta, G. Cassarà, F. Silvestri, and P. Liò (2023) Sheaf hypergraph networks. Advances in Neural Information Processing Systems 36, pp. 12087–12099. Cited by: Appendix A, §1.
  • [38] C. Fefferman, S. Mitter, and H. Narayanan (2016) Testing the manifold hypothesis. Journal of the American Mathematical Society 29 (4), pp. 983–1049. Cited by: §1.
  • [39] S. Fiorini, H. Aktas, I. Duta, S. Coniglio, P. Morerio, A. Del Bue, and P. Liò (2025) Sheaves reloaded: a directional awakening. arXiv preprint arXiv:2506.02842. Cited by: Appendix A, §1.
  • [40] F. Gama et al. (2018) Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing 67 (4), pp. 1034–1049. Cited by: §1.
  • [41] F. Gama et al. (2020-11) Graphs, convolutions, and neural networks: from graph filters to graph neural networks. IEEE Signal Processing Magazine 37, pp. 128–138. External Links: Document Cited by: §1, §3.
  • [42] R. Ghrist and H. Riess (2022) Cellular sheaves of lattices and the tarski laplacian. Homology, Homotopy and Applications 24 (1), pp. 325–345. Cited by: Appendix A.
  • [43] L. Giusti, C. Battiloro, et al. (2023) Cell attention networks. In 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §1.
  • [44] J. J. Gould (2025) Cellular sheaves of hilbert spaces. Ph.D. Thesis, University of Pennsylvania. Cited by: §G.5, §G.5, §4.
  • [45] F. Grassi, A. Loukas, N. Perraudin, and B. Ricaud (2017) A time-vertex signal processing framework: scalable processing and meaningful representations for time-series on graphs. IEEE Transactions on Signal Processing 66 (3), pp. 817–829. Cited by: §C.1.3.
  • [46] A. Grigor’yan (2009) Heat kernel and analysis on manifolds. AMS/IP Studies in Advanced Mathematics, Vol. 47, American Mathematical Society, Providence, RI. Cited by: §C.1.3.
  • [47] E. Grimaldi, M. E. Pandolfo, G. D’Acunto, S. Barbarossa, and P. Di Lorenzo (2025) Learning network sheaves for ai-native semantic communication. arXiv preprint arXiv:2512.03248. Cited by: Appendix A.
  • [48] A. Grothendieck (1955) A general theory of fibre spaces with structure sheaf. University of Kansas, Department of Mathematics. Cited by: Appendix A.
  • [49] T. Hanks, H. Riess, S. Cohen, T. Gross, M. Hale, and J. Fairbanks (2025) Distributed multi-agent coordination over cellular sheaves. arXiv preprint arXiv:2504.02049. Cited by: Appendix A.
  • [50] J. Hansen and T. Gebhart (2020) Sheaf neural networks. arXiv. External Links: Document, Link Cited by: Appendix A, §C.1.1, §1, Example 6.
  • [51] J. Hansen and R. Ghrist (2019) Learning sheaf laplacians from smooth signals. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. 5446–5450. External Links: Document Cited by: Appendix A.
  • [52] J. Hansen and R. Ghrist (2019-12-01) Toward a spectral theory of cellular sheaves. Journal of Applied and Computational Topology 3 (4), pp. 315–358. External Links: ISSN 2367-1734, Document, Link Cited by: Appendix A, §G.5.
  • [53] J. Hansen and R. Ghrist (2021) Opinion dynamics on discourse sheaves. SIAM Journal on Applied Mathematics 81 (5), pp. 2033–2060. External Links: Document, https://doi.org/10.1137/20M1341088 Cited by: Appendix A, §G.5.
  • [54] E. J. Hu, M. Jain, E. Elmoznino, Y. Kaddar, G. Lajoie, Y. Bengio, and N. Malkin (2024) Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §1.
  • [55] H. Huang, Y. LeCun, and R. Balestriero (2026) Semantic tube prediction: beating llm data efficiency with jepa. External Links: 2602.22617, Link Cited by: Appendix B.
  • [56] M. Hutchinson, A. Terenin, V. Borovitskiy, S. Takao, Y. Teh, and M. Deisenroth (2021) Vector-valued gaussian processes on riemannian manifolds via gauge independent projected kernels. Advances in Neural Information Processing Systems 34, pp. 17160–17169. Cited by: Appendix A, §1.
  • [57] F. Ji, Y. Zhao, S. H. Lee, K. Zhao, W. P. Tay, and J. Yang (2025-07) Graph distributional signals for regularization in graph neural networks. IEEE Transactions on Signal and Information Processing over Networks 11, pp. 670–682. External Links: Document Cited by: Example 5.
  • [58] A. Jiao, Q. Yan, J. Harlim, and L. Lu (2024) Solving forward and inverse pde problems on unknown manifolds via physics-informed neural operators. arXiv preprint arXiv:2407.05477. Cited by: Appendix A.
  • [59] T. N. Kipf and M. Welling (2017) Semi-Supervised Classification with Graph Convolutional Networks. In Proc. of the 5th International Conference on Learning Representations (ICLR), External Links: Link Cited by: §C.1.1, §1, §6.
  • [60] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Vol. 25. Cited by: §1.
  • [61] N. H. Kuiper (1965) The homotopy type of the unitary group of hilbert space. Topology 3 (1), pp. 19–30. External Links: Document Cited by: §H.4.
  • [62] S. Lang (1995) Differential and riemannian manifolds. 3 edition, Graduate Texts in Mathematics, Vol. 160, Springer, New York, NY. External Links: ISBN 978-0-387-94338-1, Document, ISSN 0072-5285 Cited by: §G.1.4.
  • [63] Y. LeCun et al. (1998) Gradient-based learning applied to document recognition. Proc. of the IEEE 86 (11), pp. 2278–2324. Cited by: §1.
  • [64] M. Ledoux and M. Talagrand (1991) Probability in banach spaces: isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Vol. 23, Springer-Verlag, Berlin. Cited by: §H.1.
  • [65] J. Leray (1946) L’anneau d’homologie d’une représentation. Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences 222, pp. 1366–1368 (French). Cited by: Appendix A, §4.
  • [66] G. Leus, A. G. Marques, J. M. Moura, A. Ortega, and D. I. Shuman (2023) Graph signal processing: history, development, impact, and outlook. IEEE Signal Processing Magazine 40 (4), pp. 49–60. Cited by: §2.
  • [67] R. Levie, W. Huang, et al. (2021) Transferability of spectral graph convolutional neural networks. Journal of Machine Learning Research 22 (272), pp. 1–59. Cited by: Appendix A, §1, §5, §5.
  • [68] D. Li, S. Arya, and R. Ghrist (2025) Learning from frustration: torsor cnns on graphs. In Proceedings of the Workshop on Symmetry and Geometry in Neural Representations at NeurIPS 2025, Note: Workshop paper External Links: Link Cited by: §C.1.2, Appendix D.
  • [69] Y. Li, R. Yu, C. Shahabi, and Y. Liu (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In International Conference on Learning Representations (ICLR), External Links: Link Cited by: §F.2.1, §F.2.1, §F.2.3, Table 2, Table 2, §6.
  • [70] E. Lila, J. A. Aston, and L. M. Sangalli (2016) Smooth principal component analysis over two-dimensional manifolds with an application to neuroimaging. Cited by: Appendix A.
  • [71] H. Liu, Z. Dong, R. Jiang, J. Deng, J. Deng, Q. Chen, and X. Song (2023) Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), External Links: Document Cited by: §F.2.3, Table 2, Table 2.
  • [72] L. Malagò, L. Montrucchio, and G. Pistone (2018-12) Wasserstein riemannian geometry of gaussian densities. Information Geometry 1 (2), pp. 137–179. External Links: ISSN 2511-249X, Document, Link Cited by: §6, Example 2.
  • [73] I. Marisca, J. Bamberger, C. Alippi, and M. M. Bronstein (2026) Over-squashing in spatiotemporal graph neural networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §C.1.3.
  • [74] P. Mostowsky, V. Dutordoir, I. Azangulov, N. Jaquier, M. J. Hutchinson, A. Ravuri, L. Rozo, A. Terenin, and V. Borovitskiy (2024) The GeometricKernels package: heat and matérn kernels for geometric learning on manifolds, meshes, and graphs. arXiv:2407.08086. External Links: Link Cited by: Appendix A, §1.
  • [75] C. Müller and C. Wockel (2009-09) Equivalences of smooth and continuous principal bundles with infinite-dimensional structure group. Advances in Geometry 9 (4), pp. 605–626. External Links: ISSN 1615-715X, Link, Document Cited by: item 3, §H.4.
  • [76] L. I. Nicolaescu (2007) Lectures on the geometry of manifolds. 2 edition, World Scientific, Singapore. External Links: ISBN 9789812708533 Cited by: §G.2.
  • [77] M. Papillon, G. Bernardez, C. Battiloro, and N. Miolane (2025) TopoTune: a framework for generalized combinatorial complex neural networks. External Links: Link Cited by: §1.
  • [78] Y. Peng, J. Dong, Y. Zeng, H. Li, C. Ju, H. Feng, D. Taha, A. Wienhard, and K. Xia (2026) Sheaf neural networks on spd manifolds: second-order geometric representation learning. arXiv preprint arXiv:2604.20308. Cited by: Appendix A, §1.
  • [79] P. Petersen (2006) Riemannian geometry. 2 edition, Graduate Texts in Mathematics, Vol. 171, Springer, New York. External Links: ISBN 978-0-387-29403-2, Document Cited by: §G.2.
  • [80] I. F. Pinelis (1991) Inequalities for distributions of sums of independent random vectors and their application to estimating a density. Theory of Probability & Its Applications 35 (3), pp. 605–607. External Links: Document Cited by: §H.1, §H.2.
  • [81] J. Porras-Valenzuela, Z. Wang, and A. Ribeiro (2026) Size transferability of graph transformers with convolutional positional encodings. External Links: 2602.15239, Link Cited by: Appendix B.
  • [82] M. Reed and B. Simon (1972) Functional analysis. Methods of Modern Mathematical Physics, Vol. 1, Academic Press, New York. External Links: ISBN 0125850018 9780125850018, Link Cited by: §G.4, §H.6.
  • [83] H. Riess and R. Ghrist (2022) Diffusion of information on networked lattices by gossip. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 5946–5952. Cited by: Appendix A.
  • [84] H. M. Riess and J. Hansen (2020) Multidimensional persistence module classification via lattice-theoretic convolutions. In NeurIPS Workshop: TDA & Beyond, Cited by: §1.
  • [85] R. M. Rustamov (2007) Laplace-beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Symposium on Geometry Processing (SGP), A. Belyaev and M. Garland (Eds.), pp. 225–233. External Links: Document, ISBN 978-3-905673-46-3, ISSN 1727-8384 Cited by: §5.
  • [86] F. Scarselli et al. (2008) The graph neural network model. IEEE Trans. on neural networks 20 (1), pp. 61–80. Cited by: §1.
  • [87] S. C. Schonsheck et al. (2018) Parallel transport convolution: a new tool for convolutional neural networks on manifolds. arXiv preprint arXiv:1805.07857. Cited by: §1.
  • [88] J. Serre (1955) Faisceaux algébriques cohérents. Annals of Mathematics, pp. 197–278. Cited by: Appendix A.
  • [89] L. Shao, Z. Lin, and F. Yao (2022) Intrinsic riemannian functional data analysis for sparse longitudinal observations. The Annals of Statistics 50 (3), pp. 1696–1721. Cited by: Appendix A.
  • [90] A. D. Shepard (1985) A cellular description of the derived category of a stratified space. Brown University. Cited by: Appendix A, §1.
  • [91] D. I. Shuman et al. (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 30 (3), pp. 83–98. Cited by: §1.
  • [92] A. Singer and H.-T. Wu (2012) Vector diffusion maps and the connection laplacian. Communications on Pure and Applied Mathematics 65 (8), pp. 1067–1144. External Links: Document Cited by: Appendix A, Appendix B, §C.1.1, §1.
  • [93] A. Singer and H. Wu (2017) Spectral convergence of the connection laplacian from random samples. Information and Inference: A Journal of the IMA 6 (1), pp. 58–123. External Links: Document, Link Cited by: Appendix A, Appendix B, Appendix B, §1, §5.
  • [94] F. Spoto, A. Caponera, and P. Brutti (2025) Change point detection for functional autoregressive processes on the sphere. arXiv preprint arXiv:2512.03255. Cited by: Appendix A.
  • [95] J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu (2024) Roformer: enhanced transformer with rotary position embedding. Neurocomputing 568, pp. 127063. Cited by: Appendix B.
  • [96] S. Thakoor, C. Tallec, M. G. Azar, M. Azabou, E. L. Dyer, R. Munos, P. Veličković, and M. Valko (2022) Large-scale representation learning on graphs via bootstrapping. In International Conference on Learning Representations, External Links: Link Cited by: Appendix B.
  • [97] B. Vallet and B. Lévy (2008) Spectral geometry processing with manifold harmonics. Computer Graphics Forum 27 (2), pp. 251–260. External Links: Document, ISSN 1467-8659 Cited by: §5.
  • [98] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Conference on Neural Information Processing Systems (NeurIPS), pp. 6000–6010. External Links: Link Cited by: Appendix B.
  • [99] C. Villani (2003) Topics in optimal transportation. Graduate Studies in Mathematics, Vol. 58, American Mathematical Society, Providence, RI. External Links: ISBN 978-0821833124 Cited by: Example 5.
  • [100] U. Von Luxburg, M. Belkin, and O. Bousquet (2008) Consistency of spectral clustering. The Annals of Statistics, pp. 555–586. Cited by: Appendix B.
  • [101] Z. Wang, L. Ruiz, and A. Ribeiro (2021) Stability of manifold neural networks to deformations. arXiv preprint arXiv:2106.03725. Cited by: §C.1.1, §5, §5.
  • [102] Z. Wang, L. Ruiz, and A. Ribeiro (2021) Stability of neural networks on riemannian manifolds. In 2021 29th European Signal Processing Conference (EUSIPCO), pp. 1845–1849. Cited by: §1, §1.
  • [103] Z. Wang, L. Ruiz, and A. Ribeiro (2022) Convolutional neural networks on manifolds: from graphs and back. arXiv:2210.00376. Cited by: Appendix A, §1, §1.
  • [104] Z. Wang, L. Ruiz, and A. Ribeiro (2024) Geometric graph filters and neural networks: limit properties and discriminability trade-offs. IEEE Transactions on Signal Processing 72, pp. 2244–2259. External Links: Document Cited by: §5, §5.
  • [105] M. Weiler, P. Forré, E. Verlinde, and M. Welling (2026) Equivariant and coordinate independent convolutional networks: a gauge field theory of neural networks. Progress in Data Science, Vol. 1, World Scientific Publishing Company. Note: Monograph on equivariant and gauge-theoretic neural network architectures and their coordinate-independent generalizations External Links: ISBN 9789819806621 Cited by: §C.1.2.
  • [106] H. Wu, K. Weng, S. Zhou, X. Huang, and W. Xiong (2024) Neural manifold operators for learning the evolution of physical dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3356–3366. Cited by: Appendix A.
  • [107] Y. Xie, J. Tian, and X. X. Zhu (2020) Linking points with labels in 3d: a review of point cloud semantic segmentation. IEEE Geoscience and Remote Sensing Magazine 8 (4), pp. 38–59. Cited by: §1.
  • [108] Z. Yang, S. Huang, H. Feng, and D. Zhou (2024) Spherical analysis of learning nonlinear functionals. arXiv preprint arXiv:2410.01047. Cited by: §5.
  • [109] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen (2020) Graph contrastive learning with augmentations. Advances in neural information processing systems 33, pp. 5812–5823. Cited by: Appendix B.
  • [110] J. Yu, J. Choi, D. Lee, H. Hong, and J. Kim (2024) Self-supervised transformation learning for equivariant representations. Advances in Neural Information Processing Systems 37, pp. 83068–83090. Cited by: Appendix B.
  • [111] O. Zaghen, F. Eijkelboom, A. Pouplin, and E. J. Bekkers (2025) Towards variational flow matching on general geometries. In ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, External Links: Link Cited by: Appendix A.
  • [112] Y. Zhao, F. Ji, X. Jian, and W. P. Tay (2026) Graph distribution-valued signals: a wasserstein space perspective. External Links: 2509.25802, Link Cited by: Example 5.

Appendix Contents

A   Extended Related Works
B   Broader Impact, Future Directions and Limitations
C   Existing Convolutional Architectures as Actualizations of HilbNets
 C.1  Universality of HilbNets
  C.1.1  CNNs, GNNs, and Sheaf NNs
  C.1.2  Equivariant CNNs and GNNs
  C.1.3  Spatio-Temporal GNNs
D   Practical Implementations of Parallel Transport
E   Parallel Transport Parametrizations
 E.1  From Bundle Transports to Network-Sheaf Restrictions
 E.2  Transport Hypothesis Classes
 E.3  Parameter Counts
F   Additional Experimental Details
 F.1  Synthetic Experiments: The Statistical Bundle over Centered Gaussians
  F.1.1  The Bundle
  F.1.2  Levi-Civita Connection and Closed-Form Parallel Transport
  F.1.3  Cholesky Rescaling
  F.1.4  Sample Construction
  F.1.5  Spectral Stability under Sampling Density Increase
  F.1.6  Hyperparameters
 F.2  Traffic Forecasting: Experimental Details
  F.2.1  Datasets
  F.2.2  Task Formulation
  F.2.3  Model Variants and Baselines
  F.2.4  Detailed Results
  F.2.5  Hyperparameters
G   Mathematical Background
 G.1  Hilbert Bundles
 G.2  Connection Laplacian
 G.3  Heat Flow on a Hilbert Bundle
 G.4  Borel Functional Calculus
 G.5  Cellular Sheaves and Sheaf Laplacians
 G.6  Empirical Laplacians
H   Proofs of Results
 H.1  Auxiliary Lemmas for Theorem 1
 H.2  Key Lemmas for Theorem 1
 H.3  Proof of Theorem 1
 H.4  Key Lemmas for Theorem 2
 H.5  Proof of Theorem 2
 H.6  Key Lemmas for Corollary 1 (Convergence in Architecture)
 H.7  Proof of Corollary 1 (Convergence in Architecture)
 H.8  Proof of Corollary 2 (Transferability)

Appendix A Extended Related Works

The connection between (possibly) continuous domains (manifolds and bundles) and discrete structures (graphs and cellular sheaves) first emerged in pioneering investigations of the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional (Riemannian) manifolds [22]. The manifold hypothesis underpins several modern spectral graph methods, i.e., nonlinear dimensionality-reduction, clustering, and (deep) learning techniques that exploit latent geometric structures. The renowned work of Belkin & Niyogi [14] proved that, given a finite point cloud (the signals) sampled from the underlying manifold, it is possible to build a weighted undirected graph whose Laplacian converges in probability to the Laplace-Beltrami operator of the underlying manifold as the number of samples goes to infinity.
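As a rough illustration of the graph construction referred to above, the following sketch builds an unnormalized graph Laplacian with Gaussian (heat-kernel) weights from a point cloud sampled on a circle. The function name `gaussian_graph_laplacian`, the bandwidth parameter `t`, and the choice of the circle are assumptions made for this example; the scaling constants required by the convergence statement of [14] are deliberately omitted.

```python
import numpy as np

def gaussian_graph_laplacian(points, t):
    """Unnormalized graph Laplacian L = D - W with heat-kernel weights.

    points : (n, D) array of samples in the ambient space.
    t      : kernel bandwidth (hypothetical parameter name);
             W_ij = exp(-|x_i - x_j|^2 / (4 t)), W_ii = 0.
    """
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    W = np.exp(-sq_dists / (4.0 * t))
    np.fill_diagonal(W, 0.0)          # no self-loops
    D = np.diag(W.sum(axis=1))        # degree matrix
    return D - W

# Illustrative point cloud: 200 i.i.d. uniform samples on the unit circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)

L = gaussian_graph_laplacian(points, t=0.05)
```

The resulting matrix is symmetric, positive semidefinite, and annihilates constant vectors, which are the discrete counterparts of the basic properties of the Laplace-Beltrami operator.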

The work in [14] and related results, such as [92, 93], have been used (directly or indirectly) to design principled learning systems over manifolds. A substantial fraction of this literature focuses on scalar manifold signals, i.e., the case in which one or more scalar values are attached to each point of a manifold. Notable examples are manifold convolutional neural networks [103, 67], kernel methods and Gaussian processes on manifolds [20, 74], as well as a growing literature on generative manifold models [111, 33]. In a complementary direction, operator-learning methods on manifolds extend neural operators beyond Euclidean domains; these methods handle an infinite-dimensional object globally, but they still assign a finite vector to each point of a manifold. Most of the works in this class are instances of neural manifold operators [26, 19, 106, 58], which aim at resolution-independent learning of PDE solution operators. Some works explored vector-valued manifold signals, i.e., multivariate real-valued functions supported on manifolds; in this case, one or more finite vectors are attached to each point of a manifold. Examples are tangent bundle convolutional neural networks [11] and vector-field Gaussian processes on manifolds built via gauge-independent projected kernels [56]. Moreover, especially in the statistics community, functional observations with manifold structure, i.e., manifold-valued functions supported on the real line, have long been studied [70, 31, 89], and recent works have started analyzing autoregressive processes on the sphere [24, 94]. Finally, learning systems acting on discrete bundles, i.e., bundles whose base space is a finite set/discrete manifold, have been recently investigated [5]. Despite their diversity, all the models cited in this paragraph use finite-dimensional fibers and implicitly assume the Levi-Civita connection. As such, they do not allow for the arbitrary connections or the potentially infinite-dimensional signals considered in this work. One of the main reasons behind this gap is the lack of a rigorous generalization of the convergence result of [14] to these settings, which is our main contribution.

Pioneering works on sheaf theory can be found in [65, 88, 48]. Cellular sheaves are combinatorial instances of sheaves that were introduced in [90] and later rediscovered in [29]. In [90, 29], these sheaves were first defined over regular cell complexes, hence the term “cellular” sheaves. However, as in this work, cellular sheaves are often defined over tamer objects, here graphs. In [51, 35], the authors studied the problem of learning vector cellular sheaves, i.e., cellular sheaves over undirected graphs with finite-dimensional node signals. The works in [53, 52, 42, 83] introduced a novel class of diffusion dynamics on vector cellular sheaves. In [18, 50, 7, 39, 37, 78, 9, 11], neural networks operating on vector cellular sheaves over (undirected, directed, hyper) graphs with finite-dimensional signals are presented, generalizing graph neural networks. We again note, however, that all these works implicitly or explicitly restrict themselves to either the Levi-Civita or flat connections. Additionally, the work in [18] exploited vector cellular sheaf theory to show that the underlying geometry of the graph gives rise to the oversmoothing behavior of GCNs. Also, (vector and general) cellular sheaves recently appeared in causal theory [30], control [49], and telecommunications [47]. Finally, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on vector cellular sheaves built from manifold samples.

Appendix B Broader Impact, Future Directions and Limitations

The potential impact of this work extends well beyond the effectiveness of the HilbNet architecture. Our convergence result unifies and extends the graph- and vector-diffusion convergence theories of [14, 93], thereby enabling novel geometric learning systems for genuinely infinite-dimensional, manifold-supported signals equipped with arbitrary connections. HilbNets are just a first (principled, transferable) instance of such systems, but our result opens several new avenues.

Clustering and Dimensionality Reduction. Classical Laplacian-based methods for clustering [100] and nonlinear dimensionality reduction [13, 28] all rely, either explicitly or implicitly, on the convergence of the graph Laplacian to the Laplace–Beltrami operator. Our Theorem 1 provides the analogous foundation in the Hilbert bundle setting, immediately suggesting sheaf-spectral generalizations of these techniques. For instance, one may define Hilbert sheaf eigenmaps by computing the leading eigensections of $\Delta_{\mathcal{F}_{n}^{t}}$ and using them as coordinates, yielding embeddings that are aware not only of the base manifold geometry but also of the fiber-wise coupling encoded by the connection. This is particularly promising for data such as spatiotemporal fields or distributional signals, where standard spectral methods discard the internal structure of each observation. Similarly, sheaf-spectral clustering would partition data by jointly considering geometric proximity on $\mathcal{M}$ and coherence of the infinite-dimensional signals across fibers, a strictly richer criterion than what scalar graph Laplacians can capture. The finite-rank convergence guarantee of Theorem 2 ensures that such methods can be implemented with truncated signals while remaining provably consistent with the underlying continuous geometry.
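A minimal sketch of the sheaf-eigenmaps idea in a finite-dimensional toy setting may help fix ideas. It assumes stalks of dimension `d`, orthogonal edge transports, and interprets "leading" eigensections as those with the smallest eigenvalues; the helper names `connection_laplacian` and `sheaf_eigenmaps`, the cycle graph, and the particular transports are all hypothetical choices for illustration, not the construction used in the paper.

```python
import numpy as np

def connection_laplacian(W, transports, d):
    """Block Laplacian with quadratic form sum_{i<j} W_ij |x_i - O_ij x_j|^2.

    W          : (n, n) symmetric nonnegative weights with zero diagonal.
    transports : dict {(i, j): (d, d) orthogonal matrix O_ij} with i < j,
                 one entry per edge with W_ij > 0.
    """
    n = W.shape[0]
    L = np.zeros((n * d, n * d))
    for i in range(n):
        L[i*d:(i+1)*d, i*d:(i+1)*d] = W[i].sum() * np.eye(d)
    for (i, j), O in transports.items():
        L[i*d:(i+1)*d, j*d:(j+1)*d] = -W[i, j] * O
        L[j*d:(j+1)*d, i*d:(i+1)*d] = -W[i, j] * O.T
    return L

def sheaf_eigenmaps(L, n, d, k):
    """Return the k smallest eigenvalues and eigensections, shaped (n, d, k)."""
    vals, vecs = np.linalg.eigh(L)
    return vals[:k], vecs[:, :k].reshape(n, d, k)

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Toy example: 6-cycle, 2-dimensional stalks, transports induced by node frames.
n, d, k = 6, 2, 3
frames = [rot(0.7 * i) for i in range(n)]
W = np.zeros((n, n))
transports = {}
for i in range(n):
    j = (i + 1) % n
    a, b = min(i, j), max(i, j)
    W[a, b] = W[b, a] = 1.0
    transports[(a, b)] = frames[a] @ frames[b].T

L = connection_laplacian(W, transports, d)
vals, coords = sheaf_eigenmaps(L, n, d, k)   # coords[i] embeds node i
```

Because the transports here come from node frames, the toy connection is flat and the Laplacian has a `d`-dimensional kernel of global sections; a connection with curvature would lift those eigenvalues away from zero.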

Structured Self-Supervised Learning. Self-supervised learning (SSL) has largely been built around objectives that encourage invariance or equivariance with respect to augmentations of the base domain [109, 96, 32, 110]. Our framework suggests a more structured family of SSL methods. Because the Hilbert sheaf Laplacian encodes both spatial geometry and fiber-wise transport, one can design contrastive or non-contrastive objectives that encourage learned representations to be sections of an appropriate bundle, i.e., to satisfy local consistency constraints dictated by the restriction maps. Such objectives would yield representations that are not merely invariant to domain augmentations but are geometrically coherent across fibers, a property that is especially desirable when downstream tasks depend on the relational structure between signals at different manifold points, as in multi-sensor forecasting or multi-agent coordination.

Generalizability Theory and Mechanistic Interpretability for Transformers. Another promising direction concerns the connection between our framework and transformer architectures [98]. In a standard transformer, each token is equipped with a positional encoding, either fixed (e.g., sinusoidal) or learned, that situates it in a continuous geometric space [95]. These positional encodings can be viewed as sampled points on a base manifold $\mathcal{M}$, with the encoding scheme implicitly defining the metric structure of the domain. The residual-stream representation at each position, or, in the infinite-width or infinite-context limit, the full distribution over possible activations, then lives in a Hilbert space fibered over this base point, so that the collection of representations across positions constitutes a section of a Hilbert bundle over $\mathcal{M}$. The attention mechanism then defines a data-dependent transport between these fibers. In this view, a self-attention layer is an instance of a single-step diffusion under a learned sheaf Laplacian whose base graph is determined by the sampled positional encodings. This may in some sense be viewed as a more precise incarnation of the recently introduced geodesic hypothesis [55], where the autoregressive output of transformers is modeled by a stochastic diffusion PDE in Euclidean space, rather than by the manifold-theoretic treatment proposed here. Making this correspondence precise would allow one to import the convergence and transferability machinery of Theorems 1 and 2 into the transformer setting. A recent line of work attempts to adapt Laplacian-based GNN generalization and stability results to transformers [81], but, because it fundamentally relies on the convergence result of [14], it must work in a somewhat simplified setting. In conjunction with our convergence result, the generality of the Hilbert sheaf Laplacian is potentially well-suited for extending these generalization theorems to a broader class of transformers. On the interpretability side, decomposing attention into a positional-affinity component and a fiber-transport component offers a principled lens through which to study what information each head moves and how it is transformed in transit. One could, for example, measure the holonomy of the learned connection around closed loops of attention to detect whether a head implements a nontrivial geometric transformation. While formalizing these connections requires treatment of the data-dependence of the connection and the interplay between positional and content-based attention, the mathematical infrastructure developed in this work provides a strong starting point.
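The holonomy measurement mentioned above can be sketched on a toy discrete connection. The example assumes each directed edge carries a fixed orthogonal transport map and composes these maps around a closed loop; how such maps would be extracted from an actual attention head is not specified here, and the names `holonomy` and `holonomy_defect` are hypothetical.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def holonomy(transports, loop):
    """Compose transports along a closed loop [v0, v1, ..., v0].

    transports[(dst, src)] maps the fiber at src to the fiber at dst.
    """
    d = next(iter(transports.values())).shape[0]
    H = np.eye(d)
    for src, dst in zip(loop[:-1], loop[1:]):
        H = transports[(dst, src)] @ H
    return H

def holonomy_defect(transports, loop):
    """Frobenius distance between the loop holonomy and the identity."""
    H = holonomy(transports, loop)
    return np.linalg.norm(H - np.eye(H.shape[0]))

loop = [0, 1, 2, 0]

# Connection with curvature: rotations that do not cancel around the triangle.
curved = {(1, 0): rotation(0.1), (2, 1): rotation(0.2), (0, 2): rotation(0.3)}

# Flat connection: transports induced by a choice of frame at each node.
frames = [rotation(a) for a in (0.4, 1.1, -0.5)]
flat = {(j, i): frames[j] @ frames[i].T for i, j in ((0, 1), (1, 2), (2, 0))}

curved_defect = holonomy_defect(curved, loop)
flat_defect = holonomy_defect(flat, loop)
```

A defect near zero indicates that transport around the loop is trivial, while a nonzero defect signals a genuinely nontrivial geometric transformation along that loop.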

Limitations. Our theoretical guarantees rest on some assumptions that may not hold exactly in practice, as is usually the case. The convergence results (Theorems 1 and 2) require the base manifold $\mathcal{M}$ to be closed (compact without boundary), sections to be $C^{3}$ or $C^{4}$ smooth, and samples to be drawn i.i.d. from the uniform distribution. Real-world sensor networks, such as those in our traffic experiments, are neither uniformly sampled nor necessarily supported on compact manifolds, and measured signals are typically noisy rather than smooth. These gaps between theory and practice are standard in the Laplacian convergence literature: the foundational results of [14], as well as subsequent works on vector diffusion maps [92, 93] and manifold-based learning [28, 13], all assume compact manifolds and uniform or smooth sampling densities, yet are routinely and successfully applied under weaker conditions. Our numerical results confirm that HilbNets likewise remain effective under these standard approximations, consistently outperforming baselines that lack the principled bundle-geometric structure. On the computational side, the network sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ scales quadratically in the product of spatial and fiber dimensions, which may become prohibitive for very large graphs or high-dimensional signal discretizations without further sparsification or approximation strategies. Finally, broader and tailored empirical validation on other infinite-dimensional signal types, such as distributional or functional data on manifolds, remains an important direction for future work.

Appendix C Existing Convolutional Architectures as Actualizations of HilbNets

C.1 Universality of HilbNets

Because HilbNets are constructed from first principles, they serve as a kind of universal architecture. That is, several popular convolutional architectures in geometric deep learning, across domains and modalities, can be derived as particular instantiations of HilbNets, even when their construction does not explicitly invoke cellular sheaves. We consider a few concrete examples of this philosophy.

C.1.1 CNNs, GNNs, and Sheaf NNs

Convolutional neural networks (CNNs) and graph neural networks (GNNs) are both often formalized as acting on signals $f:\mathcal{M}\to\mathbb{R}$. In particular, by the consistency of the discrete Fourier transform, CNNs operating on a fixed grid can be viewed as operating on a principled discretization of $\mathcal{M}=[0,1]^{2}$. We may more generally view the uniform grid on which CNNs act as a particular instantiation of a graph, and thus view CNNs as a special case of GNNs [59], with the graph Laplacian as the relevant convolutional operator. GNNs can likewise be viewed as principled discretizations of manifold neural networks [101], precisely by the convergence result of [14]. Sheaf neural networks [50, 18] may then be understood as an enrichment that allows for matrix-valued rather than scalar edge weights during convolution. Further, [9] made precise that sheaf neural networks may, in particular, be viewed as acting upon tangent bundle signals $s:\mathcal{M}\to T\mathcal{M}$, via the convergence result of [92]. Notably, the convergence result of [92] applies only to the tangent bundle setting, which explains why existing works on sheaf neural networks either explicitly or implicitly restrict to discretizing the tangent bundle with either flat or Levi-Civita connections [7, 5]. As such, existing sheaf neural network architectures can typically be recovered as HilbNets by taking $\mathcal{E}=T\mathcal{M}$.
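To make the grid-as-graph viewpoint concrete, the sketch below builds the Laplacian of a 4-neighbour grid graph and applies a polynomial graph filter to a flattened image; at interior pixels, a single application of the Laplacian coincides with the familiar 5-point stencil. The helpers `grid_laplacian` and `polynomial_filter`, the grid size, and the filter coefficients are assumptions for illustration only.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Graph Laplacian of the 4-neighbour grid graph, pixels in row-major order."""
    n = rows * cols
    L = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    L[i, i] += 1.0
                    L[j, j] += 1.0
                    L[i, j] -= 1.0
                    L[j, i] -= 1.0
    return L

def polynomial_filter(L, x, coeffs):
    """Apply h(L) x = sum_k coeffs[k] * L^k x without forming powers of L."""
    out = np.zeros_like(x)
    power = x.copy()
    for h_k in coeffs:
        out += h_k * power
        power = L @ power
    return out

rows, cols = 5, 6
rng = np.random.default_rng(0)
img = rng.standard_normal((rows, cols))
L = grid_laplacian(rows, cols)

# The filter h(L) = L, i.e. coefficients [0, 1], applied to the flattened image.
filtered = polynomial_filter(L, img.ravel(), [0.0, 1.0]).reshape(rows, cols)

# Classical 5-point stencil evaluated at interior pixels.
stencil = (4.0 * img[1:-1, 1:-1]
           - img[:-2, 1:-1] - img[2:, 1:-1]
           - img[1:-1, :-2] - img[1:-1, 2:])
```

Higher-order coefficients enlarge the receptive field one hop at a time, which mirrors how larger convolution kernels behave on a regular grid.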

C.1.2 Equivariant CNNs and GNNs

The equivariant case of CNNs and GNNs arises when we wish for our architecture to respect some underlying symmetry group $G\curvearrowright\mathcal{M}$. More generally, there may not exist a global representation of the symmetry, but rather only a local representation. In physics, this is known as a gauge symmetry, and is formalized by considering our signal $f$ as a section of a bundle with connection $(\mathcal{M},\mathcal{E},\nabla)$, where the group action is then encoded as a symmetry of the connection $\nabla$. As such, gauge-equivariant CNNs and GNNs constitute perhaps the most general equivariant architectures in the literature (see Weiler et al. [105] for a thorough introduction in the CNN case), and are formulated precisely as convolutions at the level of sections of a frame bundle (although the necessary datum of a connection is often suppressed in the literature). From this perspective, it is natural that gauge-equivariant CNNs and GNNs can be derived as particular cases of cellular sheaf networks (see Li et al. [68] for a more in-depth exploration of this perspective). Thus, since these architectures can be equivalently reformulated in the language of sheaves, our Theorem 1, as well as the consequent transferability result, applies to them. In particular, this establishes a theoretical foundation for the intuitive idea that, as the underlying mesh or graph is increasingly refined, these architectures approach continuous operators on sections of the underlying bundle while maintaining equivariance across scales.

C.1.3 Spatio-Temporal GNNs

The formal treatment of signal processing on graphs whose node signals are time series is still an active area of research, and the resulting filtering techniques serve as the basis for the development of spatiotemporal graph neural networks (STGNNs). In this literature, it is common to consider convolutional operators built via the joint Laplacian $L_{J}=L_{T}\otimes\operatorname{Id}_{G}+\operatorname{Id}_{T}\otimes L_{G}$, where $L_{G}$ is the graph-domain Laplacian and $L_{T}$ is the 'time-domain' Laplacian [45]. Due to this decomposition, the resulting 'time' and 'space' filters commute, allowing for the development of both time-and-space and time-then-space STGNNs [73]. Consider now the continuous setting. Convolutions via $\Delta_{\nabla}$ intertwine the spatial and temporal domains, and this is precisely encoded by our parallel transport maps. Suppose our bundle is trivial, $\mathcal{E}=\mathcal{M}\times L^{2}(\mathbb{R}^{n})$, with trivial connection $\nabla$. In this case, our parallel transport maps are simply $P_{\gamma}=\operatorname{Id}$, and the connection Laplacian collapses to the Laplace-Beltrami operator. On product manifolds $\mathcal{M}=\mathcal{M}_{1}\times\mathcal{M}_{2}$, the Laplace-Beltrami operator decomposes as $\Delta_{\mathcal{M}}=\Delta_{\mathcal{M}_{1}}\otimes\operatorname{Id}_{\mathcal{M}_{2}}+\operatorname{Id}_{\mathcal{M}_{1}}\otimes\Delta_{\mathcal{M}_{2}}$, implying that heat flow is given by $e^{t\Delta_{\mathcal{M}}}=e^{t\Delta_{\mathcal{M}_{1}}}\otimes e^{t\Delta_{\mathcal{M}_{2}}}$ (see [46] for a formal derivation). Expressing $e^{t\Delta_{\mathcal{M}}}$ as an integral operator via the heat kernel, the fact that spatial and temporal filters commute in this case is then simply an application of Fubini's theorem.
As such, we see that the type of filtering commonly considered in STGNNs is recovered precisely as the ‘base case’ of HilbNet, and in particular, our robustness guarantees can also be applied to these STGNN architectures.
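The commuting structure of the joint Laplacian can be checked numerically in the discrete case. The following is a minimal sketch, assuming toy path-graph Laplacians for both the time and graph domains (the helper names `sym_expm` and `path_laplacian` are illustrative, not from the paper); it verifies that the exponential of the Kronecker sum factorizes and that the time and space filters commute.

```python
import numpy as np

def sym_expm(L, t):
    """exp(t * L) for a symmetric matrix L, via its eigendecomposition."""
    w, Q = np.linalg.eigh(L)
    return (Q * np.exp(t * w)) @ Q.T

def path_laplacian(n):
    """Combinatorial Laplacian of a path graph on n nodes (toy stand-in)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

L_T = path_laplacian(4)            # toy 'time-domain' Laplacian
L_G = path_laplacian(3)            # toy graph-domain Laplacian
I_T, I_G = np.eye(4), np.eye(3)

# Joint Laplacian L_J = L_T (x) Id_G + Id_T (x) L_G
L_J = np.kron(L_T, I_G) + np.kron(I_T, L_G)

t = 0.3                            # arbitrary filter parameter for the check
H_joint = sym_expm(L_J, t)
H_time = np.kron(sym_expm(L_T, t), I_G)    # acts on the time factor only
H_space = np.kron(I_T, sym_expm(L_G, t))   # acts on the graph factor only

print(np.allclose(H_joint, H_time @ H_space))
print(np.allclose(H_time @ H_space, H_space @ H_time))
```

The factorization holds because the two Kronecker summands commute, which is the finite-dimensional counterpart of the Fubini argument above.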

Appendix D Practical Implementations of Parallel Transport

As we have established, a key strength of the HilbNets architecture is the ability to encode signal-level geometric priors through the principled incorporation of relevant parallel transport operators. This naturally raises the question of how these transport operators should be implemented in practice. We may consider three general classes of use-cases.

Task-Inherent Priors. The most theoretically well-grounded case is when knowledge of the geometry of the task itself may be utilized to build our parallel transport operators. For instance, suppose the nodes of our base graph represent cameras and the task is multi-view 3D recognition. Then the relevant transport operators should record the rotation $P_{x_{i}\to x_{j}}\in SO(3)$ that aligns views as in [68], resulting in an appropriately equivariant sheaf Laplacian operator. More generally, whenever the data modality lacks a 'global' reference frame or coordinate system, the appropriate alignments between local reference frames give rise to a connection and the associated parallel transport. For instance, biomedical time-series analysis often utilizes algorithms based upon the large deformation diffeomorphic metric mapping (LDDMM) [12], a core part of which may be understood as extracting the necessary parallel transport from the data using the first-order ODE definition. We may also consider the tangent bundle networks of [11] in this category, as they use explicit vector-field data from which they may then compute the necessary sheaf transition maps via local PCA. As such, we see that HilbNets may be applied to any of these settings, where the relevant parallel transport is completely determined by the task itself and thus can typically be explicitly pre-computed.

Domain-Inherent Priors. Alternatively, it is often the case that we may not have access to task-specific priors, but rather to general knowledge of the structure of the signal domain. For instance, in many domains, our generic stalks may be equipped with the additional structure of a reproducing kernel Hilbert space (RKHS), i.e. $\mathcal{H}_{\kappa}$. Analogously to the previous case, we may then view parallel transport operators as those that maximize alignment, but now with respect to our kernel. For instance, given a choice of similarity kernel $\kappa$ between time series or distributions, and given our initial section data $\{S_{u}\}_{u=1}^{F_{0}}$, we may define our parallel transport operator matrices via

$$P_{x_{j}\to x_{i}}^{(d,e_{ij})}:=\arg\max_{\mathbf{T}\in\mathcal{C}}\sum_{q=1}^{F_{0}}\kappa(\mathbf{T}\mathbf{S}_{i,q},\mathbf{S}_{j,q}) \tag{15}$$

for some suitable class of operators $\mathcal{C}\subseteq O(d)$, and force the diagonal blocks of the sheaf Laplacian to be the sum of the scalar edge weights given by the kernel. This is exactly the discretized sheaf Laplacian from (8). In practice, given a choice of similarity kernel on our fibers, we may then either precompute these parallel transport operators using the above optimization objective or learn them end-to-end with the model's learned filters. In the latter case, (15) is applied as a regularization to the task loss. The special case in which (15) is not employed at all, with $\mathcal{C}=O(d)$ and $K=1$ in (10), recovers the sheaf diffusion neural network from [18]. As such, we see that the greater generality of the Hilbert sheaf Laplacian lends itself to more flexible and perhaps more broadly applicable design choices than existing sheaf neural networks.
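To make the precomputation route for (15) concrete, here is a minimal sketch under two assumptions that are not taken from the paper: the kernel is linear, $\kappa(a,b)=a^{\top}b$, and the class is the full group $\mathcal{C}=O(d)$. In that special case the objective reduces to the classical orthogonal Procrustes problem, solved in closed form by an SVD. The function name `procrustes_transport` is hypothetical.

```python
import numpy as np

def procrustes_transport(S_i, S_j):
    """Maximize sum_q <T S_i[:, q], S_j[:, q]> over orthogonal T.

    S_i, S_j: (d, F0) arrays whose columns are the discretized sections.
    Assumes a linear kernel and C = O(d); other kernels or classes
    generally require iterative optimization instead.
    """
    M = S_j @ S_i.T                    # objective equals the Frobenius pairing <T, M>
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                      # orthogonal polar factor of M

# Toy check: recover a known orthogonal alignment from noiseless data.
rng = np.random.default_rng(0)
d, F0 = 4, 6
Q_true, _ = np.linalg.qr(rng.standard_normal((d, d)))
S_i = rng.standard_normal((d, F0))
S_j = Q_true @ S_i
T_hat = procrustes_transport(S_i, S_j)
print(np.allclose(T_hat, Q_true))
```

For nonlinear kernels the same alignment objective can instead be added to the task loss as a regularizer, as described above.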

Appendix E Parallel Transport Parametrizations

This appendix details the finite-dimensional transport parametrizations used to instantiate the network sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}$ in the experiments. This instantiation of HilbNets may be considered a particular case of the end-to-end learning paradigm introduced in Appendix D, for polynomial filters and a few demonstrative classes of $\mathcal{C}$ and $\kappa$. The discussion should be read as a continuation of the two-stage discretization in Section 5: after sampling the manifold, we obtain a Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$; after sampling or projecting the fibers, we obtain a finite-dimensional network sheaf $\mathcal{F}_{n,d}^{t}$ with $d$-dimensional stalks.

E.1 From bundle transports to network-sheaf restrictions

Recall that, before signal discretization, the Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$ induced by a sample $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}$ assigns the node stalk

$$\mathcal{F}_{n}^{t}(x_{i})=\mathcal{E}_{x_{i}} \tag{16}$$

and, for an edge $e_{ij}\in E$, the edge stalk

$$\mathcal{F}_{n}^{t}(e_{ij})=\mathcal{E}_{m_{\gamma_{ij}}}, \tag{17}$$

where $m_{\gamma_{ij}}$ is the midpoint of the chosen geodesic between $x_{i}$ and $x_{j}$. Its restriction maps are weighted parallel transports of the form

$$(\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}},\qquad k_{ij}^{t}=\exp\left(-\frac{d_{\mathcal{M}}(x_{i},x_{j})^{2}}{4t}\right). \tag{18}$$

After fiber discretization, the network sheaf $\mathcal{F}_{n,d}^{t}$ has finite-dimensional stalks, which we identify with $\mathbb{R}^{d}$ after choosing the first $d$ basis elements of the fiber Hilbert space. The corresponding restriction maps are matrices

$$(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}:\mathbb{R}^{d}\to\mathbb{R}^{d}. \tag{19}$$

For the pragmatic parametrizations used in the experiments, it is useful to express the same sheaf Laplacian in node-to-node transport coordinates. Fix an orientation convention for each edge $e_{ij}$. After identifying the edge stalk with the coordinate system of one endpoint, we write the restrictions as

$$(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,I_{d},\qquad(\mathcal{F}_{n,d}^{t})_{x_{j}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to x_{i}}^{(d,e_{ij})}, \tag{20}$$

where

$$P_{x_{j}\to x_{i}}^{(d,e_{ij})}\in O(d) \tag{21}$$

is the finite-dimensional transport carrying the discretized fiber over $x_{j}$ into the discretized fiber over $x_{i}$ along edge $e_{ij}$. When the transport comes from the continuous Hilbert bundle, this matrix is the finite-dimensional representation of the corresponding parallel transport, after the chosen fiber projection and coordinate identification. When the continuous connection is unknown, $P_{x_{j}\to x_{i}}^{(d,e_{ij})}$ is instead chosen from a transport hypothesis class.

With the shorthand

$$P_{j\to i}^{e}:=P_{x_{j}\to x_{i}}^{(d,e_{ij})}, \tag{22}$$

the action of the network sheaf Laplacian on a sampled signal

$$\mathbf{s}_{n,d}=(\mathbf{s}_{x_{1}},\dots,\mathbf{s}_{x_{n}})\in C^{0}(\mathcal{F}_{n,d}^{t};G_{n}),\qquad\mathbf{s}_{x_{i}}\in\mathbb{R}^{d}, \tag{23}$$

takes the concrete form

$$(\Delta_{\mathcal{F}_{n,d}^{t}}\mathbf{s}_{n,d})_{x_{i}}=\sum_{x_{j}\in\mathcal{N}(x_{i})}k_{ij}^{t}\left(\mathbf{s}_{x_{i}}-P_{j\to i}^{e_{ij}}\mathbf{s}_{x_{j}}\right). \tag{24}$$

Equivalently, $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ is the block matrix with blocks

$$(\Delta_{\mathcal{F}_{n,d}^{t}})_{ij}=\begin{cases}\displaystyle\sum_{r:\,e_{ir}\in E}k_{ir}^{t}I_{d},&i=j,\\[12pt] \displaystyle-k_{ij}^{t}P_{j\to i}^{e_{ij}},&i\neq j\text{ and }e_{ij}\in E,\\[5pt] 0,&\text{otherwise.}\end{cases} \tag{25}$$

This is the same sheaf Laplacian defined in Section 4, specialized to the coordinate convention in (20). The scalar weight $k_{ij}^{t}$ controls how strongly the two sampled base points interact, while the transport matrix $P_{j\to i}^{e_{ij}}$ controls how the two discretized fibers are aligned before their signals are compared.
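The block structure in (25) can be assembled directly. The sketch below is a dense toy version (the experiments use a sparse block matrix), and it makes one assumption explicit: each undirected edge stores a single transport $P_{j\to i}$, with the reverse direction taken to be its transpose so that the assembled operator is symmetric. The function name `sheaf_laplacian` is illustrative.

```python
import numpy as np

def sheaf_laplacian(n, d, edges, weights, transports):
    """Dense nd x nd sheaf Laplacian with blocks as in (25).

    edges:      list of (i, j) node pairs, one entry per undirected edge
    weights:    kernel weights k_ij, one per edge
    transports: orthogonal d x d matrices P_{j->i}, one per edge;
                P_{i->j} is assumed to be P_{j->i}^T
    """
    L = np.zeros((n * d, n * d))
    I = np.eye(d)
    for (i, j), k, P in zip(edges, weights, transports):
        bi, bj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[bi, bi] += k * I          # diagonal blocks accumulate edge weights
        L[bj, bj] += k * I
        L[bi, bj] -= k * P          # off-diagonal block  -k_ij P_{j->i}
        L[bj, bi] -= k * P.T        # reverse direction   -k_ij P_{i->j}
    return L

rng = np.random.default_rng(1)
n, d = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
weights = rng.uniform(0.2, 1.0, size=len(edges))
transports = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in edges]
L = sheaf_laplacian(n, d, edges, weights, transports)
print(np.allclose(L, L.T), np.linalg.eigvalsh(L).min() >= -1e-10)
```

With orthogonal transports the quadratic form is $\sum_{e_{ij}}k_{ij}^{t}\|\mathbf{s}_{x_{i}}-P_{j\to i}^{e_{ij}}\mathbf{s}_{x_{j}}\|_{2}^{2}$, so the assembled matrix is symmetric positive semidefinite.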

E.2 Transport hypothesis classes

As mentioned in Appendix D, the true connection is often unknown and parallel transport maps must be learned. In the experimental results of this work, we always learn the parallel transport maps end-to-end using the task loss regularized with (15), and we restrict each edgewise transport to hypothesis classes $\mathcal{C}$ such that

$$P_{j\to i}^{e_{ij}}\in\mathcal{C}\subseteq O(d). \tag{26}$$

We use three transport classes in the experiments: frozen identity, free orthogonal transports, and circulant or time-stationary transports.

Frozen identity.

The simplest class is

$$\mathcal{C}_{\mathrm{id}}=\{I_{d}\}. \tag{27}$$

This recovers the usual assumption that neighboring fibers are canonically identified and that no non-trivial alignment is needed. In this case,

$$P_{j\to i}^{e_{ij}}=I_{d} \tag{28}$$

for every edge, and (24) becomes

$$(\Delta_{\mathcal{F}_{n,d}^{t}}\mathbf{s}_{n,d})_{x_{i}}=\sum_{x_{j}\in\mathcal{N}(x_{i})}k_{ij}^{t}\left(\mathbf{s}_{x_{i}}-\mathbf{s}_{x_{j}}\right). \tag{29}$$

Thus, frozen identity reduces the sheaf Laplacian to a standard weighted graph Laplacian applied independently to each fiber coordinate and, therefore, reduces HilbNets to standard GCNs. It is a useful baseline: any improvement over frozen identity quantifies the value of learning or imposing non-trivial transports.

Free orthogonal transports.

The most expressive finite-dimensional class is

$$\mathcal{C}_{\mathrm{free}}=O(d), \tag{30}$$

or, when the target transports are known to lie in the identity component,

$$\mathcal{C}_{\mathrm{free}}=SO(d). \tag{31}$$

In the experiments, we parameterize free orthogonal transports by products of Householder reflections.

For a nonzero vector $v\in\mathbb{R}^{d}$, define the Householder reflection

$$H(v)=I_{d}-2\frac{vv^{\top}}{\|v\|_{2}^{2}}. \tag{32}$$

Each $H(v)$ is orthogonal and symmetric:

$$H(v)^{\top}H(v)=I_{d},\qquad H(v)^{\top}=H(v). \tag{33}$$

For each oriented edge $e_{ij}$, we store $R$ Householder vectors

$$v_{e_{ij},1},\dots,v_{e_{ij},R}\in\mathbb{R}^{d} \tag{34}$$

and define

$$P_{j\to i}^{e_{ij}}=H(v_{e_{ij},R})\cdots H(v_{e_{ij},1}). \tag{35}$$

Therefore $P_{j\to i}^{e_{ij}}$ is exactly orthogonal for every choice of nonzero Householder vectors.

By the Cartan–Dieudonné theorem, every matrix in $O(d)$ can be represented as a product of at most $d$ Householder reflections. In practice, the choice of $R$ is dataset-dependent. In our synthetic experiments we use $R=16$ for $d=m=10$, a modest over-parametrization that aids optimization. In our traffic experiments we use $R=8$ for $d=T=12$, which parameterizes a strict subset of $O(T)$ rather than the full orthogonal group, and which we find sufficient for the alignment patterns observed in the data. If one fixes exactly $R$ non-degenerate reflections, the determinant parity is fixed:

$$\det(P_{j\to i}^{e_{ij}})=(-1)^{R}. \tag{36}$$

Thus, an even number of reflections parameterizes the identity component $SO(d)$, while an odd number parameterizes the other component. In the synthetic Gaussian experiment, the ground-truth Levi-Civita transports are obtained continuously from the identity along geodesics and, after Cholesky rescaling, lie in the identity component. Hence an even number of reflections is appropriate. If both connected components of $O(d)$ are needed, one may add a fixed final reflection or a discrete sign component. For numerical stability, the implementation uses

$$H(v_{e_{ij},r})=I_{d}-2\frac{v_{e_{ij},r}v_{e_{ij},r}^{\top}}{\|v_{e_{ij},r}\|_{2}^{2}+\epsilon}, \tag{37}$$

with a small $\epsilon>0$ to avoid division by zero at degenerate $v_{e_{ij},r}$. This recovers the exact Householder reflection $H(q)=I_{d}-2qq^{\top}$ with $q=v_{e_{ij},r}/\|v_{e_{ij},r}\|_{2}$ in the limit $\epsilon\to 0$ for $\|v_{e_{ij},r}\|_{2}>0$, and yields a matrix orthogonal up to numerical precision.

The free class is useful as an expressivity test. If the target transport belongs to $O(d)$ in the chosen coordinates, then the Householder class can represent it. This is precisely the role it plays in the synthetic statistical-bundle experiment, where Cholesky rescaling converts the intrinsic Wasserstein-unitary Levi-Civita transports into Euclidean-orthogonal matrices. In real-data experiments, the free class serves as a high-capacity transport baseline.
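The Householder parametrization in (35) and (37) is short enough to sketch directly. The following is an illustrative NumPy version, not the paper's implementation: the function names are hypothetical, and a training pipeline would use an autodiff framework rather than plain arrays.

```python
import numpy as np

def householder(v, eps=0.0):
    """Householder reflection I - 2 v v^T / (||v||^2 + eps), as in (37)."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / (v @ v + eps)

def householder_product(V, eps=0.0):
    """Product H(v_R) ... H(v_1) for the rows v_1, ..., v_R of V, as in (35)."""
    P = np.eye(V.shape[1])
    for v in V:                       # left-multiply so that H(v_1) acts first
        P = householder(v, eps) @ P
    return P

rng = np.random.default_rng(2)
d = 10
P_even = householder_product(rng.standard_normal((16, d)))   # R = 16 reflections
P_odd = householder_product(rng.standard_normal((3, d)))     # R = 3 reflections

print(np.allclose(P_even.T @ P_even, np.eye(d)))
print(round(np.linalg.det(P_even)), round(np.linalg.det(P_odd)))
```

The determinant signs follow the parity rule (36): an even $R$ stays in $SO(d)$, while an odd $R$ lands in the other component.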

Circulant or time-stationary transports.

For time-series fibers, the discretized fiber dimension is the number of retained time samples, so we write $d=T$. A natural prior is that inter-fiber transport should commute with time shifts. Let

$$\mathsf{S}_{T}:\mathbb{R}^{T}\to\mathbb{R}^{T} \tag{38}$$

be the cyclic shift operator. A time-stationary transport is one satisfying

$$P_{j\to i}^{e_{ij}}\mathsf{S}_{T}=\mathsf{S}_{T}P_{j\to i}^{e_{ij}}. \tag{39}$$

The commutant of the cyclic shift is the algebra of circulant matrices. Requiring in addition that $P_{j\to i}^{e_{ij}}$ be orthogonal gives the class of orthogonal circulant transports.

Let $F_{T}$ denote the unitary discrete Fourier transform matrix. Then every orthogonal circulant transport has the form

$$P_{j\to i}^{e_{ij}}=F_{T}^{*}\operatorname{diag}(\lambda_{e_{ij}})F_{T},\qquad|\lambda_{e_{ij},k}|=1. \tag{40}$$

For real-valued time-domain signals, the Fourier multipliers must satisfy conjugate symmetry:

$$\lambda_{e_{ij},T-k}=\overline{\lambda_{e_{ij},k}}. \tag{41}$$

We therefore store only the independent positive-frequency phases. Let

$$m_{T}=\left\lfloor\frac{T-1}{2}\right\rfloor. \tag{42}$$

The learnable parameter for edge $e_{ij}$ is

$$\varphi_{e_{ij}}=(\varphi_{e_{ij},1},\dots,\varphi_{e_{ij},m_{T}})\in\mathbb{R}^{m_{T}}. \tag{43}$$

We define

$$\lambda_{e_{ij},0}=1,\qquad\lambda_{e_{ij},k}=e^{i\varphi_{e_{ij},k}},\qquad\lambda_{e_{ij},T-k}=e^{-i\varphi_{e_{ij},k}},\quad k=1,\dots,m_{T}. \tag{44}$$

If $T$ is even, the Nyquist frequency is self-conjugate and is fixed to

$$\lambda_{e_{ij},T/2}=1 \tag{45}$$

for the identity-component parametrization. This yields a real orthogonal circulant matrix through (40). Equivalently, $P_{j\to i}^{e_{ij}}$ can be constructed in real arithmetic from its first column. For $r=0,\dots,T-1$, define

$$c_{e_{ij}}[r]=\frac{1}{T}\left[1+\mathbb{I}_{\{T\text{ even}\}}(-1)^{r}+2\sum_{k=1}^{m_{T}}\cos\left(\varphi_{e_{ij},k}+\frac{2\pi kr}{T}\right)\right]. \tag{46}$$

The full circulant matrix is then

$$(P_{j\to i}^{e_{ij}})_{ab}=c_{e_{ij}}[(a-b)\bmod T],\qquad a,b=0,\dots,T-1. \tag{47}$$

This form is convenient for implementation because it avoids explicitly manipulating complex-valued matrices. The circulant class has only

$$m_{T}=\left\lfloor\frac{T-1}{2}\right\rfloor \tag{48}$$

parameters per edge, compared with $d(d-1)/2$ degrees of freedom for a general orthogonal matrix. Each phase $\varphi_{e_{ij},k}$ has a direct interpretation as the phase lag at frequency $k$ between the two endpoint fibers. Thus, the transport may advance or delay oscillatory components across an edge, but it cannot arbitrarily mix frequencies or reshape the waveform. This is the intended inductive bias for spatiotemporal signals such as traffic or sensor time series, where neighboring sensors may observe delayed or phase-shifted versions of related temporal patterns. In the synthetic experiment, the same class is used more abstractly as a structured subgroup of $O(d)$ onto which the ground-truth transports can be projected.
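The real-arithmetic construction (46)-(47) can be sketched and cross-checked against the Fourier form (40) and (44). This is an illustrative NumPy version with a hypothetical function name; it assumes the unitary DFT convention $(F_{T})_{ab}=e^{-2\pi i ab/T}/\sqrt{T}$.

```python
import numpy as np

def circulant_transport(phi, T):
    """Real orthogonal circulant matrix built from m_T = (T-1)//2 phases."""
    m_T = (T - 1) // 2
    phi = np.asarray(phi, dtype=float)
    assert phi.shape == (m_T,)
    r = np.arange(T)
    k = np.arange(1, m_T + 1)
    # First column c[r] from (46): DC term, optional Nyquist term, cosine sum.
    c = 1.0 + 2.0 * np.cos(phi[None, :] + 2.0 * np.pi * np.outer(r, k) / T).sum(axis=1)
    if T % 2 == 0:
        c = c + (-1.0) ** r
    c = c / T
    a, b = np.meshgrid(r, r, indexing="ij")
    return c[(a - b) % T]               # entries c[(a - b) mod T], as in (47)

T = 12
rng = np.random.default_rng(3)
phi = rng.uniform(-np.pi, np.pi, size=(T - 1) // 2)
P = circulant_transport(phi, T)

# Cross-check against F^* diag(lambda) F with conjugate-symmetric multipliers.
lam = np.ones(T, dtype=complex)
for k, ph in enumerate(phi, start=1):
    lam[k], lam[T - k] = np.exp(1j * ph), np.exp(-1j * ph)
F = np.fft.fft(np.eye(T)) / np.sqrt(T)
P_fft = F.conj().T @ np.diag(lam) @ F

S = np.roll(np.eye(T), 1, axis=0)       # cyclic shift operator
print(np.allclose(P, P_fft.real), np.allclose(P @ S, S @ P))
```

Setting all phases to zero returns the identity matrix, so the frozen-identity class sits inside the circulant class.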

E.3 Parameter counts

For a graph $G_{n}=(\mathcal{X}_{n},E)$ with $|E|$ undirected edges and discretized fiber dimension $d$, the transport parameter counts are summarized in Table 3.

| Transport class | Parameters per edge | Interpretation |
| --- | --- | --- |
| Frozen identity | $0$ | no learned alignment |
| Free Householder | $Rd$ | product of $R$ reflections in $O(d)$ |
| Full circulant | $\lfloor(d-1)/2\rfloor$ | one phase per positive frequency |

Table 3: Parameter counts for the finite-dimensional transport classes used in $\mathcal{F}_{n,d}^{t}$.

Thus, the free class is maximally expressive but parameter-heavy, while the circulant classes encode a strong time-stationary prior and scale linearly with the number of frequencies or bands.

Appendix F Additional Experimental Details

F.1 Synthetic experiments: the statistical bundle over centered Gaussians

F.1.1 The bundle

The base manifold is $\mathcal{M}=\mathrm{Sym}^{++}(p)$, the open cone of $p\times p$ symmetric positive-definite matrices. Each $\Sigma\in\mathcal{M}$ parameterizes a centered Gaussian $\mathcal{N}(0,\Sigma)$ on $\mathbb{R}^{p}$ with density $\rho_{\Sigma}$. We equip $\mathcal{M}$ with the Otto-Wasserstein metric, namely the Riemannian metric induced on $\mathrm{Sym}^{++}(p)$ by the optimal-transport distance $W_{2}$ between centered Gaussian measures.

Concretely, the tangent space is naturally identified with symmetric matrices,

$$T_{\Sigma}\mathcal{M}\cong\mathrm{Sym}(p), \tag{49}$$

and the Otto-Wasserstein inner product between $U,V\in\mathrm{Sym}(p)$ is

$$W_{\Sigma}(U,V)=\tfrac{1}{2}\,\mathrm{Tr}\bigl(L_{\Sigma}[U]\,V\bigr),\qquad L_{\Sigma}[U]\Sigma+\Sigma L_{\Sigma}[U]=U, \tag{50}$$

where $L_{\Sigma}[U]$ is the unique symmetric solution of the Lyapunov equation.

Above each $\Sigma$, the ambient Hilbert fiber is the vector-field space

$$\mathcal{H}_{\Sigma}:=L^{2}(\rho_{\Sigma};\mathbb{R}^{p}), \tag{51}$$

equipped with the inner product

$$\langle a,b\rangle_{\mathcal{H}_{\Sigma}}=\int_{\mathbb{R}^{p}}a(x)^{\top}b(x)\rho_{\Sigma}(x)\,dx. \tag{52}$$

This fiber is genuinely infinite-dimensional. The finite-rank fiber used in the synthetic experiments is the Otto-velocity image of covariance perturbations:

$$\mathcal{E}_{\Sigma}:=\bigl\{v_{V}(x)=L_{\Sigma}[V]x:V\in\mathrm{Sym}(p)\bigr\}\subset L^{2}(\rho_{\Sigma};\mathbb{R}^{p}). \tag{53}$$

Thus, the computational fiber $\mathcal{E}_{\Sigma}$ is a finite-dimensional statistical subspace of the ambient Hilbert fiber. The map $V\mapsto v_{V}$ is an isometry between $\mathrm{Sym}(p)$ with the Otto-Wasserstein metric and $\mathcal{E}_{\Sigma}$ with the $L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ inner product. Indeed, for $U,V\in\mathrm{Sym}(p)$,

$$\langle v_{U},v_{V}\rangle_{\mathcal{H}_{\Sigma}}=\mathbb{E}_{x\sim\mathcal{N}(0,\Sigma)}\bigl[x^{\top}L_{\Sigma}[U]L_{\Sigma}[V]x\bigr]=\mathrm{Tr}\bigl(L_{\Sigma}[U]L_{\Sigma}[V]\Sigma\bigr). \tag{54}$$

Using $V=L_{\Sigma}[V]\Sigma+\Sigma L_{\Sigma}[V]$ and the fact that $L_{\Sigma}[U]$, $L_{\Sigma}[V]$, and $\Sigma$ are symmetric, cyclicity of the trace gives $\mathrm{Tr}\bigl(L_{\Sigma}[U]V\bigr)=2\,\mathrm{Tr}\bigl(L_{\Sigma}[U]L_{\Sigma}[V]\Sigma\bigr)$, so (54) equals

$$\tfrac{1}{2}\mathrm{Tr}\bigl(L_{\Sigma}[U]V\bigr)=W_{\Sigma}(U,V). \tag{55}$$

Therefore,

$$\dim\mathcal{E}_{\Sigma}=\dim\mathrm{Sym}(p)=d=p(p+1)/2. \tag{56}$$

Since the fibers $\mathcal{E}_{\Sigma}$ are already finite-dimensional, of intrinsic dimension $m=p(p+1)/2$, the fiber discretization of Proposition 1 is exact with $d=m$. Throughout this appendix, we therefore write $d=m$ and use the network sheaf notation $\mathcal{F}_{n,d}^{t}$ and Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd}$ from Section 5.
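The isometry (54)-(55) is easy to verify numerically. The sketch below solves the Lyapunov equation in (50) through the eigendecomposition of $\Sigma$; this solver choice and the function names are assumptions made for illustration, and any standard Sylvester solver could be substituted.

```python
import numpy as np

def lyapunov_solve(Sigma, U):
    """Symmetric solution L of L @ Sigma + Sigma @ L = U for SPD Sigma."""
    w, Q = np.linalg.eigh(Sigma)
    U_rot = Q.T @ U @ Q
    # In the eigenbasis the equation decouples: L_ab * (w_a + w_b) = U_ab.
    L_rot = U_rot / (w[:, None] + w[None, :])
    return Q @ L_rot @ Q.T

def otto_inner(Sigma, U, V):
    """Otto-Wasserstein inner product W_Sigma(U, V) from (50)."""
    return 0.5 * np.trace(lyapunov_solve(Sigma, U) @ V)

def fiber_inner(Sigma, U, V):
    """L2(rho_Sigma) inner product of the velocity fields v_U, v_V, as in (54)."""
    LU, LV = lyapunov_solve(Sigma, U), lyapunov_solve(Sigma, V)
    return np.trace(LU @ LV @ Sigma)

rng = np.random.default_rng(4)
p = 4
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # a generic SPD covariance
U = rng.standard_normal((p, p)); U = U + U.T
V = rng.standard_normal((p, p)); V = V + V.T

print(np.isclose(otto_inner(Sigma, U, V), fiber_inner(Sigma, U, V)))
```

The same `lyapunov_solve` helper also gives the Gram matrix of $W_{\Sigma}$ in any fixed basis of $\mathrm{Sym}(p)$ by evaluating `otto_inner` on pairs of basis elements.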

This construction is useful because it gives a faithful but tractable proxy for the Hilbert-bundle settings that motivate HilbNets. It is faithful in the sense that the ambient fibers $L^{2}(\rho_{\Sigma};\mathbb{R}^{p})$ are infinite-dimensional vector-field Hilbert spaces, and the Levi-Civita connection of $(\mathrm{Sym}^{++}(p),W_{\Sigma})$ yields non-trivial, metric-compatible parallel transports. It is tractable because the Otto-velocity map selects a finite-rank statistical sub-bundle on which the metric, parallel-transport ODE, and projection of ground-truth transports onto restricted transport classes admit closed-form numerical evaluation. Thus, the experiments probe the finite-rank computational slice used by the implementation.

F.1.2 Levi-Civita connection and closed-form parallel transport

The Levi-Civita connection on $(\mathrm{Sym}^{++}(p),W_{\Sigma})$ is the canonical metric-compatible torsion-free connection associated with the Otto-Wasserstein metric. In covariance coordinates, its Christoffel symbol is

$$\Gamma_{\Sigma}(U,V)=\tfrac{1}{2}\bigl(UL_{\Sigma}[V]+VL_{\Sigma}[U]\bigr)\in\mathrm{Sym}(p), \tag{57}$$

which is symmetric in $(U,V)$, as required for a torsion-free connection.

Let $\Sigma_{t}$ denote the Wasserstein geodesic from $\Sigma_{0}$ to $\Sigma_{1}$. Parallel transport of a tangent vector $V(t)\in\mathrm{Sym}(p)$ along $\Sigma_{t}$ is governed by

$$\dot{V}(t)=-\Gamma_{\Sigma_{t}}\bigl(\dot{\Sigma}_{t},V(t)\bigr),\qquad V(0)=V_{0}. \tag{58}$$

We solve this ODE numerically by Euler integration with $50$ steps. The resulting linear map on the finite-rank fiber is the ground-truth Levi-Civita transport

$$P^{LC}_{x_{i}\to x_{j}}:\mathrm{Sym}(p)\to\mathrm{Sym}(p). \tag{59}$$

In the notation of Def. 3, the restriction maps of the induced sheaf $\mathcal{F}_{n,d}^{t}$ use the midpoint transports $P^{LC}_{x_{i}\to m_{ij}}$, which play the role of the discretized parallel transport $P_{x_{i}\to m_{ij}}^{(d)}$ with $d=m$. These midpoint transports define the restriction maps used in the spectral-stability experiments. The transport-recovery experiments instead regress against the Cholesky-rescaled node-to-node transport $\tilde{P}^{LC}_{x_{j}\to x_{i}}$, which adopts the orientation convention of Appendix E.

Because the connection is metric-compatible, $P^{LC}_{x_{i}\to x_{j}}$ is unitary with respect to the Otto-Wasserstein inner products on the source and target fibers:

$$W_{x_{j}}\bigl(P^{LC}_{x_{i}\to x_{j}}U,P^{LC}_{x_{i}\to x_{j}}V\bigr)=W_{x_{i}}(U,V). \tag{60}$$

We verify this numerically to within $0.5\%$ using $200$ Euler steps. This Wasserstein-unitarity is the geometric invariant that justifies comparing the ground-truth transports to orthogonal parametrizations after metric rescaling.

F.1.3 Cholesky rescaling

The free-$O(m)$ transport class used in the implementation is Euclidean-orthogonal in vectorized fiber coordinates (cf. the hypothesis classes in Appendix E with $d=m$). However, the intrinsic fiber metric is $W_{\Sigma}$, not the raw Frobenius metric on $\mathrm{Sym}(p)$. Therefore, $P^{LC}_{x_{i}\to x_{j}}$ is not generally orthogonal in raw coordinates.

Let $G_{\Sigma}$ be the Gram matrix of $W_{\Sigma}$ in a fixed basis of $\mathrm{Sym}(p)$. We factor

$$G_{\Sigma}=R_{\Sigma}^{\top}R_{\Sigma} \tag{61}$$

by Cholesky decomposition and represent a fiber coordinate vector $u$ in the rescaled frame as

$$\tilde{u}=R_{\Sigma}u. \tag{62}$$

In this frame, the rescaled Levi-Civita transport is

$$\tilde{P}^{LC}_{x_{i}\to x_{j}}=R_{x_{j}}P^{LC}_{x_{i}\to x_{j}}R_{x_{i}}^{-1}. \tag{63}$$

By Wasserstein-unitarity, it satisfies

$$\bigl(\tilde{P}^{LC}_{x_{i}\to x_{j}}\bigr)^{\top}\tilde{P}^{LC}_{x_{i}\to x_{j}}=I_{m}. \tag{64}$$

Hence,

$$\tilde{P}^{LC}_{x_{i}\to x_{j}}\in O(m). \tag{65}$$

This is the coordinate system used in the transport-recovery experiments. In these coordinates, the free-$O(m)$ Householder class described in Appendix E contains the ground-truth transports. The spectral-stability metrics are computed from the assembled sheaf Laplacian $\Delta_{\mathcal{F}_{n,d}^{t}}$ and are invariant to this coordinate choice up to similarity transformation.
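The effect of the rescaling (61)-(65) can be illustrated on synthetic Gram matrices. The sketch below does not compute Levi-Civita transports. Instead it constructs, as a stand-in, a generic map that is unitary between two randomly chosen inner products and checks that the Cholesky-rescaled version is Euclidean-orthogonal. All names and test matrices are hypothetical.

```python
import numpy as np

def cholesky_frame(G):
    """Upper-triangular R with G = R.T @ R, as in (61)."""
    return np.linalg.cholesky(G).T      # NumPy returns the lower factor

def rescale_transport(P, G_src, G_tgt):
    """Rescaled transport R_tgt P R_src^{-1}, as in (63)."""
    R_src, R_tgt = cholesky_frame(G_src), cholesky_frame(G_tgt)
    return R_tgt @ P @ np.linalg.inv(R_src)

def random_spd(m, rng):
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)

rng = np.random.default_rng(5)
m = 6
G_i, G_j = random_spd(m, rng), random_spd(m, rng)   # stand-in Gram matrices
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Stand-in for a metric-compatible transport: satisfies P^T G_j P = G_i.
P = np.linalg.inv(cholesky_frame(G_j)) @ Q @ cholesky_frame(G_i)
P_tilde = rescale_transport(P, G_i, G_j)

print(np.allclose(P.T @ G_j @ P, G_i))                 # unitary for the Gram metrics
print(np.allclose(P.T @ P, np.eye(m)))                 # not orthogonal in raw coordinates
print(np.allclose(P_tilde.T @ P_tilde, np.eye(m)))     # orthogonal after rescaling
```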

F.1.4 Sample construction

We draw samples

$$\Sigma_{i}=R_{i}D_{i}R_{i}^{\top},\qquad i=1,\dots,n, \tag{66}$$

where $R_{i}$ is a Haar-random $p\times p$ orthogonal matrix, obtained by QR decomposition of a standard-normal matrix, and $D_{i}$ is diagonal with entries whose logarithms are drawn uniformly from $[\log 0.5,\log 2.0]$. Equivalently, the eigenvalues of $\Sigma_{i}$ are log-uniformly distributed on $[0.5,2.0]$.

Although $\mathrm{Sym}^{++}(p)$ is non-compact, this procedure samples a bounded subset of it. This is appropriate for the finite deployment-regime stability tests reported here, but it is not the normalized-volume sampling assumption used in the asymptotic convergence theorem.

We build a $k$NN graph $G_{n}=(\mathcal{X}_{n},E)$ with $k=8$ under the Gaussian Wasserstein distance $W_{2}(\Sigma_{i},\Sigma_{j})$ and assign Gaussian-kernel weights

$$k_{ij}^{t}=\exp\left(-\frac{W_{2}(\Sigma_{i},\Sigma_{j})^{2}}{4t}\right),\qquad t=0.5. \tag{67}$$

The induced network sheaf $\mathcal{F}_{n,d}^{t}$ has per-edge restriction maps

$$(\mathcal{F}_{n,d}^{t})_{x_{i}\leq e_{ij}}=\sqrt{k_{ij}^{t}}\,P^{LC}_{x_{i}\to m_{ij}}, \tag{68}$$

as in Def. 3, where $m_{ij}$ is the Wasserstein-geodesic midpoint between $\Sigma_{i}$ and $\Sigma_{j}$.
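The sampling and graph-construction steps can be sketched as follows. The distance uses the standard closed form for centered Gaussians, $W_{2}^{2}(\Sigma_{1},\Sigma_{2})=\mathrm{Tr}\,\Sigma_{1}+\mathrm{Tr}\,\Sigma_{2}-2\,\mathrm{Tr}\bigl((\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2})^{1/2}\bigr)$. The symmetrization of the $k$NN relation by taking the union of neighborhoods is an assumption of this sketch, as are all function names. The transports in (68) are not computed here.

```python
import numpy as np

def sample_spd(p, rng, lo=0.5, hi=2.0):
    """Sigma = R D R^T with random orthogonal R and log-uniform eigenvalues."""
    R, _ = np.linalg.qr(rng.standard_normal((p, p)))
    eig = np.exp(rng.uniform(np.log(lo), np.log(hi), size=p))
    return (R * eig) @ R.T

def sqrtm_psd(A):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def w2_gaussian(S1, S2):
    """2-Wasserstein distance between N(0, S1) and N(0, S2)."""
    root = sqrtm_psd(S1)
    cross = sqrtm_psd(root @ S2 @ root)
    d2 = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)
    return np.sqrt(max(d2, 0.0))        # clip tiny negative rounding errors

def knn_kernel_weights(Sigmas, k=8, t=0.5):
    """Gaussian-kernel weights (67) on a symmetrized kNN graph (union rule)."""
    n = len(Sigmas)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = w2_gaussian(Sigmas[i], Sigmas[j])
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]            # skip the node itself
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (4.0 * t))
    return np.maximum(W, W.T), D

rng = np.random.default_rng(6)
Sigmas = [sample_spd(3, rng) for _ in range(12)]
W, D = knn_kernel_weights(Sigmas)
print(W.shape, np.allclose(W, W.T))
```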

F.1.5 Spectral stability under sampling density increase

For each Gaussian dimension $p$, sample size $n\in\{50,100,200,400,800\}$, and random seed, we sample $\mathcal{X}_{n}\subset\mathrm{Sym}^{++}(p)$, build $\mathcal{F}_{n,d}^{t}$ using the closed-form Levi-Civita transports, and assemble the sheaf Laplacian

$$\Delta_{\mathcal{F}_{n,d}^{t}}\in\mathbb{R}^{nd\times nd} \tag{69}$$

as a sparse block matrix. Since the reference sheaf Laplacian has size $n_{\max}m\times n_{\max}m$ and we require a dense eigendecomposition, we choose $n_{\max}$ so that $n_{\max}\cdot d\approx 10^{4}$ remains tractable on a single CPU node, yielding $n_{\max}\in\{4000,2000,1000\}$ for $p\in\{2,3,4\}$ (i.e., $d\in\{3,6,10\}$), respectively.

Let

$$\lambda_{1}^{(n)}\leq\dots\leq\lambda_{k}^{(n)} \tag{70}$$

denote the bottom-$k$ eigenvalues of $\Delta_{\mathcal{F}_{n,d}^{t}}$, with $k=32$, and define $\lambda_{i}^{(n_{\max})}$ analogously for the reference operator $\Delta_{\mathcal{F}_{n_{\max},m}^{t}}$. We measure both the aggregate $\ell_{2}$ and worst-case relative spectral discrepancy of the bottom-$32$ eigenvalues of $\Delta_{\mathcal{F}_{n,d}^{t}}$ against this high-resolution reference, sweeping $p\in\{2,3,4\}$ and $n\in\{50,100,200,400,800\}$, and averaging over 5 sampling realizations.

Figure 3: Spectral stability of $\Delta_{\mathcal{F}_{n,d}^{t}}$ for different signal and manifold sampling densities. Left: aggregate $\ell_{2}$ eigenvalue discrepancy. Right: worst-case relative error.


Aggregate discrepancy.

Fig. 3 (Left) reports the low-frequency spectral $\ell_{2}$ discrepancy

$$\mathrm{spec\text{-}L_{2}}(n)=\frac{1}{k}\left(\sum_{i=1}^{k}\bigl(\lambda_{i}^{(n)}-\lambda_{i}^{(n_{\max})}\bigr)^{2}\right)^{1/2}. \tag{71}$$

We average across three seeds and report $\bar{x}\pm s$ as the shaded band.

Worst-case discrepancy.

Fig. 3 (Right) reports the relative max error over the bottom-$k$ eigenvalues:

$$\mathrm{spec\text{-}rel\text{-}max}(n)=\frac{\max_{1\leq i\leq k}\left|\lambda_{i}^{(n)}-\lambda_{i}^{(n_{\max})}\right|}{\lambda_{k}^{(n_{\max})}}. \tag{72}$$

This complements the aggregate metric by capturing worst-case low-frequency spectral error.
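Both discrepancy metrics are simple functions of the two sorted eigenvalue lists. A minimal sketch of (71) and (72), with hypothetical function names and a dense eigensolver assumed:

```python
import numpy as np

def bottom_k_eigs(L, k=32):
    """Smallest k eigenvalues of a symmetric matrix, in ascending order."""
    return np.linalg.eigvalsh(L)[:k]

def spec_l2(lam_n, lam_ref):
    """Aggregate discrepancy (71): l2 norm of the differences, divided by k."""
    lam_n, lam_ref = np.asarray(lam_n), np.asarray(lam_ref)
    return np.sqrt(np.sum((lam_n - lam_ref) ** 2)) / len(lam_ref)

def spec_rel_max(lam_n, lam_ref):
    """Worst-case discrepancy (72), normalized by the k-th reference eigenvalue."""
    lam_n, lam_ref = np.asarray(lam_n), np.asarray(lam_ref)
    return np.max(np.abs(lam_n - lam_ref)) / lam_ref[-1]

# Toy eigenvalue lists with k = 4, chosen only to exercise the formulas.
lam_ref = np.array([0.0, 1.0, 2.0, 3.0])
lam_n = np.array([0.0, 1.0, 2.0, 5.0])
print(spec_l2(lam_n, lam_ref), spec_rel_max(lam_n, lam_ref))
```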

Overall, Fig. 3 shows a monotone decrease for both metrics across all dimensions, demonstrating that the sheaf Laplacian stabilizes as manifold sampling density increases, and faster for higher signal sampling densities.

We train three transport parametrizations from Appendix D, namely free $O(d)$ (Householder), circulant, and frozen identity, to recover the Levi-Civita transports in Cholesky-rescaled coordinates by minimizing the per-edge transport-MSE loss

$$\mathcal{L}=\frac{1}{|E|}\sum_{e_{ij}\in E}\mathbb{E}_{V\sim\mathcal{N}(0,I_{d})}\left\|P_{j\to i}^{e_{ij}}V-\tilde{P}^{LC}_{x_{j}\to x_{i}}V\right\|_{2}^{2}. \tag{73}$$

Here $P_{j\to i}^{e_{ij}}\in O(d)$ is the rescaled transport produced by the model for edge $e_{ij}$, $\tilde{P}^{LC}_{x_{j}\to x_{i}}$ is the ground-truth Levi-Civita transport from $x_{j}$ to $x_{i}$ in Cholesky-rescaled coordinates, and $V$ is a fresh isotropic Gaussian test vector.

By the trace identity

$$\mathbb{E}_{V\sim\mathcal{N}(0,I_{d})}\|AV\|_{2}^{2}=\|A\|_{F}^{2}, \tag{74}$$

the population loss equals the mean per-edge squared Frobenius distance

$$\mathcal{L}^{*}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|P_{j\to i}^{e_{ij}}-\tilde{P}^{LC}_{x_{j}\to x_{i}}\right\|_{F}^{2}. \tag{75}$$
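
For completeness, the trace identity (74) follows in one line from $\mathbb{E}[VV^{\top}]=I_{d}$. The derivation below is a standard argument written out here, applied with $A=P_{j\to i}^{e_{ij}}-\tilde{P}^{LC}_{x_{j}\to x_{i}}$:

```latex
\mathbb{E}\,\|AV\|_{2}^{2}
  = \mathbb{E}\bigl[V^{\top}A^{\top}AV\bigr]
  = \mathbb{E}\,\mathrm{tr}\bigl(A^{\top}A\,VV^{\top}\bigr)
  = \mathrm{tr}\bigl(A^{\top}A\,\mathbb{E}[VV^{\top}]\bigr)
  = \mathrm{tr}\bigl(A^{\top}A\bigr)
  = \|A\|_{F}^{2}.
```

Only the second moment of $V$ enters, so any zero-mean test vector with identity covariance would give the same population loss.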

Free $O(d)$.

The free transport is parameterized by products of Householder reflections, as detailed in Appendix D. Since $\tilde{P}^{LC}_{x_{j}\to x_{i}}\in O(d)$ in the Cholesky-rescaled frame, the free-$O(d)$ class contains the target transport and the minimum population loss is zero. Empirically, we observe $\mathcal{L}^{*}\approx 1.6\cdot 10^{-7}$ at convergence, confirming recovery to numerical precision.

Restricted classes and analytical projections.

For each restricted transport hypothesis class $\mathcal{C}\subseteq O(d)$, the population loss minimum is the mean squared Frobenius distance from each ground-truth transport $\tilde{P}^{LC}_{x_{j}\to x_{i}}$ to its best approximation inside $\mathcal{C}$:

$$\mathcal{L}^{*}_{\mathcal{C}}=\frac{1}{|E|}\sum_{e_{ij}\in E}\min_{T\in\mathcal{C}}\left\|T-\tilde{P}^{LC}_{x_{j}\to x_{i}}\right\|_{F}^{2}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|\tilde{P}^{LC}_{x_{j}\to x_{i}}-\mathrm{proj}_{\mathcal{C}}\bigl(\tilde{P}^{LC}_{x_{j}\to x_{i}}\bigr)\right\|_{F}^{2}. \tag{76}$$

This is the Theory column in Table 1.

Frozen identity. For $\mathcal{C}_{\mathrm{id}}=\{I_{d}\}$, the projection is trivial: $\mathrm{proj}_{\{I\}}(\tilde{P}^{LC}_{x_{j}\to x_{i}})=I_{d}$, so

$$\mathcal{L}^{*}_{\mathrm{frozen}}=\frac{1}{|E|}\sum_{e_{ij}\in E}\left\|\tilde{P}^{LC}_{x_{j}\to x_{i}}-I_{d}\right\|_{F}^{2}. \tag{77}$$

Circulant. For the circulant class (Appendix D with $d=m$), let $F_{d}$ be the $d\times d$ DFT matrix and define

$$\mathcal{C}_{\mathrm{circ}}=\left\{F_{d}^{*}\,\mathrm{diag}(e^{i\varphi_{1}},\dots,e^{i\varphi_{d}})\,F_{d}:\varphi_{k}\in\mathbb{R}\right\}. \tag{78}$$

The Frobenius-best projection of any $T\in O(d)$ onto this class is the diagonal-phase Procrustes solution

$$\mathrm{proj}_{\mathrm{circ}}(T)=F_{d}^{*}\,\mathrm{diag}(e^{i\hat{\varphi}_{1}},\dots,e^{i\hat{\varphi}_{d}})\,F_{d},\qquad\hat{\varphi}_{k}=\arg\bigl((F_{d}TF_{d}^{*})_{kk}\bigr). \tag{79}$$

The zero-frequency phase is pinned to zero to preserve the constant mode, and the self-conjugate Nyquist frequency, when present, is handled according to the real-valued convention of Appendix D. In the synthetic experiment the circulant class is used as a structured subgroup of $O(d)$ for testing transport-class projection, rather than as a time-series prior.
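
A minimal sketch of the diagonal-phase projection (79) and of the frozen-identity residual in (77), written in dependency-free Python. The helper names (`dft_diagonal`, `proj_circ`, `frob_sq`) and the toy matrices are illustrative assumptions, not the paper's code, and the sketch omits the zero-frequency pinning and the Nyquist convention of Appendix D.

```python
import cmath
import math

def dft_diagonal(T):
    """Diagonal entries of F T F^* for the unitary DFT matrix F."""
    d = len(T)
    return [sum(cmath.exp(-2j * math.pi * k * (n - m) / d) * T[n][m]
                for n in range(d) for m in range(d)) / d
            for k in range(d)]

def proj_circ(T):
    """Keep only the phase of each diagonal entry, then map back: F^* diag(e^{i phi}) F."""
    d = len(T)
    phases = [cmath.phase(z) for z in dft_diagonal(T)]
    # For real T with nonzero diagonal entries the imaginary part is rounding noise.
    return [[sum(cmath.exp(1j * (phases[k] + 2 * math.pi * k * (n - m) / d))
                 for k in range(d)).real / d
             for m in range(d)] for n in range(d)]

def frob_sq(A, B):
    """Squared Frobenius distance between two square matrices."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

d = 4
identity = [[float(i == j) for j in range(d)] for i in range(d)]
# Cyclic shift: already circulant and orthogonal, so the projection should return it.
shift = [[float(i == (j + 1) % d) for j in range(d)] for i in range(d)]
# Plane rotation: orthogonal but not circulant.
c, s = math.cos(0.3), math.sin(0.3)
rot = [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

print(frob_sq(proj_circ(shift), shift))                      # circulant input: residual near zero
print(frob_sq(rot, proj_circ(rot)), frob_sq(rot, identity))  # circulant vs. frozen residual
```

Because the identity is itself an element of $\mathcal{C}_{\mathrm{circ}}$ (all phases zero), the circulant residual can never exceed the frozen-identity residual for the same target transport.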

Empirical vs. theoretical plateaus.

In Table 1, the Empirical column is the lowest training loss observed over the $5000$-epoch budget, while the Theory column is $\mathcal{L}^{*}_{\mathcal{C}}$ computed on the same edge set used during training. Both columns report means $\pm$ standard deviations across three seeds. The empirical and theoretical plateaus track each other to within $3\%$ for the circulant class and within $2.3\%$ for frozen identity, with closer agreement (within $1\%$) at large $n$. This confirms that restricting the transport class to $\mathcal{C}\subseteq O(d)$ constrains the per-edge restriction maps of $\Delta_{\mathcal{F}_{n,d}^{t}}$ in a quantitatively predictable way. We do not claim that such an arbitrary edgewise subgroup constraint automatically lifts to a smooth global connection class on the continuous bundle.

F.1.6 Hyperparameters

See Table 4.

| Hyperparameter | Spectral stability | Transport recovery |
|---|---|---|
| Gaussian dimension $p$ | $\{2,3,4\}$ ($d\in\{3,6,10\}$) | $4$ ($d=10$) |
| sample-size grid $n$ | $\{50,100,200,400,800\}$ | $\{16,32,64,128,256\}$ |
| reference $n_{\max}$ | $\{4000,2000,1000\}$ at $p\in\{2,3,4\}$ | – |
| graph | kNN, $k=8$, $W_{2}$ distance | kNN, $k=8$, $W_{2}$ distance |
| top-$k$ eigenvalues | $32$ | – |
| seeds | $3$ | $3$ |
| epochs / patience | – | $5000$ / $600$ |
| optimizer | – | Adam, lr $5\cdot 10^{-3}$, batch $256$ |
| Householder reflections | – | $R=16$ |
| Euler steps for $P^{LC}$ | $50$ | $50$ |

Table 4: Synthetic experiments: hyperparameters for spectral stability and transport recovery.

F.2 Traffic forecasting: experimental details

F.2.1 Datasets

We use two standard traffic-speed benchmarks from [69].

METR-LA. $|\mathcal{X}_{n}|=207$ loop-detector sensors on the Los Angeles highway network, recording average traffic speed at $5$-minute intervals. The spatial graph follows the DCRNN convention: edge weights $W_{ij}=\exp(-d_{\mathcal{M}}(x_{i},x_{j})^{2}/\sigma^{2})$, with $\sigma$ set to the standard deviation of the pairwise road-network distances, are thresholded to retain $W_{ij}\geq\kappa$ (with $\kappa=0.1$) and then symmetrized via $W\leftarrow\max(W,W^{\top})$.

PEMS-BAY. $|\mathcal{X}_{n}|=325$ sensors in the San Francisco Bay Area, with the same temporal resolution and graph construction.

For both datasets, we use $T=12$ observed time steps as input and forecast at horizons $h\in\{3,6,12\}$. Train/validation/test splits follow the standard $70/10/20$ chronological partition of [69].
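
The thresholded-kernel graph construction described above for both datasets can be sketched as follows. This is an illustration under stated assumptions: the function name `build_adjacency` is hypothetical, unreachable sensor pairs are assumed to be encoded as infinite distances, and $\sigma$ is taken as the population standard deviation over all finite entries of the distance matrix.

```python
import math
import statistics

def build_adjacency(dist, kappa=0.1):
    """Thresholded Gaussian-kernel graph from a matrix of road-network distances."""
    n = len(dist)
    # Assumption: sigma is the population std over all finite entries of `dist`.
    finite = [dist[i][j] for i in range(n) for j in range(n) if math.isfinite(dist[i][j])]
    sigma = statistics.pstdev(finite)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.isfinite(dist[i][j]):
                w = math.exp(-(dist[i][j] ** 2) / sigma ** 2)
                W[i][j] = w if w >= kappa else 0.0  # retain only W_ij >= kappa
    # Symmetrize with the entrywise maximum, W <- max(W, W^T).
    return [[max(W[i][j], W[j][i]) for j in range(n)] for i in range(n)]

# Toy directed road distances between three sensors (made-up numbers).
inf = math.inf
dist = [[0.0, 1.0, inf],
        [2.0, 0.0, 1.0],
        [inf, 5.0, 0.0]]
W = build_adjacency(dist)
for row in W:
    print([round(w, 3) for w in row])
```

Road-network distances are generally asymmetric, which is why the symmetrization step matters: an edge is kept if either direction passes the threshold.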

F.2.2 Task formulation

At each forecasting instance, the input signal is

$$\mathbf{s}_{n,T}=({\mathbf{s}}_{x_{1}},\dots,{\mathbf{s}}_{x_{n}})\in C^{0}(\mathcal{F}_{n,T}^{t};G_{n}),\qquad{\mathbf{s}}_{x_{i}}\in\mathbb{R}^{T}, \tag{80}$$

where $n=|\mathcal{X}_{n}|$ and $d=T$, so the network sheaf $\mathcal{F}_{n,T}^{t}$ has $T$-dimensional stalks and Laplacian $\Delta_{\mathcal{F}_{n,T}^{t}}\in\mathbb{R}^{nT\times nT}$. The goal is to predict future speed vectors $\mathbf{y}^{(h)}\in\mathbb{R}^{|\mathcal{X}_{n}|}$ at each horizon $h$. The prediction loss is the mean absolute forecasting error

$$\mathcal{L}_{\mathrm{pred}}=\frac{1}{|\mathcal{D}|}\sum_{(\mathbf{s}_{n,T},\mathbf{y})\in\mathcal{D}}\sum_{h\in\{3,6,12\}}\left\|\widehat{\mathbf{y}}^{(h)}(\mathbf{s}_{n,T})-\mathbf{y}^{(h)}\right\|_{1}. \tag{81}$$

For HilbNet variants with learned transports, we add the kernel regularizer of Appendix D with weight $\lambda$, giving the full training objective $\mathcal{L}=\mathcal{L}_{\mathrm{pred}}+\lambda\,\mathcal{L}_{\mathrm{kernel-reg}}$.

F.2.3 Model variants and baselines

All HilbNet variants are discretized HilbNets (Def. 5) with polynomial filters of order $K$ and the same per-node linear readout (DCRNN convention) mapping sheaf-filtered features to per-node horizon predictions. The only architectural difference is the admissible class of edgewise transports $P_{j\to i}^{e_{ij}}\in\mathcal{C}\subseteq O(T)$, as described in Appendix D with $d=T$. The transport hypothesis classes are the same as in the previous section; they are briefly summarized and contextualized below.

Frozen identity. $\mathcal{C}_{\mathrm{id}}=\{I_{T}\}$. Neighboring sensors exchange time windows without temporal alignment. The sheaf Laplacian reduces to a standard weighted graph Laplacian applied independently to each temporal coordinate, recovering a graph convolutional network.

Circulant. Each transport is a real orthogonal circulant matrix parameterized by $\lfloor(T-1)/2\rfloor$ frequency-wise phases per edge. This encodes a time-stationary prior: the transport can advance or delay oscillatory components across an edge but cannot arbitrarily mix temporal coordinates. This is a natural inductive bias for traffic data, where congestion patterns propagate through the road network with local delays and phase shifts.

Free $O(T)$. Each transport is parameterized by a product of $R=8$ Householder reflections, yielding an orthogonal matrix in a strict Householder-defined subset of $O(T)$ (see Appendix D). This is the most expressive transport class but uses $\sim\!10\times$ more parameters in total than the circulant variant (e.g., $119{,}036$ vs. $11{,}656$ on METR-LA; per edge the ratio is $\sim\!20\times$, since circulant uses only $\lfloor(T-1)/2\rfloor=5$ phases per edge at $T=12$).
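
To make the two learned parametrizations concrete, the sketch below builds one transport of each kind for $T=12$ and checks orthogonality. It is not the implementation of Appendix D: the function names are hypothetical, the Nyquist eigenvalue is fixed to $+1$ as an assumption, and each Householder reflection is assumed to be parameterized by one unconstrained vector in $\mathbb{R}^{T}$, which reproduces the per-edge ratio of roughly $20\times$ quoted above.

```python
import cmath
import math

def circulant_transport(phases, T):
    """Real orthogonal circulant matrix from floor((T-1)/2) frequency-wise phases."""
    assert len(phases) == (T - 1) // 2
    eig = [1.0 + 0j] * T                      # zero frequency pinned to phase 0
    for k, phi in enumerate(phases, start=1):
        eig[k] = cmath.exp(1j * phi)
        eig[T - k] = cmath.exp(-1j * phi)     # conjugate symmetry keeps the matrix real
    # Assumption: for even T the Nyquist eigenvalue eig[T // 2] is left at +1.
    # First column = inverse DFT of the eigenvalues.
    col = [sum(eig[k] * cmath.exp(2j * math.pi * k * n / T) for k in range(T)).real / T
           for n in range(T)]
    return [[col[(i - j) % T] for j in range(T)] for i in range(T)]

def householder_transport(vectors):
    """Product of Householder reflections H = I - 2 v v^T / (v^T v), one per vector."""
    T = len(vectors[0])
    P = [[float(i == j) for j in range(T)] for i in range(T)]
    for v in vectors:
        vv = sum(x * x for x in v)
        vtP = [sum(v[i] * P[i][j] for i in range(T)) for j in range(T)]  # row vector v^T P
        P = [[P[i][j] - 2.0 * v[i] * vtP[j] / vv for j in range(T)] for i in range(T)]
    return P

def orth_error(P):
    """Largest entry of |P^T P - I|."""
    T = len(P)
    return max(abs(sum(P[k][i] * P[k][j] for k in range(T)) - float(i == j))
               for i in range(T) for j in range(T))

T, R = 12, 8
C = circulant_transport([0.1 * k for k in range(1, 6)], T)
H = householder_transport([[math.sin(r + i) + 1.5 for i in range(T)] for r in range(R)])
print(orth_error(C), orth_error(H))                   # both at rounding level
print(R * T, (T - 1) // 2, R * T / ((T - 1) // 2))    # per-edge parameter counts and ratio
```

Both constructions are orthogonal by design, so no projection or penalty is needed during training to stay inside the transport class.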

We also compare against two non-transport baselines. A fiber-only MLP processes each sensor’s time window independently, ignoring graph structure. A spatiotemporal graph baseline applies standard graph convolution to the temporally-augmented node features but does not learn sheaf transports. Finally, we include two external baselines from the literature: FC-LSTM [69] and STAEformer [71], reported as in the cited papers.
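
The claim that frozen-identity transports reduce the sheaf Laplacian to a graph Laplacian acting independently on each temporal coordinate can be checked numerically. The sketch assumes the common unnormalized form with diagonal blocks $\deg_{i}\,I_{T}$ and off-diagonal blocks $-W_{ij}P_{j\to i}$; the exact normalization used in the paper is defined outside this section, and all names and the toy graph below are hypothetical.

```python
def eye(T):
    return [[float(a == b) for b in range(T)] for a in range(T)]

def sheaf_laplacian(W, transport, T):
    """Dense (nT x nT) matrix with blocks deg_i * I on the diagonal and -W_ij * P_{j->i} off it."""
    n = len(W)
    L = [[0.0] * (n * T) for _ in range(n * T)]
    for i in range(n):
        for j in range(n):
            if i == j or W[i][j] == 0.0:
                continue
            P = transport(i, j)  # T x T map carrying data from node j to node i
            for a in range(T):
                L[i * T + a][i * T + a] += W[i][j]
                for b in range(T):
                    L[i * T + a][j * T + b] -= W[i][j] * P[a][b]
    return L

def graph_laplacian(W):
    """Unnormalized graph Laplacian D - W, ignoring self-loops."""
    n = len(W)
    return [[(sum(W[i]) - W[i][i]) if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]

# Toy symmetric weighted graph on three nodes, with stalk dimension T = 2.
W = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
T = 2
n = len(W)
L_sheaf = sheaf_laplacian(W, lambda i, j: eye(T), T)
L_graph = graph_laplacian(W)
# Kronecker product of the graph Laplacian with the T x T identity.
kron = [[L_graph[i][j] * float(a == b) for j in range(n) for b in range(T)]
        for i in range(n) for a in range(T)]
print(all(abs(L_sheaf[r][c] - kron[r][c]) < 1e-12
          for r in range(n * T) for c in range(n * T)))
```

With non-identity transports the off-diagonal blocks are no longer multiples of the identity, which is precisely where the circulant and free variants depart from a standard graph convolution.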

F.2.4 Detailed results

Table 2 reports MAE, RMSE, and MAPE at horizons $3$, $6$, and $12$ (mean $\pm$ std over five seeds for our experiments).

Value of graph structure.

On both datasets, the frozen-identity HilbNet improves over the fiber-only MLP baseline, confirming that sheaf diffusion over the spatial graph is beneficial even without non-trivial transports.

Value of learned transports.

Both the free and circulant variants consistently outperform frozen identity at all horizons on both datasets. This confirms that the sheaf structure helps beyond the usual graph structure.

Free vs. circulant.

On METR-LA, the free-$O(T)$ model achieves the best absolute accuracy at all horizons (e.g., MAE $3.938$ vs. $4.059$ for circulant at horizon $12$), as expected from its larger hypothesis class. On PEMS-BAY, the two variants are nearly tied: free wins MAE and MAPE at horizon $12$ by $\sim\!0.03$ mph (a $2$ to $4\sigma$ effect over five seeds), circulant wins MAE/MAPE at horizon $3$, and RMSE is statistically indistinguishable at horizons $6$ and $12$. In both cases, the circulant model uses roughly one tenth of the transport parameters (e.g., $11{,}656$ vs. $119{,}036$ on METR-LA), making it the most parameter-efficient HilbNet variant. This supports the central pragmatic-transport message: a structured transport class encoding a physically motivated alignment prior can recover most of the benefit of unconstrained learned transports with substantially fewer degrees of freedom.

Comparison with external baselines.

Our HilbNet variants are lightweight models designed to test the value of Hilbert-sheaf structure, not to compete with large-scale spatiotemporal transformers. The external baselines (FC-LSTM, STAEformer) are included for reference and use substantially more parameters and architectural components. Nevertheless, the circulant and free HilbNets outperform FC-LSTM at all horizons on both datasets while using far fewer parameters.

F.2.5 Hyperparameters

See Table 5. All experiments are run on a single H200 GPU. Hyperparameters are chosen with a sweep. All presented variants are computed using the same codebase, and we made sure they differ only in their transport parametrization.

| Hyperparameter | METR-LA | PEMS-BAY |
|---|---|---|
| sensors $\lvert\mathcal{X}_{n}\rvert$ | $207$ | $325$ |
| input window $T$ | $12$ | $12$ |
| horizons $h$ | $\{3,6,12\}$ | $\{3,6,12\}$ |
| input feature dim | $2$ (speed + time-of-day) | $2$ (speed + time-of-day) |
| graph | thresh. kernel $\exp(-d^{2}/\sigma^{2})$, $\kappa=0.1$ | thresh. kernel $\exp(-d^{2}/\sigma^{2})$, $\kappa=0.1$ |
| sheaf-conv layers | $L=2$, channel widths $[2,16,32]$ | $L=2$, channel widths $[2,16,32]$ |
| polynomial order $K$ per layer | $K=[2,2]$ (HilbNet), $K=[1,1]$ (MLP fiber baseline) | $K=[2,2]$ (HilbNet), $K=[1,1]$ (MLP fiber baseline) |
| Householder reflections | $8$ | $8$ |
| readout | per-node linear ($T_{\mathrm{in}}\cdot F_{\mathrm{last}}\to T_{\mathrm{out}}$) | per-node linear ($T_{\mathrm{in}}\cdot F_{\mathrm{last}}\to T_{\mathrm{out}}$) |
| epochs / patience | $150$ / $20$ | $150$ / $20$ |
| batch size | $32$ | $32$ |
| optimizer | Adam, $\beta=(0.9,0.999)$ | Adam, $\beta=(0.9,0.999)$ |
| learning rate | $5\cdot 10^{-3}$ (HilbNet); $1\cdot 10^{-3}$ (STGNN baseline) | $5\cdot 10^{-3}$ (HilbNet); $1\cdot 10^{-3}$ (STGNN baseline) |
| LR schedule | cosine annealing, $\eta_{\min}=10^{-6}$ | cosine annealing, $\eta_{\min}=10^{-6}$ |
| weight decay / dropout | $0$ / $0$ | $0$ / $0$ |
| gradient clipping | $\lVert g\rVert_{2}\leq 5$ (DCRNN convention) | $\lVert g\rVert_{2}\leq 5$ (DCRNN convention) |
| kernel regularizer $\lambda$ | $0.01$ | $0.01$ |
| seeds | $5$ | $5$ |

Table 5: Traffic forecasting hyperparameters.

Appendix G Mathematical Background

G.1 Hilbert Bundles

In this section, we provide relevant background on the theory of Hilbert bundles. In particular, we define the notions of Banach and Hilbert manifolds, as well as introduce the appropriate notions of connection, parallel transport, and heat flow for bundles in this setting.

G.1.1 Banach and Hilbert manifolds

To study heat kernels for smooth Hilbert bundles, we must examine manifolds modeled on generic Banach spaces. We will assume all Banach spaces and Hilbert spaces are defined over the field of real numbers $\mathbb{R}$, unless otherwise stated.

Definition 7.

A second-countable topological space $\mathcal{M}$ is a topological Banach manifold if there is a Banach space $\mathcal{V}$ and an atlas $\{(U_{i}\,,\,\phi_{i}:U_{i}\rightarrow\mathcal{V})\}_{i\in I}$ such that the following conditions hold:

  1. each $U_{i}$ is an open subset of $\mathcal{M}$;

  2. each $\phi_{i}:U_{i}\rightarrow\mathcal{V}$ is a homeomorphism onto an open subset of $\mathcal{V}$;

  3. for all $i,j$, $\phi_{i}(U_{i}\cap U_{j})$ is an open subset of $\mathcal{V}$;

  4. the transition map $\phi_{j}\phi_{i}^{-1}:\phi_{i}(U_{i}\cap U_{j})\rightarrow\phi_{j}(U_{i}\cap U_{j})$ is a homeomorphism.

When the Banach space $\mathcal{V}$ is specified, we say that $\mathcal{M}$ is a $\mathcal{V}$-manifold, or a Banach manifold modeled on $\mathcal{V}$. If each map $\phi_{i}$ and transition map $\phi_{j}\phi_{i}^{-1}$ is $k$-times Fréchet differentiable, we say that $\mathcal{M}$ is a $C^{k}$-Banach manifold. If these maps are smooth, i.e. $C^{\infty}$, we say that $\mathcal{M}$ is a smooth Banach manifold.

Definition 8.

A topological (resp. $C^{k}$ / smooth) Banach manifold $\mathcal{M}$ is a topological (resp. $C^{k}$ / smooth) Hilbert manifold if it can be modeled on a Banach space $\mathcal{V}$ which admits the structure of a Hilbert space.

Remark 1.

We make a few observations about this definition.

  1. Since every $n$-dimensional real Banach space is isomorphic to $\mathbb{R}^{n}$, a finite-dimensional Banach manifold is exactly a real manifold in the usual sense.

  2. Like ordinary manifolds, we require Banach manifolds to be second countable, and hence to have a countable dense subset. It follows that if $\mathcal{M}$ is a manifold modeled on a Banach space $\mathcal{V}$, then $\mathcal{V}$ itself must be separable. This condition could be removed, but it will generally make our lives easier.

  3. The definition of a Hilbert manifold does not directly require the transition maps $\tau_{ij}:=\phi_{j}\circ\phi_{i}^{-1}:\mathcal{V}\to\mathcal{V}$ to respect the inner product structure on the modeling Hilbert space $\mathcal{V}$. Hence, it is often better to think of a Hilbert manifold as a special case of a Banach manifold, instead of as a manifold that respects the Hilbert space structure per se.

The usual differential geometric constructions on manifolds extend naturally to Hilbert and Banach manifolds. For example, tangent spaces generalize naturally. Given a $C^{k}$-Banach manifold $\mathcal{M}$ with $k\geq 1$, one may form the tangent space $T_{x}\mathcal{M}$ at each point $x\in\mathcal{M}$ as the set of equivalence classes of triples $(U,\phi,v)$ of a chart $\phi:U\to\mathcal{V}$ around $x$ and a vector $v\in\mathcal{V}$, under the relation:

$$(U_{1},\phi,v)\sim(U_{2},\psi,w)\iff D_{\phi(x)}\bigl(\psi\circ\phi^{-1}\bigr)(v)=w.$$

Such equivalence classes are easily seen to form a real vector space isomorphic to $\mathcal{V}$.

G.1.2 Smooth bundles

Definition 9 (Smooth Banach and Hilbert bundles).

Let $\mathcal{M}$ be a smooth finite-dimensional manifold and let $\mathcal{V}$ be a fixed separable Banach space. A smooth Banach bundle with model space $\mathcal{V}$ consists of a smooth Banach manifold $\mathcal{E}$ equipped with a smooth surjective submersion

$$\pi\colon\mathcal{E}\longrightarrow\mathcal{M}\,,$$

that satisfies the following conditions.

  1. Local triviality. For every $p\in\mathcal{M}$ there exists an open neighborhood $U\subset\mathcal{M}$ and a diffeomorphism

     $$\phi_{U}\colon\pi^{-1}(U)\;\xrightarrow{\;\cong\;}\;U\times\mathcal{V}$$

     satisfying $\pi=\text{proj}_{1}\circ\phi_{U}$, where $\text{proj}_{1}:U\times\mathcal{V}\to U$ is the canonical projection, and such that, for each $q\in U$, the restriction $\phi_{U}|_{\mathcal{E}_{q}}\colon\mathcal{E}_{q}\to\{q\}\times\mathcal{V}$ is a bounded linear isomorphism. We call the pair $(U,\phi_{U})$ a trivializing chart.

  2. Smooth transition functions. Whenever $(U,\phi_{U})$ and $(V,\phi_{V})$ are trivializing charts, the transition map

     $$\tau_{UV}(q,-)\;:=\;\phi_{V}\circ\phi_{U}^{-1}\big(q,-\big)\;\colon\;\mathcal{V}\longrightarrow\mathcal{V},\qquad q\in U\cap V,$$

     is a bounded isomorphism and depends smoothly on $q$; that is, $\tau_{UV}\colon U\cap V\to\mathrm{GL}(\mathcal{V})$ is a smooth map, where $\mathrm{GL}(\mathcal{V})$ denotes the Banach-Lie group of bounded invertible operators on $\mathcal{V}$ with the operator-norm topology.

  3. Smooth norm. There is a smooth map $N:\mathcal{E}\to\mathbb{R}$ such that the trivializing charts $(U,\phi_{U})$ can be chosen with the additional property that for each $x\in\mathcal{E}_{p}$,

     $$N(x)=\left\|\phi_{U}\big\rvert_{\mathcal{E}_{p}}(x)\right\|_{\mathcal{V}}\,.$$

  4. Smooth fiberwise operations. The fiberwise addition and scalar-multiplication maps

     $$+\;\colon\;\mathcal{E}\times_{\mathcal{M}}\mathcal{E}\longrightarrow\mathcal{E},\qquad\cdot\;\colon\;\mathbb{R}\times\mathcal{E}\longrightarrow\mathcal{E},$$

     are smooth Banach-manifold maps.

When the Banach space $\mathcal{V}$ is a separable Hilbert space, we say $\pi:\mathcal{E}\to\mathcal{M}$ is a Hilbert bundle. We denote the inner product on the fiber $\mathcal{E}_{p}$ by $\langle-,-\rangle_{p}$.

Remark 2.

We make a few remarks about the definition of a Hilbert bundle above.

  1. For convenience, we restrict our attention to Hilbert bundles over closed finite-dimensional manifolds with separable fibers. None of these restrictions are essential for the general theory of Banach and Hilbert bundles. However, these restrictions are necessary for our approach to constructing heat kernels in this setting.

  2. The intuitive idea is the following: a smooth Banach bundle is a smooth vector bundle where the fibers are allowed to be infinite-dimensional and come equipped with a complete norm. The smooth norm condition enforces that the Banach space fibers are stitched together in such a way that the fiber-wise norm varies smoothly. In the case of a Hilbert bundle, the smooth norm condition also enforces that the fiber-wise inner product $\langle-,-\rangle_{p}$ varies smoothly.

  3. In light of the previous remarks, the definition presented here is not minimal. We make the choice to include redundant information in our definition for clarity, with the understanding that some conditions are superfluous [75]. We also make the choice to include the smooth norm condition, often called a smooth orthogonal/Hermitian metric, in the definition of the bundle itself.

  4. In the case of a finite-dimensional model space $\mathcal{V}$, this definition recovers the usual smooth vector bundle, with the additional data of a smooth orthogonal/Hermitian metric.

  5. Suppose $\pi:\mathcal{E}\to\mathcal{M}$ is a smooth Hilbert bundle modeled on a Hilbert space $\mathcal{H}$. While the transition maps $\tau_{UV}$ must respect the topological structure of the Hilbert space $\mathcal{H}$, they need not respect the inner product structure. When each transition map $\tau_{UV}$ is a unitary isomorphism, we say the bundle is a smooth unitary Hilbert bundle.

Definition 10 (Smooth sections of a Banach bundle).

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle (resp. Hilbert bundle) over a finite-dimensional manifold $\mathcal{M}$ with model Banach space (resp. Hilbert space) $B$. A section of $\mathcal{E}$ is a map $S:\mathcal{M}\to\mathcal{E}$ such that $\pi\circ S=\mathrm{id}_{\mathcal{M}}$. We denote the collection of all smooth sections by $\Gamma(\mathcal{E}):=C^{\infty}(\mathcal{M},\mathcal{E})$. Note that this is a module over the commutative algebra $C^{\infty}(\mathcal{M})$ with point-wise addition and multiplication. If the section $S$ is only $k$-times continuously differentiable, we write $S\in C^{k}(\mathcal{M},\mathcal{E})$.

Definition 11 ($L^{2}$-Sections of a Banach bundle).

Suppose that the manifold $\mathcal{M}$ is endowed with a measure $\mu$. We say that $S$ is an $L^{2}$-section if $\|S\|_{2}:=\left(\int_{\mathcal{M}}\|S(x)\|^{2}_{\mathcal{E}_{x}}\,d\mu(x)\right)^{1/2}<\infty$. We may similarly form a space of $L^{2}$-sections, denoted $L^{2}(\mathcal{M},\mathcal{E};\mu)$, or simply $L^{2}(\mathcal{M},\mathcal{E})$ when the measure is implied by context. When $\mathcal{E}$ is modeled on a separable Hilbert space $\mathcal{H}$, the space of $L^{2}$-sections $L^{2}(\mathcal{M},\mathcal{E};\mu)$ is a real separable Hilbert space with inner product $\langle S,S^{\prime}\rangle_{\mathcal{E}}:=\int_{\mathcal{M}}\langle S(x),S^{\prime}(x)\rangle_{\mathcal{E}_{x}}\,d\mu(x)$.

Remark 3.

In the Riemannian setting, we have a natural candidate for $\mu$ via the (pseudo) volume form on $\mathcal{M}$, or its normalized variant.

G.1.3 Connections

We now introduce connections on smooth Banach and Hilbert bundles. The smooth Banach manifold structure on the bundle $\mathcal{E}$ provides no way to directly compare vectors in different fibers $\mathcal{E}_{x}$ and $\mathcal{E}_{y}$ with respect to their Banach space structures. Instead, as in the finite-dimensional case, we use a connection to link the fibers through the geometry of the base manifold $\mathcal{M}$.

Definition 12.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle over a compact manifold $\mathcal{M}$, and let $T^{*}\mathcal{M}$ denote the cotangent bundle on $\mathcal{M}$. A connection on $\mathcal{E}$ is any of the following three equivalent structures:

  1. A connection is an $\mathbb{R}$-linear map

     $$\nabla:\Gamma(\mathcal{E})\rightarrow\Gamma(T^{*}\mathcal{M}\otimes\mathcal{E})$$

     such that the product rule

     $$\nabla(fS)=Df\otimes S+f\nabla S$$

     holds for all smooth functions $f:\mathcal{M}\rightarrow\mathbb{R}$ and smooth sections $S\in\Gamma(\mathcal{E})$.

  2. A connection is a map $\nabla:\Gamma(T\mathcal{M})\times\Gamma(\mathcal{E})\to\Gamma(\mathcal{E})$ which is $C^{\infty}(\mathcal{M},\mathbb{R})$-linear in its vector-field input, and satisfies the Leibniz rule

     $$\nabla_{X}(fS)=X[f]S+f\nabla_{X}S$$

     where $\nabla_{X}S:=\nabla(X,S)$ and $X[f](x):=(D_{x}f)(X_{x})$ is the directional derivative of $f$ along the vector field $X$.

  3. A connection $\nabla$ is the data of an $\mathbb{R}$-linear map $\nabla_{x}S:T_{x}\mathcal{M}\rightarrow\mathcal{E}_{x}$ for each $x\in\mathcal{M}$ and $S\in\Gamma(\mathcal{E})$ that satisfies the following conditions:

     (a) $\nabla_{(-)}S$ depends smoothly on $x$;

     (b) $\nabla_{x}(a_{1}S_{1}+a_{2}S_{2})=a_{1}\nabla_{x}S_{1}+a_{2}\nabla_{x}S_{2}$ for all $x\in\mathcal{M}$, $S_{1},S_{2}\in\Gamma(\mathcal{E})$, and $a_{1},a_{2}\in\mathbb{R}$;

     (c) for every smooth $f:\mathcal{M}\to\mathbb{R}$ and section $S\in\Gamma(\mathcal{E})$, let $fS\in\Gamma(\mathcal{E})$ denote the section $(fS)(x):=f(x)S(x)$. For each $x\in\mathcal{M}$, the maps $\nabla_{x}(fS)$ and $\nabla_{x}S$ are related by

         $$\nabla_{x}(fS)(v)=D_{x}f(v)S(x)+f(x)(\nabla_{x}S)(v)$$

         for all $v\in T_{x}\mathcal{M}$.

The following proposition provides a standard representation theorem for a smooth connection $\nabla$.

Proposition 2.

Let $\nabla$ be a smooth connection on a trivial Hilbert bundle $\pi:\mathcal{M}\times\mathcal{H}\to\mathcal{M}$. There is a map

$$A:\Gamma(\mathcal{M}\times\mathcal{H})\to\Gamma(T^{*}\mathcal{M}\otimes(\mathcal{M}\times\mathcal{H}))$$

such that for every section $S\in\Gamma(\mathcal{M}\times\mathcal{H})$, we have:

$$\nabla S=dS+AS$$

where $d$ is the Fréchet derivative. Equivalently, for each $x\in\mathcal{M}$ and $v\in T_{x}\mathcal{M}$, there is a bounded linear operator $A_{x,v}:\mathcal{H}\to\mathcal{H}$, varying smoothly in $(x,v)$, such that:

$$(\nabla_{x}S)(v)=(D_{x}S)(v)+(A_{x,v}S)(x)\,.$$

Moreover, the assignment $v\mapsto A_{x,v}$ is linear for each $x$.

Remark 4.

While this proposition is stated for trivial bundles, it can be applied to any Hilbert bundle through the choice of a trivialization, or through Kuiper’s theorem.

G.1.4 Parallel Transport

Connections allow us to relate the geometries of the fibers over nearby points in $\mathcal{M}$. For example, we may define parallel transport.

Given a smooth curve $\gamma:[0,1]\to\mathcal{M}$, say that $S:[0,1]\to\mathcal{E}$ is a section over $\gamma$ if $\pi\circ S=\gamma$.

Definition 13.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a smooth Banach bundle with model space $X$, let $\nabla$ be a connection on $\mathcal{E}$, and let $\gamma:[0,1]\to\mathcal{M}$ be a smooth path. A map $S:[0,1]\to\mathcal{E}$ is a section over $\gamma$ if $\pi\circ S=\gamma$. A section over $\gamma$ is parallel if

$$\nabla_{\dot{\gamma}}S=0.$$

Proposition 3.

Let $\gamma$ be a smooth path in $\mathcal{M}$. For every vector $v\in\mathcal{E}_{\gamma(0)}$, there is a unique parallel section $S_{v}$ over $\gamma$ such that $S_{v}(0)=v$. Moreover, the dependence on $v$ is smooth and linear.

Proof.

The existence of the parallel section $S_{v}$ can be restated as an initial value problem for a linear ordinary differential equation in a Banach space. Standard existence and uniqueness theorems apply. For details, see [62]. ∎

By the existence and uniqueness of parallel sections, we may define corresponding parallel transport maps. Given a path $\gamma:[0,1]\to\mathcal{M}$, there is an induced parallel transport operator $P_{\gamma}^{\nabla}:\mathcal{E}_{\gamma(0)}\to\mathcal{E}_{\gamma(1)}$ defined by

$$P^{\nabla}_{\gamma}(v):=S_{v}(1).$$

It is straightforward to see that $P^{\nabla}_{\gamma}$ is a linear bijection, with inverse given by $(P^{\nabla}_{\gamma})^{-1}=P^{\nabla}_{\gamma^{\text{rev}}}$, where $\gamma^{\text{rev}}$ is the path obtained by reversing $\gamma$. By the closed graph theorem, it follows that $P^{\nabla}_{\gamma}$ is a bounded linear isomorphism.
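
In a trivialization, Proposition 2 turns the parallel-section condition into the linear ODE $S'(t)=-A_{\gamma(t),\dot{\gamma}(t)}S(t)$. The sketch below integrates it with explicit Euler steps on a toy rank-3 bundle. Everything here is an assumption made for illustration: the skew-symmetric connection form is invented, and the integrator is not claimed to be the scheme behind the Euler steps for $P^{LC}$ listed in Table 4.

```python
def connection_form(t):
    """Hypothetical skew-symmetric A_{gamma(t), gamma'(t)} on a trivial rank-3 bundle."""
    a, b = 1.0, t
    return [[0.0, -a, 0.0], [a, 0.0, -b], [0.0, b, 0.0]]

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def parallel_transport(A, steps, reverse=False):
    """Explicit Euler for S'(t) = -A(t) S(t) on [0, 1]; returns the matrix of P_gamma."""
    d = len(A(0.0))
    P = [[float(i == j) for j in range(d)] for i in range(d)]
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        # Along the reversed path the velocity flips sign, and A is linear in the velocity.
        M = [[-x for x in row] for row in A(1.0 - t)] if reverse else A(t)
        MP = matmul(M, P)
        P = [[P[i][j] - h * MP[i][j] for j in range(d)] for i in range(d)]
    return P

def dist_to_identity(X):
    """Largest entry of |X - I|."""
    return max(abs(X[i][j] - float(i == j)) for i in range(len(X)) for j in range(len(X)))

def gram(P):
    """P^T P, which equals I exactly when P is orthogonal."""
    return matmul([list(col) for col in zip(*P)], P)

P50 = parallel_transport(connection_form, 50)
P400 = parallel_transport(connection_form, 400)
back = parallel_transport(connection_form, 400, reverse=True)
print(dist_to_identity(gram(P50)), dist_to_identity(gram(P400)))  # orthogonality defect shrinks
print(dist_to_identity(matmul(back, P400)))                       # reversed path nearly inverts
```

Explicit Euler preserves norms only up to a first-order error in the step size, so the discrete transport is orthogonal only approximately; a structure-preserving integrator such as a Cayley or exponential step would be the natural alternative when exact orthogonality matters.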

Definition 14.

Let $\pi:\mathcal{E}\rightarrow\mathcal{M}$ be a Hilbert bundle equipped with a connection $\nabla$. We say the connection $\nabla$ is compatible with the Hilbert bundle structure if it satisfies:

$$X[\langle S_{0}(x),S_{1}(x)\rangle_{x}]=\langle\nabla_{X}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{X}S_{1}(x)\rangle_{x}$$

for every smooth vector field $X\in\Gamma(T\mathcal{M})$, and sections $S_{0},S_{1}\in\Gamma(\mathcal{E})$.

Proposition 4.

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a Hilbert bundle with connection $\nabla$. The following are equivalent:

  1. $\nabla$ is compatible with the Hilbert bundle structure;

  2. every parallel transport map $P^{\nabla}_{\gamma}$ is unitary.

Proof.

First suppose that $\nabla$ is compatible with the Hilbert bundle structure, and let $\gamma$ be a smooth path in $\mathcal{M}$. Let $u,v\in\mathcal{E}_{\gamma(0)}$. By compatibility, we may check that:

$$\frac{d}{dt}\big\langle S_{u}(t),S_{v}(t)\big\rangle_{\gamma(t)}=\big\langle\nabla_{\dot{\gamma}}S_{u}(t),S_{v}(t)\big\rangle_{\gamma(t)}+\big\langle S_{u}(t),\nabla_{\dot{\gamma}}S_{v}(t)\big\rangle_{\gamma(t)}=0\,.$$

It immediately follows that $P^{\nabla}_{\gamma}$ is unitary.

Conversely, suppose that every parallel transport map $P^{\nabla}_{\gamma}$ is unitary. Let $X$ be a smooth vector field, and $S_{0},S_{1}$ smooth sections of $\mathcal{E}$. Let $x\in\mathcal{M}$, and let $\gamma$ be a smooth path such that $\gamma(0)=x$ and $\dot{\gamma}(0)=X_{x}$. For $j\in\{0,1\}$, let $u_{j}$ be the parallel section over $\gamma$ such that $u_{j}(0)=S_{j}(x)$, and let $w_{j}(t):=S_{j}(\gamma(t))-u_{j}(t)$. Since $u_{j}$ is parallel over $\gamma$, we have that $\nabla_{\dot{\gamma}}S_{j}(\gamma(t))=\nabla_{\dot{\gamma}}w_{j}(t)$. Moreover, $w_{j}(0)=0$. We may use these facts to compute:

$$\begin{aligned}
X[\big\langle S_{0}(x),S_{1}(x)\big\rangle_{x}] &=\frac{d}{dt}\bigg\rvert_{t=0}\big\langle S_{0}(\gamma(t)),S_{1}(\gamma(t))\big\rangle_{\gamma(t)}\\
&=\frac{d}{dt}\bigg\rvert_{t=0}\big\langle u_{0}(t)+w_{0}(t),u_{1}(t)+w_{1}(t)\big\rangle_{\gamma(t)}\\
&=\frac{d}{dt}\bigg\rvert_{t=0}\langle u_{0}(t),u_{1}(t)\rangle_{\gamma(t)}+\langle\nabla_{\dot{\gamma}}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{\dot{\gamma}}S_{1}(x)\rangle_{x}
\end{aligned}$$

Since parallel transport maps are unitary, it follows that the quantity $\big\langle u_{0}(t),u_{1}(t)\big\rangle_{\gamma(t)}$ is constant in $t$, so its derivative vanishes. Hence

$$\begin{aligned}
X[\big\langle S_{0}(x),S_{1}(x)\big\rangle_{x}] &=\frac{d}{dt}\bigg\rvert_{t=0}\langle u_{0}(t),u_{1}(t)\rangle_{\gamma(t)}+\langle\nabla_{\dot{\gamma}}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{\dot{\gamma}}S_{1}(x)\rangle_{x}\\
&=\langle\nabla_{X}S_{0}(x),S_{1}(x)\rangle_{x}+\langle S_{0}(x),\nabla_{X}S_{1}(x)\rangle_{x}
\end{aligned}$$

proving that $\nabla$ is a compatible connection. ∎

Remark 5.

By Proposition 4, our assumption in the main text that parallel transport operators are unitary may equivalently be understood as a metric-compatibility condition on the connection.
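
As a worked instance of this equivalence, consider the trivial bundle $\mathcal{M}\times\mathcal{H}$ of Proposition 2 and assume, for illustration only, that the fiber inner product is the constant inner product of $\mathcal{H}$. Writing $\nabla_{X}S=D_{X}S+A_{x,X_{x}}S$, the two sides of the compatibility condition differ by a term involving only the connection form:

```latex
\begin{aligned}
\langle\nabla_{X}S_{0},S_{1}\rangle+\langle S_{0},\nabla_{X}S_{1}\rangle
 &=\langle D_{X}S_{0},S_{1}\rangle+\langle S_{0},D_{X}S_{1}\rangle
   +\langle A_{x,X_{x}}S_{0},S_{1}\rangle+\langle S_{0},A_{x,X_{x}}S_{1}\rangle\\
 &=X\bigl[\langle S_{0},S_{1}\rangle\bigr]
   +\bigl\langle\bigl(A_{x,X_{x}}+A_{x,X_{x}}^{*}\bigr)S_{0},S_{1}\bigr\rangle .
\end{aligned}
```

Under this assumption the connection is compatible exactly when every $A_{x,v}$ is skew-adjoint, $A_{x,v}^{*}=-A_{x,v}$, which is the infinitesimal counterpart of parallel transport being unitary.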

G.2 Connection Laplacian

Let $\pi:\mathcal{E}\to\mathcal{M}$ be a Hilbert bundle on a closed Riemannian manifold, equipped with a compatible connection $\nabla:\Gamma(\mathcal{M};\mathcal{E})\to\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})$. Moreover, $T^{*}\mathcal{M}\otimes\mathcal{E}$ inherits the structure of a Hilbert bundle, with fiber-wise inner products induced from the metric $g$ and the fiber-wise inner products of $\mathcal{E}$. Since $\nabla$ is a linear differential operator, it has a formal adjoint $\nabla^{*}:\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})\to\Gamma(\mathcal{M};\mathcal{E})$, defined implicitly by the formula:

$$\int_{\mathcal{M}}\langle\nabla S_{0}(x),S_{1}(x)\rangle_{x}\,d\mu(x)\>=\>\int_{\mathcal{M}}\langle S_{0}(x),\nabla^{*}S_{1}(x)\rangle_{x}\,d\mu(x)$$

where $S_{0}\in\Gamma(\mathcal{M};\mathcal{E})$, $S_{1}\in\Gamma(\mathcal{M};T^{*}\mathcal{M}\otimes\mathcal{E})$, and $\mu$ is the pseudo-volume form on $\mathcal{M}$. Using this adjoint, we may define the connection Laplacian.

Definition 15.

The connection Laplacian is the linear operator:

$$
\Delta_{\nabla}:=\nabla^{*}\nabla:\Gamma(\mathcal{E})\to\Gamma(\mathcal{E})\,.
$$

Remark 6.

The connection $\nabla:\Gamma(\mathcal{E})\to\Gamma(T^{*}\mathcal{M}\otimes\mathcal{E})$ can be extended to a closed, densely-defined unbounded operator $\nabla_{L^{2}}:L^{2}(\mathcal{E})\to L^{2}(T^{*}\mathcal{M}\otimes\mathcal{E})$. This extended operator has an adjoint $\nabla^{*}_{L^{2}}$ as a Hilbert space operator, from which we may define a composite $\Delta_{\nabla_{L^{2}}}:=\nabla_{L^{2}}^{*}\nabla_{L^{2}}$. The formal adjoint $\nabla^{*}$ and connection Laplacian $\Delta_{\nabla}$ can be found by restricting the domains of $\nabla_{L^{2}}^{*}$ and $\Delta_{\nabla_{L^{2}}}$ to linear subspaces of smooth sections. From this perspective, it becomes clear that the connection Laplacian $\Delta_{\nabla}$ is well defined for all $C^{2}$ sections, as $C^{2}$ is contained in the Sobolev space $H^{2}$.

The connection Laplacian also admits a characterization in terms of covariant derivatives. Let $\nabla^{2}_{X,Y}$ denote the second covariant derivative with respect to vector fields $X,Y$.

Lemma 1.

(Connection Laplacian in Coordinates) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle over a closed Riemannian manifold equipped with a compatible Fréchet connection. As operators,

$$
\Delta_{\nabla}S=-\mathrm{tr}\,\nabla^{2}S\,.
$$

Moreover, with respect to a local orthonormal frame $\{e_{i}\}_{i=1}^{m}$ that is synchronous at $p\in\mathcal{M}$, we have the equality:

$$
\Delta_{\nabla}S(p)=-\sum_{i=1}^{m}\nabla_{e_{i}}\nabla_{e_{i}}S(p)\,.
$$

Proof.

We adapt the proof of [79] to the Hilbert bundle setting, using the synchronous frame technique of [76]. By a partition of unity argument, it suffices to show that for every pair of smooth sections $S_{1},S_{2}$ supported inside the domain of a local orthonormal frame $\{e_{i}\}_{i=1}^{m}$, we have the equality:

$$
\int_{\mathcal{M}}\langle\Delta_{\nabla}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=-\int_{\mathcal{M}}\sum_{i=1}^{m}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)\,.
$$

Using the formal adjoint of $\nabla$, the integral on the left can be rewritten as

$$
\int_{\mathcal{M}}\langle\Delta_{\nabla}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=\sum_{i=1}^{m}\int_{\mathcal{M}}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\,d\mu(x)\,.
$$

To analyze the right-hand side, note that $\nabla_{e_{i},e_{i}}^{2}S_{1}=\nabla_{e_{i}}\nabla_{e_{i}}S_{1}-\nabla_{\nabla_{e_{i}}e_{i}}S_{1}$, and by compatibility, that:

$$
e_{i}\big[\langle\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\big]=\langle\nabla_{e_{i}}\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}+\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\,.
$$

Rearranging and summing over $i$ yields:

$$
\begin{aligned}
\sum_{i}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}={}&-\sum_{i}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x} \\
&+\sum_{i}\Big(e_{i}\big[\langle\nabla_{e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\big]-\langle\nabla_{\nabla_{e_{i}}e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\Big)\,.
\end{aligned}
$$

The second sum on the right-hand side may be identified as the divergence of the vector field dual to the 1-form $v\mapsto\langle\nabla_{v}S_{1}(x),S_{2}(x)\rangle_{x}$, and hence integrates to zero by Stokes’ theorem. Therefore

$$
-\int_{\mathcal{M}}\sum_{i}\langle\nabla^{2}_{e_{i},e_{i}}S_{1}(x),S_{2}(x)\rangle_{x}\,d\mu(x)=\sum_{i}\int_{\mathcal{M}}\langle\nabla_{e_{i}}S_{1}(x),\nabla_{e_{i}}S_{2}(x)\rangle_{x}\,d\mu(x)
$$

as well. It follows that $\Delta_{\nabla}(S)=-\mathrm{tr}\,\nabla^{2}(S)$. Under the additional hypothesis that $\{e_{i}\}_{i=1}^{m}$ is synchronous at $p$, we have $\nabla_{e_{i}}e_{i}=0$ at $p$ for each $1\leq i\leq m$. At such a point, the trace reduces to $\mathrm{tr}\,\nabla^{2}S(p)=\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)$. ∎
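
As a sanity check on Lemma 1, consider the simplest case. This is an illustrative assumption and not an example taken from the paper: the trivial bundle $\mathcal{E}=\mathbb{T}^{m}\times\mathcal{H}$ over the flat torus $\mathbb{T}^{m}=\mathbb{R}^{m}/2\pi\mathbb{Z}^{m}$ with the trivial connection $\nabla=d$. The coordinate frame $e_{i}=\partial_{i}$ is then synchronous at every point, so the lemma reduces to the coordinate Laplacian acting fiber-wise.

```latex
\Delta_{\nabla}S(p) = -\sum_{i=1}^{m}\frac{\partial^{2}S}{\partial x_{i}^{2}}(p),
\qquad\text{so for } S(x)=\cos(x_{1})\,v \text{ with } v\in\mathcal{H} \text{ fixed:}\quad
\Delta_{\nabla}S = \cos(x_{1})\,v = S .
```

In this toy case $S$ is an eigen-section with eigenvalue $1$, and the sign convention $\Delta_{\nabla}=-\mathrm{tr}\,\nabla^{2}$ makes the operator non-negative.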

G.3 Heat Flow on a Hilbert Bundle

In this section, we fix a closed finite-dimensional Riemannian manifold $(\mathcal{M},g)$ with canonical volume pseudo-form $\mu$, a smooth Hilbert bundle $\pi:\mathcal{E}\to\mathcal{M}$ with fiber $\mathcal{E}_{x}\cong\mathcal{H}$, and a connection $\nabla$. Our goal in this section is three-fold:

  1. Demonstrate that the heat equation with respect to the connection Laplacian $\Delta_{\nabla}$ has a unique solution;

  2. Show that the heat flow admits a heat kernel;

  3. Provide asymptotic estimates for the heat flow that relate to the geometry of the underlying manifold $\mathcal{M}$.

Our approach is to adapt the methods of Berline, Getzler, and Vergne [15] for finite-rank bundles to the Hilbert bundle setting. A key subtlety that arises in this generalization is the definition of tensor products of bundles. While the algebraic and topological tensor products of finite-rank Hilbert spaces agree, they need not coincide for general Hilbert spaces. This complicates, for instance, the necessary tensor-hom adjunction. In order to keep track of the appropriate tensor product, we adopt the following convention. Let $\pi_{0}:\mathcal{E}_{0}\to\mathcal{M}$ and $\pi_{1}:\mathcal{E}_{1}\to\mathcal{M}$ be smooth Hilbert bundles over a common manifold $\mathcal{M}$. One may form the hom-bundle $\hom(\mathcal{E}_{0},\mathcal{E}_{1})\to\mathcal{M}$, whose fiber $\hom(\mathcal{E}_{0},\mathcal{E}_{1})_{x}$ is the Banach space of bounded linear operators from $(\mathcal{E}_{0})_{x}$ to $(\mathcal{E}_{1})_{x}$. This is a Banach bundle when $\hom(\mathcal{E}_{0},\mathcal{E}_{1})_{x}$ is topologized with the operator norm.

Definition 16.

Let $\Delta_{\nabla}$ be the connection Laplacian for a compatible connection on a smooth Hilbert bundle $\pi:\mathcal{E}\to\mathcal{M}$ over a compact orientable manifold $\mathcal{M}$. A heat kernel for $\Delta_{\nabla}$ is a continuous section $K^{\nabla}(x,y,t)$ of the Banach bundle $\hom\big(\mathrm{pr}_{2}^{*}\mathcal{E},\mathrm{pr}_{1}^{*}\mathcal{E}\big)$ over $\mathcal{M}\times\mathcal{M}\times(0,\infty)$ that satisfies the following conditions.

  1. $K^{\nabla}(x,y,t)$ is $C^{1}$ with respect to $t$, and is $C^{2}$ with respect to $x$.

  2. $K^{\nabla}(x,y,t)$ satisfies the heat equation $\partial_{t}K^{\nabla}(x,y,t)=-(\Delta_{\nabla})_{x}K^{\nabla}(x,y,t)$, where $(\Delta_{\nabla})_{x}$ denotes the Laplacian applied in the variable $x$.

  3. $K^{\nabla}(x,y,t)$ satisfies the boundary condition $\lim_{t\to 0}K_{t}S=S$ for every smooth section $S\in\Gamma(\mathcal{E})$, where $(K_{t}S)(x)=\int_{\mathcal{M}}K^{\nabla}(x,y,t)S(y)\,d\mu(y)$.

Lemma 2.

(Heat Kernel for Hilbert Bundles) Let $\Delta_{\nabla}$ be the connection Laplacian associated to a Hilbert bundle $(\mathcal{M},\mathcal{E},\nabla)$. Let $n:=\dim(\mathcal{M})/2$, let $\psi$ be a cutoff function, and let

$$
C_{n}(x,y):=\frac{\exp\left(\frac{-d_{\mathcal{M}}(x,y)^{2}}{4t}\right)}{(4\pi t)^{n/2}}.
$$

The following hold:

  1. The Laplacian $\Delta_{\nabla}$ admits a unique heat kernel $K^{\nabla}(x,y,t)$.

  2. There exist smooth sections $\Phi_{i}\in\Gamma\left(\mathcal{M}\times\mathcal{M},\hom\big(\mathrm{pr}_{2}^{*}\mathcal{E},\mathrm{pr}_{1}^{*}\mathcal{E}\big)\right)$ such that for every $N>n$, the kernel

    $$
    K^{N}(x,y,t):=C_{n}(x,y)\,\psi(d_{\mathcal{M}}(x,y)^{2})\sum_{i=0}^{N}t^{i}\Phi_{i}(x,y)\,j(x,y)^{-1/2}\,|dx|^{1/2}
    $$

    is asymptotic to $K^{\nabla}(x,y,t)$, in the sense that

    $$
    \|\partial^{k}_{t}(K^{\nabla}(x,y,t)-K^{N}(x,y,t))\|_{\ell}=O\big(t^{N-\frac{n+\ell+2k}{2}}\big).
    $$

  3. The leading term $\Phi_{0}(x,y)$ is equal to the parallel transport $P_{y\to x}:\mathcal{E}_{y}\to\mathcal{E}_{x}$ with respect to the Fréchet connection associated to $\mathcal{H}$ along the unique length-minimizing geodesic joining $y$ and $x$.

Proof.

(Sketch) The parametrix-based approach of Berline, Getzler, and Vergne [15] extends to our setting with minor but judicious modifications. As the original parametrix argument is quite lengthy, we only record the necessary modifications. First, all integration must be understood as Bochner integration. Second, to avoid ambiguities between the algebraic and topological tensor products of bundles, the hom-bundle is used instead of tensor bundles.
Now note that the parametrix argument is fundamentally local in nature. Consider a smooth Hilbert bundle $\mathcal{E}\to\mathcal{M}$, and recall that we require the associated Fréchet connection $\nabla$ to be metric-compatible. Then, in a local trivialization $\mathcal{E}\big\rvert_{U}\cong U\times\mathcal{H}$, with $\mathcal{H}$ a separable Hilbert space, the connection has the form $\nabla=d+A$ with $A$ a smooth $\hom(\mathcal{H},\mathcal{H})$-valued 1-form by Proposition 2. The connection Laplacian is a second-order elliptic operator with scalar principal symbol and lower-order coefficients in the Banach algebra $\hom(\mathcal{H},\mathcal{H})$. The parametrix argument then proceeds by solving transport equations along geodesics and then correcting the resulting approximate kernel by a Volterra series. These steps do not rely in any essential way on finite-dimensionality of the fiber, but only on the fact that the coefficient algebra admits the usual smooth calculus and operator-norm estimates. Thus, replacing matrix-valued coefficients by $\hom(\mathcal{H},\mathcal{H})$-valued ones, one obtains in the same way a smooth kernel $K^{\nabla}(x,y,t)$, and the usual energy argument gives uniqueness. ∎

Remark 7.

The details of the necessary parametrix argument may be found in Sections 2.1 to 2.5 of [15]. Note that while they assume a finite-rank hypothesis throughout Chapter 2, the hypothesis is actually unused until Section 2.6 of their work, where the operator $K_{t}$ is required to be Hilbert-Schmidt.

G.4 Borel Functional Calculus

Given a linear map $T:\mathbb{R}^{n}\to\mathbb{R}^{n}$ and a suitably well-behaved function $g:\mathbb{R}\to\mathbb{C}$, one may “apply $g$” to $T$ to get a new linear map $g(T):\mathbb{R}^{n}\to\mathbb{R}^{n}$. In particular, whenever $g$ is analytic with a globally defined Maclaurin expansion $g(x)=\sum_{j}a_{j}x^{j}$, one may define $g(T)=\sum_{j}a_{j}T^{j}$, where $T^{j}$ is interpreted as the $j$-fold composition $T\circ\cdots\circ T$. When $T$ is a bounded linear endo-operator on a Hilbert space, one may similarly define $g(T)$ via series expansion. However, when $T$ is unbounded, more care must be taken to handle series convergence. This difficulty in the unbounded case is pertinent for the HilbNet architecture, where the convolution filter $g$ must be applied to the unbounded connection Laplacian $\Delta_{\nabla}$. The Borel functional calculus provides an elegant solution. While traditionally formulated through the spectral theorem and projection-valued measures, for the purpose of the HilbNet architecture, the following version (Theorem VIII.5 of [82]) will be sufficient.

Theorem 3.

(Spectral Theorem - Functional Calculus Form) Let $A$ be a self-adjoint operator on a Hilbert space $\mathcal{H}$. Then there is a unique map $\hat{\phi}$ from the bounded Borel measurable functions on $\mathbb{R}$ into the space $\mathcal{L}(\mathcal{H})$ of bounded linear operators on $\mathcal{H}$, so that

  • $\hat{\phi}$ is an algebraic $*$-homomorphism.

  • $\hat{\phi}$ is norm continuous, that is, $\|\hat{\phi}(h)\|_{\mathcal{L}(\mathcal{H})}\leq\|h\|_{\infty}$.

  • Let $h_{n}(x)$ be a sequence of bounded Borel functions with $h_{n}(x)\xrightarrow[n\rightarrow\infty]{}x$ for each $x$ and $\left|h_{n}(x)\right|\leq|x|$ for all $x$ and $n$. Then, for any $\psi\in\mathrm{dom}(A)$, $\lim_{n\rightarrow\infty}\hat{\phi}\left(h_{n}\right)\psi=A\psi$.

  • If $h_{n}(x)\to h(x)$ pointwise and if the sequence $\|h_{n}\|_{\infty}$ is bounded, then $\hat{\phi}(h_{n})\to\hat{\phi}(h)$ strongly.

In addition:

  • If $A\psi=\lambda\psi$, then $\hat{\phi}(h)\psi=h(\lambda)\psi$.

  • If $h\geq 0$, then $\hat{\phi}(h)\geq 0$.
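
In finite dimensions the functional calculus reduces to applying the filter to eigenvalues. The sketch below illustrates two of the listed properties (the homomorphism property and the norm bound) on a small symmetric matrix. The matrix, the two filters, and the helper name `apply_filter` are illustrative assumptions, not objects from the paper.

```python
import numpy as np

def apply_filter(A, h):
    """Return h(A) for a real symmetric matrix A via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs @ np.diag(h(eigvals)) @ eigvecs.T

# Hypothetical operator: the graph Laplacian of a path on 3 nodes
# (symmetric, positive semi-definite, eigenvalues 0, 1, 3).
A = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])

heat = lambda lam: np.exp(-0.5 * lam)      # bounded by 1 for lam >= 0
low_pass = lambda lam: 1.0 / (1.0 + lam)   # bounded by 1 for lam >= 0

H = apply_filter(A, heat)
G = apply_filter(A, low_pass)
HG = apply_filter(A, lambda lam: heat(lam) * low_pass(lam))

# Homomorphism property: (h * g)(A) equals h(A) g(A).
product_matches = np.allclose(HG, H @ G)
# Norm bound: the operator norm of h(A) is at most sup |h| = 1 on the spectrum.
norm_bounded = np.linalg.norm(H, 2) <= 1.0 + 1e-12
```

The eigenvector property can be read off the same example: the constant vector lies in the kernel of `A`, so `H` maps it to `heat(0)` times itself.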

G.5 Cellular Sheaves and Sheaf Laplacians

Cellular sheaves on graphs are a data structure that generalizes weighted graphs. We take our exposition of cellular sheaves and their Laplacians primarily from [52]. See [52, 29, 53] for more details.

Definition 17 (Cellular Sheaf on a Graph).

Let $G=(V,E)$ be an undirected multi-graph without self-loops and with finitely many vertices and edges. Let $v\leq e$ denote that node $v$ is incident to the edge $e$. A cellular sheaf, or equivalently network sheaf, $\mathcal{F}$ on $G$ consists of the following data.

  • A vector space $\mathcal{F}(\sigma)$ for each $\sigma\in V\amalg E$, called the stalk over $\sigma$.

  • A linear map $\mathcal{F}_{v\leq e}:\mathcal{F}(v)\to\mathcal{F}(e)$ for each incident pair $v\leq e$, called the restriction map of $v$ into $e$.

Remark 8.

At the level of category theory, a cellular sheaf is a functor $\mathcal{F}:G\to\mathbf{Vect}$, where the graph $G=(V,E)$ is viewed as a posetal category with objects $\mathrm{Ob}(G)=V\amalg E$, and a unique morphism $v\to e$ whenever $v\leq e$. In this light, we adopt the notation $\mathcal{F}:G\to\mathbf{Vect}$ for a cellular sheaf $\mathcal{F}$ on a graph $G$.

Traditionally, to add geometric content to a cellular sheaf, one passes to weighted cellular sheaves: a cellular sheaf $\mathcal{F}:G\to\mathbf{Vect}$ where each stalk $\mathcal{F}(\sigma)$ is a finite-dimensional vector space endowed with an inner product $\langle-,-\rangle_{\sigma}$. To accommodate infinite-dimensional Hilbert space stalks, we instead follow the approach of [44].

Definition 18.

(Hilbert Cellular Sheaf on a Graph) A Hilbert cellular sheaf $\mathcal{F}$ on a finite graph $G=(V,E)$ consists of the following data.

  • A Hilbert space $\mathcal{F}(v)$ for each $v\in V$, referred to as the node stalk over $v$.

  • A Hilbert space $\mathcal{F}(e)$ for each $e\in E$, referred to as the edge stalk over $e$.

  • For each edge $e\in E$ with bounding vertices $u,v$, a pair of bounded linear restriction maps $\mathcal{F}_{u\leq e}:\mathcal{F}(u)\to\mathcal{F}(e)$ and $\mathcal{F}_{v\leq e}:\mathcal{F}(v)\to\mathcal{F}(e)$.

Remark 9.

A bounded Hilbert sheaf $\mathcal{F}$ can again be viewed as a functor $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$, where $G$ is the graph viewed as an acyclic category, and $\mathbf{Hilb}_{\mathbb{R}}$ is the category of real Hilbert spaces and bounded globally-defined linear operators.

Remark 10.

In order to better differentiate between the usual finite-rank cellular sheaf on a graph and the potentially infinite-rank Hilbert cellular sheaves considered above, we use the terminology of network sheaves when we wish to emphasize the finite-rank consideration.

Definition 19.

Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on a graph $G=(V,E)$. The spaces of 0-cochains and 1-cochains are defined by:

$$
\begin{aligned}
C^{0}(G;\mathcal{F})&:=\bigoplus_{v\in V}\mathcal{F}(v)\,,\\
C^{1}(G;\mathcal{F})&:=\bigoplus_{e\in E}\mathcal{F}(e)\,,
\end{aligned}
$$

where $\oplus$ denotes the direct sum of Hilbert spaces. For a 0-cochain $\mathbf{x}\in C^{0}(G;\mathcal{F})$, we denote the component of $\mathbf{x}$ in the stalk over the node $v$ by $\mathbf{x}_{v}$, with a similar notation for components of 1-cochains.

Definition 20.

Let $G=(V,E)$ be a graph. A signed incidence relation on $G$ is a pairing $[-:-]:V\times E\to\{-1,0,1\}$ which satisfies the following conditions:

  1. $[v:e]\neq 0$ if and only if $v\leq e$.

  2. For each $e$, $\sum_{v\leq e}[v:e]=0$.

Remark 11.

The data of a signed incidence relation on $G=(V,E)$ is equivalent to the choice of a source $s(e)$ and target $t(e)$ for each edge $e$. In particular, the total set of incidences can be put into two-to-one correspondence with edges, counting the two distinct “boundings” of each $e\in E$.

Definition 21.

Let $G=(V,E)$ be a graph equipped with a signed incidence relation. Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on $G$. The coboundary operator $\delta:C^{0}(G;\mathcal{F})\to C^{1}(G;\mathcal{F})$ is the operator whose component on each edge stalk is given by:

$$
\left(\delta\mathbf{x}\right)_{e}:=\sum_{v\leq e}[v:e]\,\mathcal{F}_{v\leq e}(\mathbf{x}_{v})\,.
$$

Remark 12.

The coboundary map $\delta$ depends on the choice of the signed incidence relation. However, given two signed incidence relations $[-:-]_{1},[-:-]_{2}$, the corresponding coboundary operators $\delta_{1},\delta_{2}$ differ on each stalk $\mathcal{F}(e)$ by at most a sign: $(\delta_{1}\mathbf{x})_{e}=\pm(\delta_{2}\mathbf{x})_{e}$. In particular, $\ker(\delta)$ does not depend on the choice of signed incidence relation.

Definition 22.

Let $\mathcal{F}:G\to\mathbf{Hilb}_{\mathbb{R}}$ be a bounded Hilbert sheaf on a graph $G=(V,E,\epsilon)$ equipped with a signed incidence relation $\epsilon$. Let $\delta^{*}:C^{1}(G;\mathcal{F})\to C^{0}(G;\mathcal{F})$ denote the linear adjoint of the corresponding coboundary operator with respect to the inner product structures on the spaces of cochains as product spaces. The Hilbert sheaf Laplacian is the operator $\Delta_{\mathcal{F}}:C^{0}(G;\mathcal{F})\to C^{0}(G;\mathcal{F})$ defined by the composition:

$$
\Delta_{\mathcal{F}}=\delta^{*}\circ\delta\,.
$$

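
For finite-dimensional stalks, the coboundary operator and the Hilbert sheaf Laplacian are block matrices. The following minimal sketch assembles them for stalks $\mathbb{R}^{d}$ and checks that reorienting an edge flips the sign of the coboundary on that edge while leaving the Laplacian unchanged. The triangle graph, the rotation-valued restriction maps, and the helper names `sheaf_laplacian` and `rotation` are hypothetical choices made for illustration.

```python
import numpy as np

def sheaf_laplacian(num_nodes, dim, edges):
    """Assemble delta and Delta = delta^T delta for stalks R^dim.

    edges: list of (u, v, F_u, F_v) with dim x dim restriction maps into the
    edge stalk; u is the source ([u:e] = -1) and v the target ([v:e] = +1).
    """
    delta = np.zeros((len(edges) * dim, num_nodes * dim))
    for k, (u, v, F_u, F_v) in enumerate(edges):
        rows = slice(k * dim, (k + 1) * dim)
        delta[rows, u * dim:(u + 1) * dim] = -F_u
        delta[rows, v * dim:(v + 1) * dim] = F_v
    return delta, delta.T @ delta

def rotation(angle):
    """2 x 2 rotation matrix, used here as an orthogonal restriction map."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Triangle graph with hypothetical rotation-valued restriction maps.
I2 = np.eye(2)
edges = [(0, 1, I2, rotation(0.3)),
         (1, 2, I2, rotation(-0.7)),
         (0, 2, rotation(0.2), I2)]
delta, L = sheaf_laplacian(3, 2, edges)

# Reorient the first edge: same restriction maps, source and target swapped.
flipped = [(1, 0, rotation(0.3), I2)] + edges[1:]
delta_flipped, L_flipped = sheaf_laplacian(3, 2, flipped)
```

Here `L` is symmetric and positive semi-definite by construction, since it is of the form `delta.T @ delta`.
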
Proposition 5.

The Hilbert sheaf Laplacian $\Delta_{\mathcal{F}}$ has the following properties.

  1. The Laplacian $\Delta_{\mathcal{F}}$ is a self-adjoint globally-defined bounded linear operator.

  2. When $C^{0}(G;\mathcal{F})\xrightarrow{\delta}C^{1}(G;\mathcal{F})$ is viewed as a Hilbert complex (in the sense of [23]), the kernel $\ker(\Delta_{\mathcal{F}})$ recovers the space of harmonic 0-cochains.

  3. The negative Laplacian $-\Delta_{\mathcal{F}}$ is the infinitesimal generator of a strongly continuous semigroup $e^{-t\Delta_{\mathcal{F}}}$ on $C^{0}(G;\mathcal{F})$. For a choice of initial cochain $\mathbf{x}_{0}\in C^{0}(G;\mathcal{F})$, the resulting flow $\mathbf{x}_{t}:=e^{-t\Delta_{\mathcal{F}}}\mathbf{x}_{0}$ is a solution to the sheaf heat equation $\frac{d}{dt}\mathbf{x}_{t}=-\Delta_{\mathcal{F}}\mathbf{x}_{t}$. Moreover, the flow has limiting behavior $\lim_{t\to\infty}\mathbf{x}_{t}=\Pi_{\ker(\Delta_{\mathcal{F}})}\mathbf{x}_{0}$, where $\Pi_{\ker(\Delta_{\mathcal{F}})}$ denotes the orthogonal projection onto $\ker(\Delta_{\mathcal{F}})$.

Proof.

See [44]. ∎
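
The limiting behavior of the sheaf heat flow can be observed numerically in the finite-dimensional case. The sketch below is a simplification chosen for illustration, not the paper's setting: it uses the constant sheaf with stalk $\mathbb{R}$ on a path graph, where the sheaf Laplacian reduces to the ordinary graph Laplacian, and compares the flow at a large time with the orthogonal projection onto the kernel. The helper names `heat_flow` and `kernel_projection` are hypothetical.

```python
import numpy as np

# Constant sheaf with stalk R on a path graph with 3 nodes (an assumption
# made for simplicity): delta is the signed incidence matrix.
delta = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0]])
L = delta.T @ delta

def heat_flow(L, x0, t):
    """Evaluate exp(-t L) x0 through the eigendecomposition of L."""
    lam, U = np.linalg.eigh(L)
    return U @ (np.exp(-t * lam) * (U.T @ x0))

def kernel_projection(L, x0, tol=1e-10):
    """Orthogonally project x0 onto ker(L)."""
    lam, U = np.linalg.eigh(L)
    K = U[:, lam < tol]
    return K @ (K.T @ x0)

x0 = np.array([3.0, 0.0, -1.5])
x_late = heat_flow(L, x0, t=50.0)       # flow after a long time
x_harmonic = kernel_projection(L, x0)   # projection onto harmonic cochains
```

For this sheaf the harmonic 0-cochains are the constant vectors, so the flow averages the initial values across the nodes.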

We now recall our construction of Hilbert cellular sheaf from a spatially-discretized Hilbert bundle.

Definition (Hilbert Cellular Sheaf from a Hilbert Bundle).

For a given Hilbert bundle $(\mathcal{M},\mathcal{E},\nabla)$ with sampled points $\mathcal{X}_{n}=\{x_{1},\dots,x_{n}\}\subset\mathcal{M}$, fix a geodesic $\gamma_{ij}$ between $x_{i}$ and $x_{j}$, for all $i<j$. Further, let $m_{\gamma_{ij}}$ denote the midpoint of this geodesic. Consider the graph $G_{n}=(\mathcal{X}_{n},E)$ with an undirected edge $e_{ij}$ between $x_{i}$ and $x_{j}$, for each $i<j$. The associated Hilbert cellular sheaf $\mathcal{F}_{n}^{t}$ on $G_{n}$ with bandwidth parameter $t$ is given by the following assignments:

  • The Hilbert space $\mathcal{F}_{n}^{t}(x_{i}):=\mathcal{E}_{x_{i}}$ for each $x_{i}\in\mathcal{X}_{n}$, referred to as the node stalk over $x_{i}\in\mathcal{X}_{n}$.

  • The Hilbert space $\mathcal{F}_{n}^{t}(e_{ij}):=\mathcal{E}_{m_{\gamma_{ij}}}$ for each $e_{ij}\in E$, referred to as the edge stalk over $e_{ij}\in E$.

  • For each edge $e_{ij}\in E$ with bounding vertices $x_{i},x_{j}$, a pair of bounded linear restriction maps

    $$
    \begin{aligned}
    (\mathcal{F}_{n}^{t})_{x_{i}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{i}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{i})\to\mathcal{F}_{n}^{t}(e_{ij}),\\
    (\mathcal{F}_{n}^{t})_{x_{j}\leq e_{ij}}&:=\sqrt{k_{ij}^{t}}\,P_{x_{j}\to m_{\gamma_{ij}}}:\mathcal{F}_{n}^{t}(x_{j})\to\mathcal{F}_{n}^{t}(e_{ij}),
    \end{aligned}
    \tag{82}
    $$

    where $k_{ij}^{t}=e^{-d_{\mathcal{M}}(x_{i},x_{j})^{2}/4t}$, with $d_{\mathcal{M}}$ the geodesic distance on $\mathcal{M}$, and $P_{x_{i}\to m_{\gamma_{ij}}}$ denotes the unitary parallel transport map on $\mathcal{E}$ between $x_{i}$ and $m_{\gamma_{ij}}$.
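
To see how these restriction maps interact, the following worked computation expands the Hilbert sheaf Laplacian of Definition 22 for this sheaf. It is a sketch under two assumptions that are not spelled out above: the edge $e_{ij}$ is oriented from $x_{i}$ to $x_{j}$, and transports along $\gamma_{ij}$ compose, so that $P_{m_{\gamma_{ij}}\to x_{i}}P_{x_{j}\to m_{\gamma_{ij}}}=P_{x_{j}\to x_{i}}$. Unitarity of parallel transport gives $P^{*}_{x_{i}\to m_{\gamma_{ij}}}=P_{m_{\gamma_{ij}}\to x_{i}}$.

```latex
\begin{aligned}
(\delta\mathbf{x})_{e_{ij}}
  &= \sqrt{k_{ij}^{t}}\,\big(P_{x_{j}\to m_{\gamma_{ij}}}\mathbf{x}_{x_{j}} - P_{x_{i}\to m_{\gamma_{ij}}}\mathbf{x}_{x_{i}}\big), \\
(\Delta_{\mathcal{F}_{n}^{t}}\mathbf{x})_{x_{i}}
  &= \sum_{j\neq i}\sqrt{k_{ij}^{t}}\,P^{*}_{x_{i}\to m_{\gamma_{ij}}}\,\sqrt{k_{ij}^{t}}\,
     \big(P_{x_{i}\to m_{\gamma_{ij}}}\mathbf{x}_{x_{i}} - P_{x_{j}\to m_{\gamma_{ij}}}\mathbf{x}_{x_{j}}\big) \\
  &= \sum_{j\neq i} k_{ij}^{t}\,\big(\mathbf{x}_{x_{i}} - P_{x_{j}\to x_{i}}\mathbf{x}_{x_{j}}\big).
\end{aligned}
```

The second line does not depend on the orientation of each edge, since the sign $[x_{i}:e_{ij}]$ appears twice. The square roots in the restriction maps are what produce the full kernel weight $k_{ij}^{t}$ in the Laplacian.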

Remark 13.

We make a few clarifying remarks on this construction.

  • Geodesics exist in this setting by the Hopf-Rinow theorem [36], since $\mathcal{M}$ is compact; among them, we choose length-minimizing geodesics.

  • For simplicity, we use the geodesic distance to weight our restriction maps. However, we could also use the Euclidean heat kernel, as in [14]; this would result in a reweighted sheaf Laplacian that ultimately converges to the same connection Laplacian. In practical implementations, it is thus well-justified to work with the Euclidean heat kernel rather than geodesic-distance-based weights.

  • While this particular construction is chosen to emphasize the relationship to [14] and allow for the necessary analytical arguments, there exist alternative constructions that are geodesic choice-independent that generate the same sheaf Laplacian but emphasize functoriality.

G.6 Empirical Laplacians

Analogously to [14], we introduce two intermediary notions of Laplacian that interpolate between the Hilbert sheaf Laplacian and the Laplacian on a Hilbert bundle. For this section, fix a Hilbert bundle with compatible Fréchet connection $(\mathcal{M},\mathcal{E},\nabla)$ over a closed manifold $\mathcal{M}$.

Definition 23.

Consider the unique normalized volume pseudo-form on $\mathcal{M}$ (or the usual volume form if $\mathcal{M}$ is orientable), denoted $d\mu$. Thus, $d\mu$ equips $\mathcal{M}$ with a probability measure, and we refer to the resulting distribution as the uniform distribution on $\mathcal{M}$.

Henceforth, let $\mathcal{X}_{n}=\{x_{1},\ldots,x_{n}\}$ denote the realization of an iid random sample drawn from the uniform distribution on $\mathcal{M}$. We then recall the following construction.

Definition.

(Point-Cloud Extension of Sheaf Laplacian) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle and consider a sample $\mathcal{X}_{n}\subset\mathcal{M}$. Then the corresponding Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}$ may be extended to the point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$, an operator on $L^{2}(\mathcal{M},\mathcal{E})$, via

$$
(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x):=\frac{1}{n}\sum_{j}e^{-d_{\mathcal{M}}(x,x_{j})^{2}/4t}\big(S(x)-P_{x_{j}\to x}S(x_{j})\big)
\tag{83}
$$

Remark 14.

We make the following remarks about the point-cloud Laplacian.

  1. The point-cloud Laplacian is the extension of the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}:C^{0}(G_{n};\mathcal{F}_{n}^{t})\to C^{0}(G_{n};\mathcal{F}^{t}_{n})$ to an operator acting on sections of the Hilbert bundle $\mathcal{E}\to\mathcal{M}$, normalized by a factor of $1/n$. In particular, when evaluated at a sample point $x_{i}\in\mathcal{X}_{n}$, the point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x_{i})$ is exactly the normalized $x_{i}$ component of the Hilbert sheaf Laplacian $\Delta_{\mathcal{F}^{t}_{n}}$ evaluated at the cochain $(S(x_{1}),\ldots,S(x_{n}))^{T}\in C^{0}(G_{n};\mathcal{F}^{t}_{n})$.

  2. The point-cloud Laplacian is well defined for any section $S:\mathcal{M}\to\mathcal{E}$, regardless of regularity.
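
Equation (83) can be evaluated directly once distances and transports are available. The sketch below does so in a toy setting that is an assumption made for illustration, not an experiment from the paper: $\mathcal{M}$ is the unit circle, the fiber is $\mathbb{R}^{2}$, and parallel transport along the shortest arc rotates the fiber by `rate` times the signed arc length. With `rate = 1.0` the holonomy around the circle is trivial, so a globally parallel section exists and the point-cloud Laplacian vanishes on it. All function names are hypothetical.

```python
import math
import numpy as np

def signed_arc(theta_from, theta_to):
    """Signed length of the shortest arc on the unit circle, in (-pi, pi]."""
    d = (theta_to - theta_from) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def transport(theta_from, theta_to, rate=1.0):
    """Hypothetical unitary transport: rotate R^2 by rate * signed arc."""
    a = rate * signed_arc(theta_from, theta_to)
    return np.array([[math.cos(a), -math.sin(a)],
                     [math.sin(a), math.cos(a)]])

def point_cloud_laplacian(section, samples, x, t):
    """Evaluate equation (83) at base point x for a section: angle -> R^2."""
    total = np.zeros(2)
    for xj in samples:
        weight = math.exp(-signed_arc(xj, x) ** 2 / (4.0 * t))
        total += weight * (section(x) - transport(xj, x) @ section(xj))
    return total / len(samples)

# iid uniform sample on the circle (seeded for reproducibility).
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 2.0 * math.pi, size=200)
x, t = 0.7, 0.05

# A parallel section (transport of a fixed vector from angle 0) and a
# coordinate-constant section, which is not parallel for this connection.
parallel = lambda theta: transport(0.0, theta) @ np.array([1.0, 0.0])
constant = lambda theta: np.array([1.0, 0.0])

value_parallel = point_cloud_laplacian(parallel, samples, x, t)
value_constant = point_cloud_laplacian(constant, samples, x, t)
```

The contrast between the two sections shows that the operator measures deviation from being parallel with respect to the connection, rather than deviation from being constant in coordinates.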

Definition 24 (Functional Approximation Laplacian).

For a section $S\in C^{1}(\mathcal{E})$, we define the functional approximation to the connection Laplacian

$$
(\hat{\Delta}^{t}S)(x):=\int_{\mathcal{M}}\big(S(x)-P_{y\to x}S(y)\big)\exp\left(-\frac{d_{\mathcal{M}}(x,y)^{2}}{4t}\right)\,dy
$$

where $\int(-)\,dy$ denotes Bochner integration with respect to the canonical normalized volume pseudo-form on $\mathcal{M}$.

Remark 15.

We make the following remarks about the functional approximation Laplacian.

  1. The functional approximation Laplacian has no dependence on a sample of points from the underlying manifold. Instead, the functional approximation may be treated as the limiting operator where all points on the manifold have been sampled, and contribute uniformly via parallel transport.

  2. The geometric data of the connection $\nabla$ impacts the functional approximation Laplacian through the parallel transport maps $P_{y\to x}:\mathcal{E}_{y}\to\mathcal{E}_{x}$, which link the fibers of $\mathcal{E}$.

  3. Viewing the sample $\mathcal{X}_{n}=\{x_{1},\ldots,x_{n}\}\subseteq\mathcal{M}$ as having been drawn iid from the uniform probability distribution on $\mathcal{M}$, the functional approximation Laplacian can be identified pointwise on a section $S$ as the expected value of the point-cloud Laplacian. That is, for any $S\in C^{1}(\mathcal{M},\mathcal{E})$ and $x\in\mathcal{M}$, we have:

    $$
    \frac{1}{\mathrm{vol}(\mathcal{M})}(\hat{\Delta}^{t}S)(x)=\mathbb{E}_{\mathcal{X}}\left[(\hat{\Delta}_{\mathcal{F}^{t}_{n}}S)(x)\right]
    $$

Appendix H Proofs of Results

H.1 Auxiliary Lemmas for Theorem 1

Lemma 3.

For $x=(x_{1},\ldots,x_{m})\in\mathbb{R}^{m}$, $k\in\mathbb{N}$, and $a,t>0$, the following Gaussian identities hold:

$$
\begin{aligned}
\frac{1}{(2\pi at)^{m/2}}\int x_{i}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=0, &&(84)\\
\frac{1}{(2\pi at)^{m/2}}\int x_{i}x_{j}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=at\,\delta_{ij}, &&(85)\\
\frac{1}{(2\pi at)^{m/2}}\int\|x\|^{2k+1}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=O(t^{k+\frac{1}{2}})\qquad\text{as $t\to 0$}, &&(86)
\end{aligned}
$$

where $\delta_{ij}$ is the Kronecker delta.

Proof.

Notice that $\frac{1}{(2\pi at)^{m/2}}\exp\left(-\frac{\|x\|^{2}}{2at}\right)$ is the density function of a multivariate normal random variable $X\sim\mathcal{N}(0,atI)$. Equations (84) and (85) are simply the values of the coordinate-mean $\mathbb{E}[X_{i}]=0$ and covariance $\mathrm{Cov}(X_{i},X_{j})=at\,\delta_{ij}$. Finally, we may write $X=\sqrt{at}\,Z$, where $Z\sim\mathcal{N}(0,I)$ is a standard multivariate normal (zero-mean, uncorrelated, unit variance) in $m$ dimensions. We may write $\mathbb{E}[\|X\|^{2k+1}]=(at)^{k+\frac{1}{2}}\cdot\mathbb{E}[\|Z\|^{2k+1}]$, which confirms (86). ∎
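
The identities (84) to (86) can be checked numerically. The sketch below approximates the integrals in dimension $m=2$ by a Riemann sum on a grid wide enough that the truncated Gaussian tails are negligible. The parameter values, grid size, and the helper name `gaussian_moments` are illustrative assumptions.

```python
import numpy as np

def gaussian_moments(a, t, half_width=6.0, num=1201):
    """Riemann-sum approximations of the Gaussian integrals for m = 2."""
    grid = np.linspace(-half_width, half_width, num)
    h = grid[1] - grid[0]
    X1, X2 = np.meshgrid(grid, grid, indexing="ij")
    r2 = X1**2 + X2**2
    # For m = 2 the normalizing constant (2 pi a t)^{m/2} equals 2 pi a t.
    density = np.exp(-r2 / (2.0 * a * t)) / (2.0 * np.pi * a * t)
    integrate = lambda f: float(np.sum(f * density) * h * h)
    return {
        "first": integrate(X1),        # left-hand side of (84)
        "second": integrate(X1 * X1),  # left-hand side of (85) with i = j
        "mixed": integrate(X1 * X2),   # left-hand side of (85) with i != j
        "cubic": integrate(r2**1.5),   # left-hand side of (86) with k = 1
    }

moments = gaussian_moments(a=2.0, t=0.05)      # a * t = 0.1
moments_4t = gaussian_moments(a=2.0, t=0.2)    # bandwidth multiplied by 4
# The proof gives the exact scaling (a t)^{k + 1/2}, so with k = 1 the
# cubic moment should grow by a factor 4 ** 1.5 = 8 when t is multiplied by 4.
cubic_ratio = moments_4t["cubic"] / moments["cubic"]
```

The ratio test probes the scaling in $t$ rather than the constant, which is the part of (86) that the convergence argument relies on.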

Remark 16.

If instead of integrating over the entire domain $\mathbb{R}^{m}$, we integrate over a symmetric ball $B:=B(R,0)$ centered at zero, we recover the following augmented Gaussian identities:

$$
\begin{aligned}
\frac{1}{(2\pi at)^{m/2}}\int_{B}x_{i}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=0, &&(87)\\
\frac{1}{(2\pi at)^{m/2}}\int_{B}x_{i}x_{j}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=\Big(at+O(e^{-R^{2}/2at})\Big)\,\delta_{ij}\qquad\text{as $t\to 0$}, &&(88)\\
\frac{1}{(2\pi at)^{m/2}}\int_{B}\|x\|^{2k+1}\exp\left(-\frac{\|x\|^{2}}{2at}\right)\,dx&=O(t^{k+\frac{1}{2}})\qquad\text{as $t\to 0$}. &&(89)
\end{aligned}
$$

The restriction to the ball $B$ leaves the odd-degree identity (87) unchanged by symmetry, and perturbs the second-moment identity (88) by an additive term of the form $O(e^{-c/t})$, capturing the exponential decay of probability mass far away from the origin.

Lemma 4.

(Banach Mean Value Theorem) Consider Banach spaces $\mathcal{B}_{1},\mathcal{B}_{2}$ and an open set $\mathcal{U}\subseteq\mathcal{B}_{1}$. If $S:\mathcal{U}\rightarrow\mathcal{B}_{2}$ is Gateaux differentiable, then the mean value theorem holds in the sense that

$$
\|S(x)-S(y)\|_{\mathcal{B}_{2}}\leq\|x-y\|_{\mathcal{B}_{1}}\sup_{0\leq t\leq 1}\|DS(tx+(1-t)y)\|
$$

whenever the line segment $[x,y]$ lies in $\mathcal{U}$.

Proof.

The proof is a standard functional analysis argument, but we recall it here for completeness. Let $\mathbb{L}$ be the one-dimensional subspace spanned by some fixed nonzero $u\in\mathcal{B}_{2}$. Consider $\varphi(cu)=c\|u\|$ as a continuous linear functional on $\mathbb{L}$ with norm $1$. By the strong form of the Hahn-Banach theorem, as stated in [3], for instance, we may extend this functional to all of $\mathcal{B}_{2}$ without increasing its norm. Then note that $\varphi(u)=\|u\|$, so the result follows by taking $u=S(x)-S(y)$ and applying the scalar mean value theorem to $t\mapsto\varphi\big(S(tx+(1-t)y)\big)$. ∎

Remark 17.

We recall that Fréchet differentiability in particular implies Gateaux differentiability, which will be sufficient for our purposes.

Lemma 5.

(Banach Weak Law of Large Numbers) Let $\{X_{j}\}_{j\in\mathbb{N}}$ denote an independent identically distributed collection of random variables $X_{j}\in L^{1}(\Omega,B)$, where $\Omega:=(\Omega,\Sigma,\mathbb{P})$ is a probability space and $B$ is a Banach space. Let $S_{n}:=\sum_{j=1}^{n}X_{j}$ denote the partial sum of the first $n$ random variables. As $n\to\infty$, the normalized sequence $\frac{1}{n}S_{n}$ converges in probability to the mean $\mu:=\mathbb{E}[X_{j}]$. That is, for all $\epsilon>0$,

$$
\lim_{n\to\infty}\mathbb{P}\left[\left\|\frac{1}{n}S_{n}-\mu\right\|_{B}>\epsilon\right]=0.
$$

Proof.

See Pinelis [80] or Ledoux and Talagrand [64]. ∎

H.2 Key Lemmas for Theorem 1

Lemma 6.

(Taylor Series for Hilbert Signals) Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a Fréchet connection. For a given signal $S\in C^{n+1}(\mathcal{M},\mathcal{E})$, the space of $(n+1)$-times continuously Fréchet-differentiable sections, and $p\in\mathcal{M}$, consider any $q\in\mathcal{M}$ in a geodesic ball of $p$ and fix a length-minimizing curve $\gamma(t)$ from $p$ to $q$. Let $S^{*}(t):=P_{\gamma(t)\to p}S(\gamma(t))$ denote the parallel transport of $S$ from $\gamma(t)$ back to $p$ along $\gamma$. As $t\to 0$, we have that

$$
S^{*}(t)=\left[\sum_{j=0}^{n}\frac{t^{j}}{j!}\big(\nabla_{\dot{\gamma}}^{(j)}S\big)(p)\right]+O(t^{n+1})
$$

Proof.

We first establish, for a section $V(t)$ of $\mathcal{E}$ along $\gamma$, that:

$$
\frac{d}{dt}\left(P_{\gamma(t)\to p}V(t)\right)=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}V\right)(\gamma(t))
$$

We compute directly from the definition:

$$
\begin{aligned}
\frac{d}{dt}\left(P_{\gamma(t)\to p}V(t)\right)&=\lim_{h\to 0}\frac{P_{\gamma(t+h)\to p}V(\gamma(t+h))-P_{\gamma(t)\to p}V(\gamma(t))}{h} &&(90)\\
&=\lim_{h\to 0}P_{\gamma(t)\to p}\left(\frac{P_{\gamma(t+h)\to\gamma(t)}V(\gamma(t+h))-V(\gamma(t))}{h}\right) &&(91)\\
&=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}V\right)(\gamma(t)) &&(92)
\end{aligned}
$$

where in the second line we used the composition law for parallel transport, $P^{-1}_{\gamma(t)\to p}\circ P_{\gamma(t+h)\to p}=P_{\gamma(t+h)\to\gamma(t)}$, which follows from uniqueness of solutions to the parallel transport ODE, and in the third line we used that $P_{\gamma(t)\to p}$ is a bounded linear operator and hence may be factored outside the limit.

Recall that $S^{*}(t):=P_{\gamma(t)\to p}S(\gamma(t))$ is a curve in the fixed Hilbert space $\mathcal{E}_{p}$. Iteratively applying the previous derivative computation to $V(t)=\big(\nabla_{\dot{\gamma}}^{(n)}S\big)\big(\gamma(t)\big)$, where $(-)^{(n)}$ denotes $n$-fold composition, yields:

$$\frac{d^{n}}{dt^{n}}S^{*}(t)=P_{\gamma(t)\to p}\left(\nabla_{\dot{\gamma}}^{(n)}S\right)(\gamma(t)).$$

Evaluating at $t=0$, where $P_{p\to p}=\mathrm{id}$:

$$\frac{d^{n}}{dt^{n}}S^{*}\bigg|_{t=0}=\left(\nabla_{\dot{\gamma}}^{(n)}S\right)(p)\,.$$

Finally, applying Taylor's theorem for Banach spaces [25] yields the desired asymptotic statement. ∎

Lemma 7.

Let $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$ and $\hat{\Delta}^{t}$ denote the point-cloud and functional Laplacian operators with bandwidth $t$. We have the concentration inequality:

$$\mathbb{P}\left[\frac{1}{t(4\pi t)^{m/2}}\left\|\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)-\hat{\Delta}^{t}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{t^{2}(4\pi t)^{m}\epsilon^{2}n}{2K^{2}}\right)$$

for some $K>0$ which depends only on the choice of section $S$. Consequently, for fixed bandwidth $t$, we have the following limit in probability as $n\to\infty$:

$$\frac{1}{t(4\pi t)^{m/2}}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\xrightarrow{\mathbb{P}}\frac{1}{t(4\pi t)^{m/2}}\hat{\Delta}^{t}S(x)\,.$$
Proof.

The point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)$ may be viewed as the sample average of $n$ iid Hilbert-space-valued random variables:

$$X_{i}:=\exp\left(-\frac{d_{\mathcal{M}}(x,x_{i})^{2}}{4t}\right)S(x)-\exp\left(-\frac{d_{\mathcal{M}}(x,x_{i})^{2}}{4t}\right)P_{x_{i}\to x}S(x_{i}).$$

Moreover, the functional approximation $\hat{\Delta}^{t}S(x)$ may be viewed as the expectation $\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)$ with respect to the uniform probability measure on $\mathcal{M}$. The bundle $\mathcal{E}\to\mathcal{M}$ has separable fibers, so the results of [80] apply. We may recover a Hoeffding inequality, valid for any $\delta>0$:

$$\mathbb{P}\left[\frac{1}{\delta}\left\|\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)-\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{(\delta\epsilon)^{2}n}{2K^{2}}\right)$$

where $K$ is the maximum norm of the section $S$ over the compact manifold $\mathcal{M}$. Setting $\delta=t(4\pi t)^{m/2}$ and identifying $\mathbb{E}\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)=\hat{\Delta}^{t}S(x)$ yields the desired concentration inequality. Convergence in probability follows immediately. ∎
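
To get a feel for how the bound of Lemma 7 scales with the bandwidth, the following sketch evaluates its right-hand side and inverts it for the sample size. It is a hedged illustration only: the function names are hypothetical, the constant $K$ is treated as a known input (the paper only asserts that it exists), and the result is a sufficient sample size for the stated bound, not a sharp requirement.

```python
import math

def hoeffding_bound(n, t, m, eps, K):
    """Right-hand side 2 * exp(-(delta * eps)^2 * n / (2 K^2)), clipped at 1,
    with delta = t * (4 * pi * t)^(m / 2) as in Lemma 7."""
    delta = t * (4.0 * math.pi * t) ** (m / 2.0)
    return min(1.0, 2.0 * math.exp(-((delta * eps) ** 2) * n / (2.0 * K ** 2)))

def samples_needed(t, m, eps, K, fail_prob):
    """Smallest integer n for which the bound above is at most fail_prob."""
    delta = t * (4.0 * math.pi * t) ** (m / 2.0)
    return math.ceil(2.0 * K ** 2 * math.log(2.0 / fail_prob) / (delta * eps) ** 2)
```

For instance, `samples_needed(0.1, 2, 0.1, 1.0, 0.05)` returns a value in the tens of thousands, and halving $t$ increases it sharply, since $\delta^{2}$ scales like $t^{2+m}$. This is the reason the bandwidth cannot shrink too quickly relative to $n$.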

Lemma 8.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle on a closed Riemannian manifold equipped with a Fréchet connection. Let $B\subseteq\mathcal{M}$ be open, and $p\in B$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$. For each $x\in\mathcal{M}$, let $F(x):=P_{x\to p}S(x)$ denote the parallel transport of $S(x)$ along the designated geodesic connecting $x$ to $p$. For any real $a>0$, the following asymptotic bound holds as $t\to 0$:

$$\left\|\int_{\mathcal{M}}e^{-\frac{\|y-p\|}{4t}}F(y)\,d\mu(y)-\int_{B}e^{-\frac{\|y-p\|}{4t}}F(y)\,d\mu(y)\right\|=o(t^{a}).$$
Proof.

We note this is a modified version of Lemma 4.1 of [14]. Let $d:=\inf_{x\not\in B}\|p-x\|$, $K:=\sup_{x\in\mathcal{M}}\|S(x)\|$, and $M:=\mu(\mathcal{M}\setminus B)$, where $\mu$ is the canonical measure with respect to the volume pseudo-form. Since $\{p\}$ is compact and $\mathcal{M}\setminus B$ is closed and does not contain $p$, the infimum distance satisfies $d>0$. Recalling that $P_{x\to p}:\mathcal{E}_{x}\to\mathcal{E}_{p}$ is unitary, and hence $\|F(x)\|=\|S(x)\|$ for all $x\in\mathcal{M}$, we may bound:

$$\begin{aligned}
\left\|\int_{\mathcal{M}}e^{-\frac{\|y-p\|}{4t}}F(y)\,d\mu(y)-\int_{B}e^{-\frac{\|y-p\|}{4t}}F(y)\,d\mu(y)\right\|&\leq\int_{\mathcal{M}\setminus B}\left\|e^{-\frac{\|y-p\|}{4t}}F(y)\right\|\,d\mu(y)\\
&\leq\int_{\mathcal{M}\setminus B}e^{-\frac{\|y-p\|}{4t}}\left\|S(y)\right\|\,d\mu(y)\\
&\leq MK\exp\left(-\frac{d}{4t}\right)\\
&=o(t^{a}).
\end{aligned}$$

Lemma 9.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$ and functional approximation $\hat{\Delta}^{t}$ with bandwidth $t$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$. For any $x\in\mathcal{M}$, as the bandwidth $t\to 0$, we have pointwise convergence:

$$\lim_{t\rightarrow 0}\frac{1}{t\left(4\pi t\right)^{\frac{m}{2}}}\hat{\Delta}^{t}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\,.$$
Proof.

Let $\gamma_{t}:=\frac{1}{t}\frac{1}{(4\pi t)^{m/2}}$, and consider the scaled functional approximation $\gamma_{t}\hat{\Delta}^{t}$, which acts on a section $S$ at a point $p$ by:

$$\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)=\gamma_{t}\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t}}\big(S(p)-P_{x\to p}S(x)\big)\,d\mu(x)\,.$$

Let $B\subseteq\mathcal{M}$ denote a sufficiently small ball containing $p$. By Lemma 8, the contribution from outside $B$ is $o(t^{a})$ for every $a>0$, and hence vanishes in the limit even after multiplication by $\gamma_{t}$, so

$$\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\gamma_{t}\int_{B}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t}}\big(S(p)-P_{x\to p}S(x)\big)\,d\mu(x).$$

Parameterize $B$ via geodesic coordinates such that $p=0$. Let $F(x):=P_{x\to p}S(x)$. Let $\tilde{F}:\mathbb{R}^{k}\to\mathcal{H}$ and $\tilde{B}\subseteq\mathbb{R}^{k}$ denote the transported section $F$ and the ball $B$ in coordinates. In these coordinates, writing the normalized measure $\mu$ in terms of the Riemannian volume element, we obtain:

$$\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}e^{-\frac{d_{\mathcal{M}}(\exp_{p}(x),p)^{2}}{4t}}(\tilde{F}(0)-\tilde{F}(x))\sqrt{\det(g_{ij})}\,dx\,.$$

In geodesic coordinates, since the closed manifold $\mathcal{M}$ has bounded Ricci curvature, the metric tensor has an asymptotic expansion given by (as in e.g. [36])

$$\det(g_{ij})=1+O(\|x\|^{2})\,,$$

and the same expansion therefore holds for $\sqrt{\det(g_{ij})}$. This approximation and the identification $d_{\mathcal{M}}(\exp_{p}(x),p)=\|x\|$ in coordinates allow us to express:

$$\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]=\lim_{t\to 0}\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}e^{-\frac{\|x\|^{2}}{4t}}(\tilde{F}(0)-\tilde{F}(x))\big(1+O(\|x\|^{2})\big)\,dx\,.$$

Let $\mathbb{L}_{t}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\big(1+O(\|x\|^{2})\big)\,dx$. This expression splits as $\mathbb{L}_{t}=A_{t}+B_{t}$, where:

$$\begin{aligned}
A_{t}&:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\,dx\,,\\
B_{t}&:=\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)(\tilde{F}(0)-\tilde{F}(x))\left[O(\|x\|^{2})\right]\,dx\,.
\end{aligned}$$

We first analyze the limiting behavior of $A_{t}$ as $t\to 0$. Within our geodesic coordinates $x=(x_{1},\dots,x_{k})$ centered at $p$, we may further work with a local synchronous frame of $\mathcal{E}$, that is, a frame that is parallel along all radial geodesics. Consequently, within this frame, ordinary derivatives of $\tilde{F}$ coincide with covariant derivatives of $S$ at the basepoint:

$$\begin{aligned}
\partial_{i}\tilde{F}(p)&=\nabla_{e_{i}}S(p)\\
\partial_{i}\partial_{j}\tilde{F}(p)&=\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\,,
\end{aligned}$$

where $\mathsf{curv}_{ij}:=-\frac{1}{2}R^{\mathcal{E}}(e_{i},e_{j})S(p)$ is half the bundle curvature of $\mathcal{E}$ arising from the connection $\nabla$. Hence by Lemma 6, the Taylor expansion of $\tilde{F}$ at $p$ is

$$\tilde{F}(x)=\tilde{F}(p)+\sum_{i}x_{i}\nabla_{e_{i}}S(p)+\frac{1}{2}\sum_{i,j}x_{i}x_{j}\big(\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\big)+O(\|x\|^{3}),$$

with $\mathsf{curv}_{ij}=-\mathsf{curv}_{ji}$.

Using the augmented Gaussian identities (87), (88), and (89), we may compute:

$$\begin{aligned}
I_{t}&:=\frac{1}{(4\pi t)^{m/2}}\int_{\tilde{B}}(\tilde{F}(p)-\tilde{F}(x))\exp\left(-\frac{\|x\|^{2}}{4t}\right)\,dx\\
&=-\frac{1}{2}\sum_{i,j}\left[\big(\nabla_{e_{i}}\nabla_{e_{j}}S(p)+\mathsf{curv}_{ij}\big)\left(2t+O\left(e^{-c/t}\right)\right)\delta_{ij}\right]+O\left(t^{3/2}\right)\\
&=-t\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)-t\sum_{i}\mathsf{curv}_{ii}+O\left(t^{3/2}\right),
\end{aligned}$$

where $\delta_{ij}$ is the Kronecker delta. Since $\mathsf{curv}_{ij}$ is antisymmetric, $\mathsf{curv}_{ii}=0$, hence

$$I_{t}=-t\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p)+O(t^{3/2}).$$

Inserting into the definition of $A_{t}=\frac{1}{\mathrm{vol}(\mathcal{M})}\frac{1}{t}I_{t}$ yields

$$\lim_{t\to 0}A_{t}=-\frac{1}{\operatorname{vol}(\mathcal{M})}\sum_{i}\nabla_{e_{i}}\nabla_{e_{i}}S(p).$$

Thus we recover the connection Laplacian by Lemma 1.

To analyze the quantity $B_{t}$, we first observe that inside of $\tilde{B}$, the parallel transport map $P_{\exp_{p}(x)\to p}$ varies smoothly in $x$, with bounded derivatives. Since the section $S$ is $C^{3}$, the mean value theorem (Lemma 4) ensures there is a $K>0$ such that $\|F(x)-F(p)\|\leq K\|x-p\|$ for all $x\in\tilde{B}$. Utilizing this Lipschitz bound and the augmented Gaussian identities, we may compute:

$$\begin{aligned}
\|B_{t}\|&\leq\frac{1}{\mathrm{vol}(\mathcal{M})}\gamma_{t}\int_{\tilde{B}}\exp\left(-\frac{\|x\|^{2}}{4t}\right)\|\tilde{F}(0)-\tilde{F}(x)\|\left[O(\|x\|^{2})\right]\,dx\\
&\leq\frac{K/\mathrm{vol}(\mathcal{M})}{t}\frac{1}{(4\pi t)^{m/2}}\int_{\tilde{B}}\,\|x\|^{3}\exp\left(-\frac{\|x\|^{2}}{4t}\right)\,dx\\
&=\frac{1}{t}O\left(t^{3/2}\right)\\
&=O\left(\sqrt{t}\right)\,.
\end{aligned}$$

Hence $B_{t}\to 0$ as $t\to 0$. Combining with the analysis of $A_{t}$ yields:

$$\begin{aligned}
\lim_{t\to 0}\left[\left(\gamma_{t}\hat{\Delta}^{t}S\right)(p)\right]&=\lim_{t\to 0}[A_{t}+B_{t}]\\
&=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(p).
\end{aligned}$$
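
The limit in Lemma 9 can be sanity-checked in the simplest flat setting. The sketch below is an illustration under assumptions that are not in the paper: it takes $m=1$, a scalar-valued function on the real line in place of a section (so parallel transport is the identity), drops the $1/\operatorname{vol}(\mathcal{M})$ normalization, and integrates over a fixed interval with the midpoint rule. In this setting the scaled Gaussian average of $f(0)-f(x)$ should approach $-f''(0)$; the function name is hypothetical.

```python
import math

def scaled_heat_difference(f, t, radius=1.0, steps=20000):
    """Midpoint-rule approximation of
    (1 / (t * sqrt(4 pi t))) * integral over [-radius, radius] of
    exp(-x^2 / (4 t)) * (f(0) - f(x)) dx, i.e. the m = 1 flat analogue of A_t."""
    gamma_t = 1.0 / (t * math.sqrt(4.0 * math.pi * t))
    h = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        x = -radius + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(-x * x / (4.0 * t)) * (f(0.0) - f(x))
    return gamma_t * total * h
```

For $f=\cos$ one has $-f''(0)=1$, and up to a negligible truncation error the exact value of the scaled integral is $(1-e^{-t})/t$. Evaluating `scaled_heat_difference(math.cos, t)` for decreasing $t$ therefore shows the value approaching $1$ with an error of order $t$, in line with the rate that appears in the bias estimate.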

Lemma 10 (Variance asymptotics).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume pseudo-form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. For each bandwidth $t$, let $\gamma:=(t(4\pi t)^{m/2})^{-1}$, where $m:=\dim(\mathcal{M})$. Define an error term:

$$\mathcal{R}_{n}(S):=\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.$$

There is a constant $C_{\mathrm{var}}>0$, which depends on the section $S$, such that the following asymptotic estimate holds as $t\to 0^{+}$:

$$\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\right]\leq\frac{C_{\mathrm{var}}}{nt^{2+\frac{m}{2}}}\,.$$
Proof.

Let $Y_{j}$ be the random section given by

$$Y_{j}(x):=\gamma\big(S(x)-P_{x_{j}\to x}S(x_{j})\big)\exp\left(-\frac{d_{\mathcal{M}}(x,x_{j})^{2}}{4t}\right).$$

Note that $Y_{j}$ is stochastic only through the sample point $x_{j}$, and that $Y_{i},Y_{j}$ are independent when $i\neq j$. It is straightforward to verify by Fubini's theorem that

$$\begin{aligned}
\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|^{2}_{L^{2}}\right]&=\frac{1}{n}\left(\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]-\|\mathbb{E}_{x_{1}}Y_{1}\|^{2}_{L^{2}}\right)\\
&\leq\frac{1}{n}\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]\,.
\end{aligned}$$

Set $K:=\max_{x\in\mathcal{M}}\|S(x)\|_{\mathcal{E}_{x}}$. By Fubini, we may exchange the order of integration and find

$$\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]=\int_{\mathcal{M}}\mathbb{E}_{x_{1}}\left[\left\|Y_{1}(x)\right\|_{\mathcal{E}_{x}}^{2}\right]dx\,.$$

We may compute:

$$\begin{aligned}
\mathbb{E}_{x_{1}}\left[\|Y_{1}(x)\|^{2}_{\mathcal{E}_{x}}\right]&=\frac{1}{\mathrm{vol}(\mathcal{M})}\int_{\mathcal{M}}\left\|\gamma\big(S(x)-P_{x_{1}\to x}S(x_{1})\big)\exp\left(-\frac{d_{\mathcal{M}}(x,x_{1})^{2}}{4t}\right)\right\|_{\mathcal{E}_{x}}^{2}dx_{1}\\
&\leq\frac{4\gamma^{2}K^{2}}{\mathrm{vol}(\mathcal{M})}\int_{\mathcal{M}}\exp\left(-\frac{d_{\mathcal{M}}(x,x_{1})^{2}}{2t}\right)dx_{1}\,,
\end{aligned}$$

where the inequality uses $\|S(x)-P_{x_{1}\to x}S(x_{1})\|\leq 2K$, since parallel transport is unitary. By standard Gaussian identities, the remaining Gaussian integral is $O(t^{m/2})$ as $t\to 0^{+}$, with constant independent of $x$. Recalling the definition of $\gamma$ in terms of $t$, we recover that

$$\mathbb{E}_{x_{1}}\left[\|Y_{1}(x)\|^{2}_{\mathcal{E}_{x}}\right]=O(t^{-(2+m)})\cdot O(t^{m/2})=O(t^{-(2+m/2)}).$$

Since the constants are all independent of $x$, we may integrate over $\mathcal{M}$ and find that

$$\mathbb{E}_{x_{1}}\left[\|Y_{1}\|^{2}_{L^{2}}\right]=O(t^{-(2+m/2)})$$

as well. The result immediately follows. ∎

Lemma 11 (Bias asymptotics).

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume pseudo-form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. For each bandwidth $t$, let $\gamma:=(t(4\pi t)^{m/2})^{-1}$, where $m:=\dim(\mathcal{M})$. Define an error term:

$$\mathcal{R}_{n}(S):=\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.$$

There is a constant $C_{\mathrm{bias}}>0$, which depends on the section $S$, such that the following asymptotic estimate holds as $t\to 0^{+}$:

$$\left\|\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\leq\big(C_{\mathrm{bias}}t\big)^{2}\,.$$
Proof.

This follows from essentially repeating the analysis in the proof of Lemma 9, using the fourth-order Taylor expansion of $F(y)=P_{y\to x}S(y)$ in terms of the covariant derivative. In particular, after accounting for the fourth-order Taylor remainder, we find that:

$$\mathbb{E}_{\mathcal{X}_{n}}\big[\gamma\hat{\Delta}_{\mathcal{F}^{t}_{n}}S(x)\big]=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)+O(t)\,.$$

The result follows. ∎

H.3 Proof of Theorem 1

H.3.1 Proof of Theorem 1A

Theorem.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{3}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. Let $\mathcal{F}^{t}_{\mathcal{X}_{n}}$ be the associated Hilbert cellular sheaf with bandwidth $t$ and associated point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$. Then, we have that in probability, for any $x\in\mathcal{M}$,

$$\lim_{n\rightarrow\infty}\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)=\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$.

Proof.

Let $\gamma_{n}:=\frac{1}{t_{n}}\frac{1}{(4\pi t_{n})^{m/2}}$, and consider the scaled functional approximation $\gamma_{n}\hat{\Delta}^{t_{n}}$, which acts on a section $S$ at a point $p$ by:

$$\left(\gamma_{n}\hat{\Delta}^{t_{n}}S\right)(p)=\gamma_{n}\int_{\mathcal{M}}e^{-\frac{d_{\mathcal{M}}(p,x)^{2}}{4t_{n}}}\big(S(p)-P_{x\to p}S(x)\big)\,d\mu(x)\,.$$

By the triangle inequality and a union bound, we may bound:

$$\begin{aligned}
\mathbb{P}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|>2\epsilon\right]\leq{}&\mathbb{P}\left[\gamma_{n}\left\|\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\hat{\Delta}^{t_{n}}S(x)\right\|>\epsilon\right]\\
&+\mathbb{P}\left[\left\|\gamma_{n}\hat{\Delta}^{t_{n}}S(x)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|>\epsilon\right].
\end{aligned}$$

As $n\to\infty$ we have $t_{n}\to 0$. Hence the second quantity on the right-hand side goes to zero by Lemma 9. On the other hand, the first quantity on the right-hand side can be bounded by the concentration inequality of Lemma 7, yielding:

$$\mathbb{P}\left[\gamma_{n}\left\|\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\hat{\Delta}^{t_{n}}S(x)\right\|>\epsilon\right]\leq 2\exp\left(-\frac{(4\pi)^{m}\,t_{n}^{2+m}\,n\,\epsilon^{2}}{2K^{2}}\right)\,,$$

where $K$ is a constant depending on the section $S$. Since $nt_{n}^{2+m}=n^{\frac{\alpha}{m+2+\alpha}}\to\infty$ as $n\to\infty$, the concentration upper bound goes to zero as $n\to\infty$ as well. This completes the proof of the main theorem. ∎

H.3.2 Proof of Theorem 1B

Theorem.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian $\Delta_{\nabla}$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Consider a random sample of $n$ points with respect to the normalized volume form, $\mathcal{X}_{n}=\{x_{1},x_{2},\cdots,x_{n}\}\subset\mathcal{M}$. Let $\mathcal{F}^{t}_{\mathcal{X}_{n}}$ be the associated Hilbert cellular sheaf with bandwidth $t$ and associated point-cloud Laplacian $\hat{\Delta}_{\mathcal{F}^{t}_{n}}$. Then, we have the following convergence in expectation:

$$\lim_{n\rightarrow\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}\left(4\pi t_{n}\right)^{\frac{m}{2}}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S(x)-\frac{1}{\operatorname{vol}(\mathcal{M})}\Delta_{\nabla}S(x)\right\|_{L^{2}}^{2}\right]=0$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$.

Proof.

Let $\gamma_{n}:=(t_{n}(4\pi t_{n})^{m/2})^{-1}$. Define an error term:

$$\mathcal{R}_{n}(S):=\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n}}S-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\,.$$

We may decompose the error into a variance term and a bias term as:

$$\mathbb{E}_{\mathcal{X}_{n}}\left[\|\mathcal{R}_{n}(S)\|_{L^{2}}^{2}\right]=\mathbb{E}_{\mathcal{X}_{n}}\left[\left\|\mathcal{R}_{n}(S)-\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\right]+\left\|\mathbb{E}_{\mathcal{X}_{n}}\mathcal{R}_{n}(S)\right\|_{L^{2}}^{2}\,.$$

By the asymptotic results of Lemmas 10 and 11, there are positive constants $C_{\mathrm{var}}$ and $C_{\mathrm{bias}}$ such that

$$\mathbb{E}_{\mathcal{X}_{n}}\left[\|\mathcal{R}_{n}(S)\|_{L^{2}}^{2}\right]\leq\frac{C_{\mathrm{var}}}{nt_{n}^{2+\frac{m}{2}}}+C_{\mathrm{bias}}^{2}t_{n}^{2}\,.$$

Since $t_{n}=n^{-1/(m+2+\alpha)}$, we have $t_{n}^{2}\to 0$ and $nt_{n}^{2+\frac{m}{2}}\to\infty$ as $n\to\infty$. Hence both terms vanish in the limit, proving the desired convergence. ∎
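
The interplay between the two terms in the bound above can be tabulated directly. The following sketch is illustrative only: the function names are hypothetical, and the constants $C_{\mathrm{var}}$ and $C_{\mathrm{bias}}$ are set to $1$ by default as placeholders, since the proof only establishes that they exist.

```python
def bandwidth(n, m, alpha):
    """Bandwidth schedule t_n = n^(-1 / (m + 2 + alpha))."""
    return n ** (-1.0 / (m + 2.0 + alpha))

def error_terms(n, m, alpha, c_var=1.0, c_bias=1.0):
    """Return (variance bound, squared bias bound) at sample size n:
    c_var / (n * t_n^(2 + m/2)) and (c_bias * t_n)^2."""
    t = bandwidth(n, m, alpha)
    variance = c_var / (n * t ** (2.0 + m / 2.0))
    bias_sq = (c_bias * t) ** 2
    return variance, bias_sq
```

With $m=2$ and $\alpha=1$ the schedule is $t_{n}=n^{-1/5}$, so both terms decay like $n^{-2/5}$; for example, `error_terms(10**5, 2, 1.0)` gives roughly `(0.01, 0.01)`. Larger $\alpha$ shrinks the bandwidth more slowly, which helps the variance term at the expense of the bias term.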

H.4 Key Lemmas for Theorem 2

Definition 25.

Let $\mathcal{E}$ be a smooth Hilbert bundle over a manifold $\mathcal{M}$. A finite rank approximating sequence for $\mathcal{E}$ is a sequence of smooth sub-bundles $\{\mathcal{E}_{d}\}_{d\geq 1}$ with the following properties:

  1. For each $d$, $\mathcal{E}_{d}$ has finite rank;

  2. For each $d$, the bundle $\mathcal{E}_{d}$ is a sub-bundle of $\mathcal{E}_{d+1}$;

  3. For each $x\in\mathcal{M}$, we have that $\mathrm{cl}\left(\mathrm{span}\left(\bigcup_{d}(\mathcal{E}_{d})_{x}\right)\right)=\mathcal{E}_{x}$.

Lemma 12.

Let $\mathcal{E}$ be a smooth Hilbert bundle with infinite-dimensional fibers over a compact manifold $\mathcal{M}$. A finite rank approximating sequence $\{\mathcal{E}_{d}\}_{d}$ exists.

Proof.

By Kuiper's Theorem [61], the unitary group of the typical fiber $\mathcal{H}$ is contractible, implying that there exists an isomorphism of $\mathcal{E}$ with the trivial bundle $\mathcal{M}\times\mathcal{H}$ at the level of purely topological bundles. Now note that every trivial Hilbert bundle $\mathcal{M}\times\mathcal{H}$ admits a finite rank approximating sequence: consider a Hilbert space basis $\{e_{1},e_{2},\ldots\}$ for $\mathcal{H}$, and define $\mathcal{H}_{d}:=\mathrm{span}(e_{1},\ldots,e_{d})$. The sequence $\{\mathcal{M}\times\mathcal{H}_{d}\}_{d}$ can then be seen to be a finite rank approximating sequence. Furthermore, because the base space is a finite-dimensional manifold, this topological trivialization can be upgraded to a smooth global trivialization [75]. Let $\Phi:\mathcal{E}\to\mathcal{M}\times\mathcal{H}$ be such a smooth isomorphism. Thus, the finite rank approximating sequence $\{\mathcal{M}\times\mathcal{H}_{d}\}_{d}$ pulls back to a finite rank approximating sequence on $\mathcal{E}$ by $\mathcal{E}_{d}:=\Phi^{-1}(\mathcal{M}\times\mathcal{H}_{d})$, as desired. ∎

Lemma 13.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth infinite-dimensional Hilbert bundle over a compact manifold $\mathcal{M}$ equipped with a compatible connection $\nabla$. Let $\{\mathcal{E}_{d}\}_{d\geq 1}$ be a finite rank approximating sequence for $\mathcal{E}$. The data of $\nabla$ induces a compatible connection $\nabla_{d}:=\Pi_{d}\nabla$ on $\mathcal{E}_{d}$, where $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denotes the fiber-wise orthogonal projection onto $\mathcal{E}_{d}$.

Proof.

This follows immediately from the fact that $\nabla$ is compatible and orthogonal projections are self-adjoint. ∎

Remark 18.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be an infinite-dimensional Hilbert bundle equipped with a compatible connection. By the previous lemmas, we may always find a finite rank approximating sequence $(\mathcal{E}_{d},\mathcal{M},\nabla_{d})$, each term of which carries a compatible connection. These compatible connections induce connection Laplacians $\Delta_{\nabla_{d}}$ on each sub-bundle. Moreover, Theorem 1B applies to each Laplacian $\Delta_{\nabla_{d}}$.

We restate and prove Proposition 1.

Proposition.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a Hilbert bundle, with strictly infinite-dimensional generic Hilbert-space fiber $\mathcal{H}$. Fix an orthogonal basis $\mathcal{B}=\{e_{1},e_{2},\ldots\}$ of $\mathcal{H}$ and let $\mathcal{H}_{d}:=\mathrm{span}(e_{1},e_{2},\ldots,e_{d})$. Then there exists a smooth map of bundles

$$\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d} \tag{93}$$

where $\mathcal{E}_{d}$ is a $d$-dimensional vector bundle with generic fiber $\mathcal{H}_{d}$ and, at each $x\in\mathcal{M}$, $\left.\Pi_{d}\right|_{\mathcal{E}_{x}}:\mathcal{E}_{x}\to\mathcal{E}_{d,x}$ recovers the usual orthogonal projection map.

Proof.

Once we fix a basis $\mathcal{B}$, the conclusion follows from the proof of Lemma 12 and an application of Lemma 13. ∎

Definition 26.

Let $(\mathcal{M},\mathcal{E},\nabla)$ be a smooth Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$, connection Laplacians $\Delta_{\nabla_{d}}$, and bandwidth $t$ point-cloud Laplacians $\hat{\Delta}_{n,d}^{t}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. The discretization error $\mathsf{D}(n,d)$ and the continuous geometry error $\mathsf{E}(d)$ are the quantities:

$$\begin{aligned}
\mathsf{D}(n,d)&:=\left\|\gamma_{n}\hat{\Delta}^{t_{n}}_{n,d}(\Pi_{d}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla_{d}}(\Pi_{d}S)\right\|_{L^{2}}^{2}\\
\mathsf{E}(d)&:=\left\|\frac{1}{\mathrm{vol}(\mathcal{M})}\left(\Delta_{\nabla_{d}}(\Pi_{d}S)-\Delta_{\nabla}S\right)\right\|_{L^{2}}^{2}\,.
\end{aligned}$$

Remark 19.

The discretization error $\mathsf{D}(n,d)$ captures the error introduced by approximating the connection Laplacian $\Delta_{\nabla_{d}}$ by a point-cloud Laplacian on $n$ points. The continuous geometry error $\mathsf{E}(d)$ is a deterministic quantity that captures the error introduced by moving to the sub-bundle $\mathcal{E}_{d}$.

Lemma 14.

The continuous geometry error $\mathsf{E}(d)$ converges to zero as $d\to\infty$.

Proof.

Without loss of generality, by pushing through the global trivialization of Kuiper's theorem, we may assume that $\mathcal{E}=\mathcal{M}\times\mathcal{H}$ as topological bundles. First, note that the orthogonal projection $\Pi_{d}$ commutes with the Fréchet derivative $D$ on sections, in the sense that:

$$D(\Pi_{d}S)=\Pi_{d}(DS).$$

By Proposition 2, for any vector field $X$, we may write:

$$\nabla_{X}S=DS(X)+A(X)S$$

where $A(X)$ is a globally bounded linear operator acting on the fiber $\mathcal{H}$. We may now compute:

$$\begin{aligned}
(\nabla_{d})_{X}(\Pi_{d}S)&=\Pi_{d}\nabla_{X}(\Pi_{d}S)\\
&=\Pi_{d}\big(D(\Pi_{d}S)(X)+A(X)\Pi_{d}S\big)\\
&=\Pi_{d}DS(X)+\Pi_{d}A(X)\Pi_{d}S,
\end{aligned}$$

where the last step uses $D(\Pi_{d}S)=\Pi_{d}(DS)$ together with $\Pi_{d}^{2}=\Pi_{d}$.

Since $\Pi_{d}\to\mathrm{id}$ in the strong operator topology and $DS(X)\in\mathcal{H}$, we have that $\Pi_{d}DS(X)\to DS(X)$ as $d\to\infty$. Similarly, since $A(X):\mathcal{H}\to\mathcal{H}$ is bounded, we find that for each $x\in\mathcal{M}$, we have $\Pi_{d}A(X)\Pi_{d}S(x)\to A(X)S(x)$. Therefore $\nabla_{d}(\Pi_{d}S)(x)\to\nabla S(x)$. By a similar argument, we may conclude for a pair of vector fields $X,Y$ that $(\nabla_{d})_{X}(\nabla_{d})_{Y}(\Pi_{d}S)(x)\to\nabla_{X}\nabla_{Y}S(x)$. Hence, using the coordinate form of the connection Laplacian of Lemma 1, we recover that:

$$\lim_{d\to\infty}\left[\Delta_{\nabla_{d}}(\Pi_{d}S)(x)\right]=\Delta_{\nabla}S(x)$$

for all $x\in\mathcal{M}$.

We now upgrade this statement to $L^{2}$ convergence by the dominated convergence theorem. If there is a global bound $K$ such that $\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)-\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K$ for all $x\in\mathcal{M}$ and all $d\geq 1$, we may apply the dominated convergence theorem and conclude that $\mathsf{E}(d)\to 0$. To find such a $K$, we first observe that $\Delta_{\nabla}S$ is a continuous section, and hence $\|\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K_{1}$ is bounded on the compact manifold $\mathcal{M}$. Next, for a pair of vector fields $X,Y$, we may use the fact that $\|\Pi_{d}\|_{\text{op}}=1$ and the representation $\nabla=D+A$ to find a $d$-independent bound on $\left\|\big((\nabla_{d})_{X}(\nabla_{d})_{Y}(\Pi_{d}S)\big)(x)\right\|_{\mathcal{H}}$. The local coordinate form of the connection Laplacian $\Delta_{\nabla_{d}}$ again allows us to conclude that there is a bound $\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)\|_{\mathcal{H}}\leq K_{2}$ for all $x\in\mathcal{M}$ and $d\geq 1$. The triangle inequality finally allows us to bound:

$$\|\Delta_{\nabla_{d}}(\Pi_{d}S)(x)-\Delta_{\nabla}S(x)\|_{\mathcal{H}}\leq K_{1}+K_{2}\,.$$

This completes the proof. ∎
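
The strong convergence $\Pi_{d}\to\mathrm{id}$ used in the proof can be made concrete in coordinates. The sketch below is a toy illustration under assumptions not made in the paper: a single fiber is modeled by a long but finite coefficient list with respect to an orthonormal basis, the example vector has coefficients $1/k$, and $\Pi_{d}$ keeps the first $d$ coefficients. The function name is hypothetical.

```python
import math

def projection_tail(coeffs, d):
    """Norm of v - Pi_d v, where Pi_d keeps the first d coefficients of v
    in a fixed orthonormal basis and zeroes out the rest."""
    return math.sqrt(sum(c * c for c in coeffs[d:]))

# Example vector: coefficients 1/k, a finite truncation of a square-summable sequence.
example = [1.0 / k for k in range(1, 10001)]
tails = [projection_tail(example, d) for d in (1, 10, 100, 1000)]
```

The entries of `tails` decrease toward zero, roughly like $d^{-1/2}$ for this choice of coefficients. The rate depends entirely on the vector, which is consistent with the convergence being strong rather than in operator norm, and with the diagonal sequence in Theorem 2 depending on the section $S$.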

H.5 Proof of Theorem 2

Theorem.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be an infinite-dimensional Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$, connection Laplacians $\Delta_{\nabla_{d}}$, and bandwidth $t$ point-cloud Laplacians $\hat{\Delta}_{\mathcal{F}^{t}_{n,d}}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. There exists a deterministic increasing sequence $d_{n}$, depending on the section $S$, such that

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X}}\left[\left\|\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}_{n,d_{n}}^{t_{n}}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]=0$$

with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$.

Proof.

Let γn:=1tn(4πtn)m/2\gamma_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}. We may easily bound the expected global error in terms of the continuous geometry error and the expected discretization error:

𝔼𝒳[γnΔ^n,dntn(ΠdnS)1vol()ΔSL22]2𝔼𝒳n[𝖣(n,d)]+2𝖤(d).\mathbb{E}_{\mathcal{X}}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}_{n,d_{n}}}^{t_{n}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]\leq 2\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,d)]+2\mathsf{E}(d)\,.

By applying Theorem 1 to the bundle d\mathcal{E}_{d}, the expected discretization error 𝔼𝒳n[𝖣(n,d)]0\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,d)]\to 0 as nn\to\infty. On the other hand, Lemma 14 ensures that 𝖤(d)0\mathsf{E}(d)\to 0 as dd\to\infty.

To construct the diagonal sequence, first choose an increasing sequence $p_{i}$ such that $\mathsf{E}(d)<\frac{1}{i}$ for all $d\geq p_{i}$. Next, choose an increasing sequence $N_{i}$ such that $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,p_{i})]<\frac{1}{i}$ for each $n\geq N_{i}$. For each $n\geq N_{1}$, set $\phi(n):=\max\{i\mid N_{i}\leq n\}$ (and $\phi(n):=1$ for $n<N_{1}$). Since $N_{i}$ is increasing, $\phi$ is non-decreasing and $\phi(n)\to\infty$ as $n\to\infty$. Observe that $\mathsf{E}(p_{\phi(n)})<\frac{1}{\phi(n)}$ by the choice of $p_{\phi(n)}$. On the other hand, $n\geq N_{\phi(n)}$ by the definition of $\phi$, so $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,p_{\phi(n)})]<\frac{1}{\phi(n)}$ as well. Therefore setting $d_{n}:=p_{\phi(n)}$ yields a diagonal sequence with the property that, for all $n\geq N_{1}$,

$$\mathbb{E}_{\mathcal{X}}\left[\left\|\gamma_{n}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}(\Pi_{d_{n}}S)-\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}S\right\|_{L^{2}}^{2}\right]<\frac{4}{\phi(n)}\,,$$

and the right-hand side tends to $0$ as $n\to\infty$.

This diagonal sequence is deterministic, since both the continuous geometry error and the expected discretization error are deterministic. ∎
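The diagonal construction can be sketched in a few lines. The snippet below is a toy illustration only: the error curves `toy_geometry_err` and `toy_discretization_err` are hypothetical stand-ins for $\mathsf{E}(d)$ and $\mathbb{E}_{\mathcal{X}_{n}}[\mathsf{D}(n,d)]$, and the search assumes each curve is non-increasing in the index being searched, so that the first index below a threshold stays below it afterwards.

```python
from typing import Callable, Optional, Tuple


def first_index_below(err: Callable[[int], float], bound: float,
                      start: int = 1, limit: int = 10**6) -> int:
    """Smallest k >= start with err(k) < bound (err assumed non-increasing)."""
    for k in range(start, limit + 1):
        if err(k) < bound:
            return k
    raise RuntimeError("no index found below the search limit")


def diagonal_rank(n: int,
                  geometry_err: Callable[[int], float],
                  discretization_err: Callable[[int, int], float],
                  max_i: int = 1000) -> Optional[Tuple[int, int]]:
    """Return (phi(n), d_n) with d_n = p_{phi(n)}, or None when n < N_1."""
    p_prev, n_prev, best = 0, 0, None
    for i in range(1, max_i + 1):
        # p_i: rank beyond which the geometry error is below 1/i.
        p_i = first_index_below(geometry_err, 1.0 / i, start=p_prev + 1)
        # N_i: sample size beyond which the discretization error at rank p_i is below 1/i.
        n_i = first_index_below(lambda k: discretization_err(k, p_i),
                                1.0 / i, start=n_prev + 1)
        if n_i > n:
            break
        best = (i, p_i)
        p_prev, n_prev = p_i, n_i
    return best


def toy_geometry_err(d: int) -> float:
    return 1.0 / d          # hypothetical stand-in for E(d)


def toy_discretization_err(n: int, d: int) -> float:
    return d / n            # hypothetical stand-in for E[D(n, d)]


if __name__ == "__main__":
    for n in (7, 21, 200):
        print(n, diagonal_rank(n, toy_geometry_err, toy_discretization_err))
```

In this toy model the rank $d_{n}$ grows much more slowly than $n$, reflecting that the rank can only be raised once enough samples are available to control the discretization error at that rank.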

H.6 Key Lemmas for Corollary 2.1

Lemma 15.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth infinite-dimensional Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$, connection Laplacians $\Delta_{\nabla_{d}}$, and bandwidth $t$ point cloud Laplacians $\hat{\Delta}_{\mathcal{F}^{t}_{n,d}}$ associated to an iid sampling $\mathcal{X}=\{x_{1},x_{2},\ldots\}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. Let $\{d_{n}\}$ be the diagonal sequence induced by Theorem 2. Let $\tilde{\Delta}_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}\Pi_{d_{n}}$ with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$. Similarly, let $\tilde{\Delta}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}$. Finally, let $g:\mathbb{R}\to\mathbb{R}$ be a bounded continuous function.

Under the Borel functional calculus (Appendix G.4), we have MSE convergence:

$$\mathbb{E}\left[\left\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\right\|_{L^{2}}^{2}\right]\to 0\,.$$
Proof.

Observe that $\tilde{\Delta}$ and each $\tilde{\Delta}_{n}$ are self-adjoint unbounded operators on $L^{2}(\mathcal{M};\mathcal{E})$. We begin by showing there is a common core $\mathcal{D}\subseteq L^{2}(\mathcal{M};\mathcal{E})$ and a subsequence $n_{i}$ along which $\tilde{\Delta}_{n_{i}}S\to\tilde{\Delta}S$ in $L^{2}(\mathcal{M};\mathcal{E})$ almost surely for every $S\in\mathcal{D}$.

Take $\mathcal{D}$ to be any countable dense subset of $C^{\infty}(\mathcal{M},\mathcal{E})$. Since $\mathcal{E}$ has separable fibers, such a countable dense subset necessarily exists. Moreover, $\tilde{\Delta}$ and $\tilde{\Delta}_{n}$ are all defined on $\mathcal{D}$. This $\mathcal{D}$ shall be our common core.

Treat $\tilde{\Delta}_{n}S$ as an $L^{2}(\mathcal{M};\mathcal{E})$-valued random variable. For each $S\in\mathcal{D}$, Theorem 2 ensures that $\tilde{\Delta}_{n}S\to\tilde{\Delta}S$ in mean square. It immediately follows that $\tilde{\Delta}_{n}S\xrightarrow{\mathbb{P}_{\mathcal{X}}}\tilde{\Delta}S$, that is, in probability with respect to the measure from which the sampling $\mathcal{X}$ is drawn.

Enumerate $\mathcal{D}=\{S_{1},S_{2},\ldots\}$. Since convergence in probability implies almost sure convergence along a subsequence, we may inductively construct a doubly-indexed sequence of indices $N^{a}_{b}$ such that the following properties hold:

  1. For each $a$, the sequence $\{N_{b}^{a+1}\}_{b}$ is a subsequence of $\{N_{b}^{a}\}_{b}$;

  2. Along the sequence $\{N^{a}_{b}\}_{b}$, we have almost sure convergence $\tilde{\Delta}_{N^{a}_{b}}S_{a}\xrightarrow{\mathrm{a.s.}}\tilde{\Delta}S_{a}$ as $b\to\infty$.

Take the diagonal sequence $n_{i}:=N^{i}_{i}$. Along $n_{i}$, we have that $\tilde{\Delta}_{n_{i}}S\xrightarrow{\mathrm{a.s.}}\tilde{\Delta}S$ as $i\to\infty$ for all $S\in\mathcal{D}$. Now applying Theorems VIII.25(a) and VIII.20(b) of [82], we may conclude that

$$\|g(\tilde{\Delta}_{n_{i}})S-g(\tilde{\Delta})S\|_{L^{2}}\xrightarrow{\mathrm{a.s.}}0$$

for each section $S$.

Notice that the previous argument applies not only to the full sequence $n=1,2,3,\ldots$, but also along any subsequence of $\{n\}_{n}$. Since any subsequence of $\{n\}_{n}$ therefore has an almost surely convergent sub-subsequence, we may conclude that for the original sequence $n$ and for each section $S$ (not necessarily in $\mathcal{D}$),

$$\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\|_{L^{2}}\xrightarrow{\mathbb{P}_{\mathcal{X}}}0$$

in probability with respect to the sampling $\mathcal{X}$ as $n\to\infty$.

Finally, by the spectral calculus, since $g:\mathbb{R}\to\mathbb{R}$ is bounded by some $B$, we may bound $\|g(\tilde{\Delta}_{n})S\|_{L^{2}}\leq B\|S\|_{L^{2}}$. Similarly, $\|g(\tilde{\Delta})S\|_{L^{2}}\leq B\|S\|_{L^{2}}$. It follows that for all $n$,

$$\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\|_{L^{2}}^{2}\leq 4B^{2}\|S\|_{L^{2}}^{2}\,.$$

The squared errors therefore converge to zero in probability and are uniformly bounded by a deterministic constant, so the dominated convergence theorem upgrades the convergence in probability to the desired MSE convergence:

$$\mathbb{E}\left[\left\|g(\tilde{\Delta}_{n})S-g(\tilde{\Delta})S\right\|_{L^{2}}^{2}\right]\to 0\,.$$
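A finite-dimensional analogue may help build intuition for this lemma. The sketch below applies a bounded continuous filter $g$ to a real symmetric $2\times 2$ matrix through its eigendecomposition and checks numerically that $g(A_{n})v\to g(A)v$ when $A_{n}\to A$, with the error never exceeding $2B\|v\|$. Everything here is a toy assumption: the lemma concerns unbounded operators on sections and random samplings, whereas this example uses a deterministic perturbation $A_{n}=A+E/n$, and the names `apply_filter` and `filter_error` are hypothetical.

```python
import math


def apply_filter(A, g, v):
    """Compute g(A) v for a real symmetric 2x2 matrix A via its eigendecomposition."""
    (a, b), (_, c) = A
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # Jacobi rotation angle diagonalizing A
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    # Coordinates of v in the eigenbasis, scaled by g at each eigenvalue.
    w1 = g(lam1) * (cs * v[0] + sn * v[1])
    w2 = g(lam2) * (-sn * v[0] + cs * v[1])
    # Rotate back to the original basis.
    return (cs * w1 - sn * w2, sn * w1 + cs * w2)


def filter_error(n, A, E, g, v):
    """Euclidean norm of g(A + E/n) v - g(A) v for a symmetric perturbation E."""
    A_n = tuple(tuple(A[i][j] + E[i][j] / n for j in range(2)) for i in range(2))
    x, y = apply_filter(A_n, g, v), apply_filter(A, g, v)
    return math.hypot(x[0] - y[0], x[1] - y[1])


if __name__ == "__main__":
    A = ((2.0, 1.0), (1.0, 3.0))
    E = ((1.0, 0.0), (0.0, -1.0))
    v = (1.0, 1.0)
    g = lambda x: 1.0 / (1.0 + x * x)  # bounded continuous filter with B = 1
    for n in (10, 100, 1000):
        print(n, filter_error(n, A, E, g, v))
```

The uniform bound $2B\|v\|$ plays the same role here as the domination step at the end of the proof.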

H.7 Proof of Corollary 2.1

Corollary.

Under the hypotheses of Theorem 2, let $\{d_{n}\}_{n}$ be the constructed deterministic diagonal sequence, and let $L\in\mathbb{N}$ be a network depth. Let $\sigma$ be a fiber-wise nonlinearity that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms. For bounded continuous filters $g_{0},\ldots,g_{L-1}\in\mathcal{W}$, consider the continuous and sampled architectures:

$$\begin{aligned}S^{\ell+1}&:=\sigma\left(g_{\ell}\left(\tilde{\Delta}\right)S^{\ell}\right)\\ S_{n}^{\ell+1}&:=\sigma\left(g_{\ell}\left(\tilde{\Delta}_{n}\right)S_{n}^{\ell}\right)\,,\end{aligned}$$

with initializations $S^{0}:=S$ and $S_{n}^{0}:=\Pi_{d_{n}}S$, where $\tilde{\Delta}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}$ and $\tilde{\Delta}_{n}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{n,d_{n}}}$.

Then, the output of the discrete architecture converges in mean square to the output of the continuous architecture:

$$\lim_{n\to\infty}\mathbb{E}\left[\left\|S_{n}^{L}-S^{L}\right\|_{L^{2}(\mathcal{M};\mathcal{E})}^{2}\right]=0\,.$$
Proof.

We proceed by induction on the layers. Let $e_{n,\ell}$ and $\delta_{n,\ell}$ denote the signal error and spectral filter error at layer $\ell$ respectively:

$$\begin{aligned}e_{n,\ell}&:=\left\|S_{n}^{\ell}-S^{\ell}\right\|_{L^{2}}\\ \delta_{n,\ell}&:=\left\|g_{\ell}(\tilde{\Delta}_{n})(\Pi_{d_{n}}S^{\ell})-g_{\ell}(\tilde{\Delta})S^{\ell}\right\|_{L^{2}}\,.\end{aligned}$$

Let $M_{\ell}:=\sup_{x\in\mathbb{R}}|g_{\ell}(x)|$. By the Lipschitz continuity of the nonlinearity $\sigma$, the triangle inequality yields the pathwise recursive bound:

$$e_{n,\ell+1}\leq C_{\sigma}(M_{\ell}e_{n,\ell}+\delta_{n,\ell})\,.$$

Iterating this inequality over the $L$ layers gives:

$$e_{n,L}\leq\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)e_{n,0}+\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}\delta_{n,q}\,.$$
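As a sanity check on the unrolled expression, the sketch below iterates the layer-wise recursion with equality and compares the result against the closed form. The function names and the numerical values of $C_{\sigma}$, $M_{\ell}$, $e_{n,0}$ and $\delta_{n,\ell}$ are hypothetical and chosen only for illustration.

```python
from math import isclose, prod
from typing import Sequence


def recursive_bound(c_sigma: float, M: Sequence[float],
                    e0: float, delta: Sequence[float]) -> float:
    """Iterate e_{l+1} = C_sigma * (M_l * e_l + delta_l) layer by layer."""
    e = e0
    for M_l, d_l in zip(M, delta):
        e = c_sigma * (M_l * e + d_l)
    return e


def unrolled_bound(c_sigma: float, M: Sequence[float],
                   e0: float, delta: Sequence[float]) -> float:
    """Closed form: (prod_r C M_r) e_0 + sum_q (prod_{r>q} C M_r) C delta_q."""
    L = len(M)
    head = prod(c_sigma * M_r for M_r in M) * e0
    tail = sum(
        prod(c_sigma * M[r] for r in range(q + 1, L)) * c_sigma * delta[q]
        for q in range(L)
    )
    return head + tail


if __name__ == "__main__":
    c_sigma, M = 2.0, [0.5, 3.0]       # hypothetical Lipschitz constant and filter bounds
    e0, delta = 1.0, [0.1, 0.2]        # hypothetical initial and per-layer filter errors
    print(recursive_bound(c_sigma, M, e0, delta))
    print(unrolled_bound(c_sigma, M, e0, delta))
```

Both functions return the same value, which makes explicit that errors injected at early layers are amplified by the product of the later factors $C_{\sigma}M_{r}$.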

By Theorem 2, the initialized signal error $e_{n,0}\to 0$ in mean square. Furthermore, by Lemma 15, $\delta_{n,q}\to 0$ in mean square as well for each $q$.

Taking the expectation of the squared unrolled bound and applying the Cauchy-Schwarz inequality to the finite sum, in the form $(a_{1}+\cdots+a_{K})^{2}\leq K(a_{1}^{2}+\cdots+a_{K}^{2})$, bounds $\mathbb{E}[e_{n,L}^{2}]$ by a fixed linear combination of $\mathbb{E}[e_{n,0}^{2}]$ and the $\mathbb{E}[\delta_{n,q}^{2}]$. Hence, as $n\to\infty$, the total error satisfies:

$$\lim_{n\to\infty}\mathbb{E}[e_{n,L}^{2}]=0\,.$$

In particular, as the sampling density goes to infinity, we have convergence in MSE:

$$\Omega(\mathcal{F}_{n,d_{n}},\hat{\Delta}_{\mathcal{F}_{n,d_{n}}},\mathcal{W},\sigma)\to\Omega(\mathcal{E},\Delta_{\nabla},\mathcal{W},\sigma)\,.$$

H.8 Proof of Corollary 2.2

Corollary.

Let $(\mathcal{E},\mathcal{M},\nabla)$ be a smooth Hilbert bundle over a closed manifold $\mathcal{M}$ of dimension $m$ equipped with a compatible connection $\nabla$. Fix a section $S\in C^{4}(\mathcal{M},\mathcal{E})$. Let $\{\mathcal{E}_{d}\}_{d}$ be a finite rank approximating sequence for $\mathcal{E}$ with induced connections $\nabla_{d}$ and connection Laplacians $\Delta_{\nabla_{d}}$. Let $\Pi_{d}:\mathcal{E}\to\mathcal{E}_{d}$ denote the fiber-wise orthogonal projection map onto $\mathcal{E}_{d}$. Let $\mathcal{X}=\{x_{1},x_{2},\ldots\}$ and $\mathcal{Y}=\{y_{1},y_{2},\ldots\}$ be a pair of independent iid samplings of points on the manifold. Denote the bandwidth $t$ point cloud Laplacians associated to these distinct samplings by $\hat{\Delta}_{\mathcal{F}^{t}_{\mathcal{X}_{n},d}}$ and $\hat{\Delta}_{\mathcal{F}^{t}_{\mathcal{Y}_{n},d}}$ respectively. Let $\{d_{n}\}$ be a diagonal sequence such that the conclusion of Theorem 2 holds for both samplings. Let $\tilde{\Delta}_{n}^{\mathcal{X}}:=\frac{1}{t_{n}(4\pi t_{n})^{m/2}}\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}}\Pi_{d_{n}}$ with bandwidth $t_{n}=n^{-\frac{1}{m+2+\alpha}}$, $\alpha>0$, and similarly for $\tilde{\Delta}_{n}^{\mathcal{Y}}$. As before, let $\tilde{\Delta}:=\frac{1}{\mathrm{vol}(\mathcal{M})}\Delta_{\nabla}$.

Let $L\in\mathbb{N}$ be a network depth, and $\sigma$ be a fiber-wise nonlinearity that is $C_{\sigma}$-Lipschitz in the corresponding fiber norms. For bounded continuous filters $g_{0},\ldots,g_{L-1}\in\mathcal{W}$, consider the continuous and sampled architectures:

$$\begin{aligned}S^{\ell+1}&:=\sigma\left(g_{\ell}\left(\tilde{\Delta}\right)S^{\ell}\right)\\ S_{n}^{\mathcal{X},\ell+1}&:=\sigma\left(g_{\ell}\left(\tilde{\Delta}^{\mathcal{X}}_{n}\right)S_{n}^{\mathcal{X},\ell}\right)\\ S_{n}^{\mathcal{Y},\ell+1}&:=\sigma\left(g_{\ell}\left(\tilde{\Delta}_{n}^{\mathcal{Y}}\right)S_{n}^{\mathcal{Y},\ell}\right)\,,\end{aligned}$$

with initializations $S^{0}:=S$ and $S_{n}^{\mathcal{X},0},S_{n}^{\mathcal{Y},0}:=\Pi_{d_{n}}S$. Under these hypotheses, the outputs on the two samplings converge to each other in mean square:

$$\lim_{n\to\infty}\mathbb{E}\left[\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}^{2}\right]=0\,.$$

Further, one may derive a quantitative bound for the $L^{2}$ disagreement $\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\|_{L^{2}}$ in terms of sample-independent quantities.

Proof.

One may bound:

$$\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}^{2}\leq 2\left\|S_{n}^{\mathcal{X},L}-S^{L}\right\|_{L^{2}}^{2}+2\left\|S_{n}^{\mathcal{Y},L}-S^{L}\right\|_{L^{2}}^{2}\,.$$

Applying Corollary 2.1 to each sampling separately yields the MSE convergence. In particular, we have that

$$\lim_{n\to\infty}\mathbb{E}_{\mathcal{X},\mathcal{Y}}\left[\left\|\Omega(\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{X}_{n},d_{n}}},\mathcal{W},\sigma)-\Omega(\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}},\hat{\Delta}_{\mathcal{F}^{t_{n}}_{\mathcal{Y}_{n},d_{n}}},\mathcal{W},\sigma)\right\|_{L^{2}}^{2}\right]=0\,.$$
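The splitting step at the start of this proof is the elementary inequality $\|x-y\|^{2}\leq 2\|x-s\|^{2}+2\|y-s\|^{2}$, which holds in any inner product space. The sketch below checks it on finite-dimensional vectors standing in for the two sampled outputs and the continuous output; the helper names and the vectors themselves are hypothetical.

```python
import random
from typing import Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def split_bound(out_x: Sequence[float], out_y: Sequence[float],
                target: Sequence[float]) -> float:
    """Right-hand side 2*||x - s||^2 + 2*||y - s||^2 of the splitting inequality."""
    return 2.0 * sq_dist(out_x, target) + 2.0 * sq_dist(out_y, target)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for _ in range(5):
        x, y, s = ([rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3))
        print(sq_dist(x, y) <= split_bound(x, y, s))
```

The constant $2$ cannot be improved in general: taking $x=s+u$ and $y=s-u$ gives equality.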

To derive a quantitative bound, introduce the per-sampling signal error and spectral filter error at layer $\ell$, where $(-)$ stands for either $\mathcal{X}$ or $\mathcal{Y}$:

$$\begin{aligned}e^{(-)}_{n,\ell}&:=\left\|S_{n}^{(-),\ell}-S^{\ell}\right\|_{L^{2}}\\ \delta^{(-)}_{n,\ell}&:=\left\|g_{\ell}(\tilde{\Delta}^{(-)}_{n})(\Pi_{d_{n}}S^{\ell})-g_{\ell}(\tilde{\Delta})S^{\ell}\right\|_{L^{2}}\,.\end{aligned}$$

Apply the triangle inequality and the layer-wise recursive bounds from the proof of Corollary 2.1, with $M_{r}:=\sup_{x\in\mathbb{R}}|g_{r}(x)|$ as there, to establish:

$$\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}\leq\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)(e^{\mathcal{X}}_{n,0}+e^{\mathcal{Y}}_{n,0})+\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}(\delta^{\mathcal{X}}_{n,q}+\delta^{\mathcal{Y}}_{n,q})\,.$$

The level-zero signal error $e^{(-)}_{n,0}=\|\Pi_{d_{n}}S-S\|_{L^{2}}$ is sampling independent and bounded above by $\|S\|_{L^{2}}$. On the other hand, we may further apply the Borel functional calculus to bound each spectral filter error by:

$$\delta_{n,q}^{(-)}\leq 2M_{q}\|S^{q}\|_{L^{2}}\,.$$

Substituting both bounds, so that $e^{\mathcal{X}}_{n,0}+e^{\mathcal{Y}}_{n,0}\leq 2\|S\|_{L^{2}}$ and $\delta^{\mathcal{X}}_{n,q}+\delta^{\mathcal{Y}}_{n,q}\leq 4M_{q}\|S^{q}\|_{L^{2}}$, we conclude a sample-independent bound:

$$\left\|S_{n}^{\mathcal{X},L}-S_{n}^{\mathcal{Y},L}\right\|_{L^{2}}\leq 2\left(\prod_{r=0}^{L-1}C_{\sigma}M_{r}\right)\|S\|_{L^{2}}+4\sum_{q=0}^{L-1}\left(\prod_{r=q+1}^{L-1}C_{\sigma}M_{r}\right)C_{\sigma}M_{q}\|S^{q}\|_{L^{2}}\,.$$
