Consistent Geometric Deep Learning
via Hilbert Bundles and Cellular Sheaves
Abstract
Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed HilbNets. We make HilbNets and, more generally, the convolution operation, implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result generalizes to the infinite-dimensional bundle setting the Belkin & Niyogi [14] convergence result for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.
1 Introduction
Over the past few years, advances in deep learning have delivered state-of-the-art performance across many areas, driven by increasingly expressive architectures and corresponding gains in both theory and practice. A major contributor to this success, though not the only one, has been the rise of Convolutional Neural Networks (CNNs) [63]. CNNs have shown outstanding results in settings ranging from image recognition [60] to speech processing [1]. At their core, CNNs rely on filters leveraging the regular (often metric) organization of common signal types, such as spatial grids. In contrast, many modern datasets live on irregular, non-Euclidean domains, including social networks for detection and recommendation [2] or point clouds for shape segmentation [107], to name only a few. Such structured data can be represented by richer mathematical objects, among which networks and manifolds are prominent. Motivated by this, the intuition behind CNNs has been generalized to graph convolutional neural networks (GCNs) [86, 40, 59] and extended to many other settings, e.g. simplicial complexes [10, 17, 6], cell complexes [43, 16, 77], order lattices [84], and manifolds [102, 34, 87, 11, 27]. Nevertheless, existing works do not address convolutional filtering of infinite-dimensional signals on manifolds, despite such data being ubiquitous in practice, from time series and spatiotemporal fields arising in sensing, robotics, and climate science to distributional and measure-valued representations common in modern learning systems [54].
To address this gap, we adopt a bundle viewpoint. Informally, a bundle over a base manifold is a consistent assignment to each point of a space , called a fiber. A section is a map that picks an element at every point. In other words, signals supported on manifolds can be seen as sections, e.g., scalar manifold signals correspond to [102], or tangent bundle signals correspond to [11]. In this work, we develop a convolutional learning framework operating over Hilbert bundles, i.e., bundles whose fibers are (possibly) infinite-dimensional Hilbert spaces. We design a bundle-theoretic convolutional learning framework and, to make it implementable, we draw the first rigorous connection between Hilbert bundles and Hilbert Cellular Sheaves, generalized graph structures whose nodes and edges carry infinite-dimensional signals along with consistency rules.
Related Works. The connection between continuous domains, such as manifolds and bundles, and discrete structures, such as graphs and cellular sheaves, first emerged in pioneering investigations on the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional Riemannian manifolds [38]. The manifold hypothesis underpins several modern spectral graph methods, e.g., nonlinear dimensionality-reduction, clustering, interpretability, and learning algorithms that exploit latent geometric structures. The renowned work of Belkin and Niyogi [14] proved that, assuming access to a finite point cloud sampled from the underlying manifold, it is possible to build a weighted undirected graph whose Laplacian converges to the Laplace-Beltrami operator of the underlying manifold in probability as the number of samples goes to infinity.
The work in [14], and related results, e.g., [92, 93], have been used, directly or indirectly, to design learning systems over manifolds and networks [103, 67, 20, 74, 11, 56]. Despite the diversity of such systems, these models all assume finite-dimensional fibers and therefore do not directly address learning with infinite-dimensional manifold signals. The main technical reason behind this gap is the lack of an extension of the convergence result in [14] to bundles with infinite-dimensional fibers.
A line of related works of interest comes from cellular sheaf theory. Cellular sheaves are combinatorial instances of sheaves introduced in [90] and later rediscovered in [29]. In [18, 50, 7, 39, 37, 78], neural networks operating on finite-dimensional cellular sheaves over graphs, referred to as network sheaves, are presented, generalizing graph neural networks by, intuitively, replacing scalar edge weights with learned or structured matrix weights. Recently, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on network sheaves built from manifold samples. For an extended treatment of related work, see Appendix A.
Contribution. In this work, we first define a convolution operation over a Hilbert bundle through its associated connection Laplacian. This convolution extends Laplacian-based convolutions on tangent bundles [11], manifolds [103], and graphs [91, 41], as well as standard time convolutions. Using the Borel functional calculus, we then define Hilbert bundle convolutional filters for infinite-dimensional manifold signals. These filters are general and expressive, and can be instantiated through suitable spectral responses. We then introduce HilbNets, deep convolutional architectures whose layers stack Hilbert bundle filters and pointwise nonlinearities. HilbNets are continuous models and are therefore not directly implementable. To address this, we provide a principled discretization of the manifold domain by sampling points and showing that the induced structure is a Hilbert cellular sheaf over an undirected graph. The corresponding sheaf Laplacian combines scalar edge weights, obtained from the sampled base manifold, with parallel transport maps associated with the bundle geometry or learned from data. We prove that this sheaf Laplacian converges in probability to the connection Laplacian as the sampling density increases, yielding the first extension of the classical convergence result of [14] to the infinite-dimensional bundle setting. We then discretize the signals themselves to obtain an implementable architecture, show that discretized HilbNets are novel instances of network sheaf neural networks, and prove that they converge to the corresponding continuous architectures as both the manifold and signal sampling densities increase. Moreover, we show that discretized HilbNets are transferable across different samplings of the same underlying bundle, providing resolution consistency guarantees for learning. 
Finally, we validate HilbNets on a synthetic transport recovery task and on real-world traffic forecasting tasks, comparing them against baselines with different inductive biases in order to isolate the benefits of the bundle formulation. The potential impact of this work extends well beyond the definition of the HilbNet architecture. See Appendix B for a detailed discussion of broader impact and future directions, and Fig. 1 for an overview.
2 Preliminaries
Signals on Manifolds. Given a manifold , a vector-valued signal is a square-integrable function . Certain vector-valued signals on may possess the richer structure of a vector field, i.e., they are sections of the tangent bundle of and thus elements of . More generally, we may consider signals that are -sections of an arbitrary bundle . A bundle is called trivial when, for a generic fiber , it can be written as a product . In this setting, may be understood as the space of sections of the trivial bundle .
Consider now the case where the signal is ‘infinite-dimensional’, for instance, representing a time series recorded at each point . While this is usually considered as a function , it may instead be more richly understood as a section of a Hilbert bundle, i.e., a bundle whose fibers are Hilbert spaces. As we will see, Hilbert bundles provide a principled and versatile approach to incorporating structural properties of infinite-dimensional data.
Example 1.
In physics, Hilbert bundles often arise naturally when considering global geometric properties of quantum mechanical systems [4].
Example 2.
In information geometry, the key objects of study are manifolds given by the underlying parameters of some family of data distributions. This manifold is then equipped with a Riemannian structure by either the Otto-Wasserstein or Fisher-Rao metric, the latter of which locally recovers KL divergence. The proper analogue of the tangent bundle in this setting is a Hilbert bundle [72].
Convolution, Heat Equation, and Connection Laplacian. Geometric signal processing and deep learning [66, 21] traditionally aim to develop convolutional filters and neural networks designed to respect the underlying geometry of the signals of interest. The relevant convolutional operators can usually be realized as a connection Laplacian operator induced by a connection . For instance, for the tangent bundle over the circle , the eigenfunctions of , with the Levi-Civita connection, recover the usual Fourier basis. Similarly, the eigenfunctions of for the tangent bundle of the sphere recover spherical harmonics. Thus, convolutions with the connection Laplacian may be understood as generalized Fourier transforms in the spectral domain. In the spatial domain, they can be seen as performing a geometry-aware ‘local averaging’ of a signal over fibers. Formally, the connection Laplacian is the generator of the heat equation in ,
| (1) |
where is the distribution of heat at for at time . A subtlety of note is that for non-Euclidean spaces, there is typically no canonical identification between fibers and . Intuitively, the connection precisely encodes a globally coherent notion of transport between fibers. That is, it induces parallel transport maps for a path , allowing us to compare elements across fibers along this path. More formally, the connection is used to define a first-order ODE whose solution is given by parallel transport (see Appendix G.1.3). The connection Laplacian is then the self-adjoint operator , now more clearly understandable as a ‘local weighted average’ over fibers with ‘weights’ corresponding to our choice of parallel transport. A more rigorous introduction to the relevant mathematical background is provided in Appendix G.
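As a minimal numerical sketch of the Fourier statement above, one can replace the circle by an equispaced cycle graph and check that a cosine mode is an eigenvector of its Laplacian. The cycle graph, the function names, and the sign convention (heat flow written as exp(-tL) for a positive semidefinite L) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian of the n-node cycle graph (a sampled circle)."""
    L = 2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = -1.0
    L[idx, (idx - 1) % n] = -1.0
    return L

n, k = 64, 3
L = cycle_laplacian(n)
theta = 2.0 * np.pi * np.arange(n) / n

# The k-th cosine mode is an eigenvector with eigenvalue 2 - 2 cos(2 pi k / n).
mode = np.cos(k * theta)
eigval = 2.0 - 2.0 * np.cos(2.0 * np.pi * k / n)
residual = np.linalg.norm(L @ mode - eigval * mode)

# Heat flow u(t) = exp(-t L) u(0), computed through the eigendecomposition.
evals, evecs = np.linalg.eigh(L)

def heat(u0, t):
    return evecs @ (np.exp(-t * evals) * (evecs.T @ u0))

# A Fourier mode simply decays at the rate given by its eigenvalue.
decayed = heat(mode, 0.5)
```

In this toy setting filtering in the spectral domain amounts to rescaling Fourier coefficients, which is the picture that the connection Laplacian generalizes to bundles.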
3 Hilbert Bundle Filters and Neural Networks
In this section, we develop a convolutional learning framework for infinite-dimensional data, such as time series or probability distributions, indexed by a manifold . Core objects are Hilbert bundles.
Hilbert Bundles. Given a closed Riemannian manifold , a Hilbert bundle over is a bundle whose potentially infinite-dimensional fibers are separable Hilbert spaces over . The assumption of real, instead of complex, Hilbert spaces is not essential to our analysis, and is made only for the sake of exposition. As mentioned in Section 2, a Hilbert Bundle signal is then an -section . Integration of sections in this setting should be understood in the Bochner integral sense, a generalized notion of integration for functions whose values lie in a Hilbert space rather than in . In finite dimensions, it reduces to the standard Lebesgue integral. Given fibers and of the Hilbert bundle , we consider unitary parallel transport operators . As before, a globally compatible collection of such transport operators determines a connection , with the subtlety that derivatives of sections must now be understood in the Fréchet sense. Intuitively, Fréchet differentiability is the infinite-dimensional analogue of ordinary differentiability: it asks that a section admit a best linear approximation under small perturbations, but where the linear approximation acts between Hilbert spaces. We therefore refer to this construction as a Fréchet connection, which recovers the usual notion of connection and covariant derivative when restricted to finite-dimensional bundles. As before, we obtain a self-adjoint operator on . Unlike the finite-dimensional case, however, this operator need not be compact and thus need not possess a discrete spectrum. As such, care must be taken when adapting classical arguments that involve spectral properties of the Laplacian to the Hilbert-bundle setting. Formal definitions of Hilbert bundles and Fréchet connections are provided in Appendix G. For a triple , where is a choice of Fréchet connection on the Hilbert bundle over the manifold , we now wish to construct a general notion of a ‘filtering’ operation using the connection Laplacian . 
In finite-dimensional or compact settings, filters are often defined by applying a function directly to the eigenvalues of the Laplacian. In our setting, the appropriate analogue of eigenvalue-by-eigenvalue filtering is furnished by the Borel functional calculus, which allows us to apply a filter to a self-adjoint operator by instead integrating over its spectral measure. See Appendix G.4 for details.
Definition 1 (Hilbert bundle convolutional filter).
A convolutional filter is specified by a bounded compactly supported Borel function . The filtering of a signal is then its convolution with , defined as , where is the bounded linear operator obtained by applying to through the Borel functional calculus.
In this sense, is the learnable frequency response, as in spectral graph neural filters [41], except we now use the spectral measure of the connection Laplacian acting on Hilbert bundle-valued signals.
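In finite dimensions the functional calculus reduces to applying the frequency response to the eigenvalues of a symmetric matrix. The sketch below illustrates only this special case; in the genuinely infinite-dimensional setting the spectrum need not be discrete and the sum over eigenvalues is replaced by an integral against the spectral measure. The names `apply_spectral_filter` and `low_pass`, and the triangular response, are hypothetical choices.

```python
import numpy as np

def apply_spectral_filter(L, h, s):
    """Return h(L) s for a symmetric matrix L and a scalar response h."""
    evals, evecs = np.linalg.eigh(L)
    # h(L) = U diag(h(lambda)) U^T; scaling the columns of U avoids forming diag.
    return (evecs * h(evals)) @ (evecs.T @ s)

def low_pass(lam, cutoff=1.0):
    """Bounded, compactly supported response: a triangular bump on [-cutoff, cutoff]."""
    return np.clip(1.0 - np.abs(lam) / cutoff, 0.0, None)

# Stand-in operator: the Laplacian of a 6-node path graph.
n = 6
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

s = np.arange(n, dtype=float)
filtered = apply_spectral_filter(L, low_pass, s)

# Sanity checks: h(lambda) = lambda recovers L itself, and the constant
# signal (eigenvalue 0) passes through the low-pass filter unchanged.
same_as_L = apply_spectral_filter(L, lambda lam: lam, s)
constant_out = apply_spectral_filter(L, low_pass, np.ones(n))
```

A learnable filter would parametrize `h`, for example through polynomial coefficients, rather than fix it as done here.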
Definition 2 (Hilbert bundle convolutional neural network).
Let be a Hilbert bundle. A Hilbert bundle convolutional neural network, or HilbNet, is specified by a filter bank with , and a Lipschitz continuous nonlinear activation . Given input signals , the -layer network output is obtained by the recursion
| (2) |
where is applied pointwise in each fiber.
We concisely denote a HilbNet with . Similarly to the finite-dimensional case, a nonlinear activation with extends to an operator on the section by simply picking a basis and then applying to each coordinate with respect to the chosen basis. It is straightforward to check that, for each , the layer signal remains an section.
4 Discretized HilbNets via Cellular Sheaves
HilbNets are continuous architectures that cannot be implemented directly in practice. Moreover, we typically do not have access to the true bundle and connection structure, but only to a point cloud or graph sampled from the underlying manifold , together with samples of the signal at each point or node. In this section, we first analyze the Hilbert cellular sheaf induced by spatial, i.e., manifold-level, sampling. We then further discretize the fibers, i.e., the signal domain itself, obtaining a finite rank network sheaf. This two-stage sampling is the basis of our consistency theory presented in Section 5. It allows us to prove that the fully discrete (thus, implementable) Laplacian and HilbNet converge to their infinite-dimensional counterparts, and hence that learning is consistent across scales.
Manifold Sampling. A generalized viewpoint on the theory of bundles is given by the language of sheaves, mathematical structures initially introduced by Jean Leray while a prisoner of war [65]. The functoriality of sheaves makes them particularly well suited to the type of principled discretization of geometric structures that we are interested in. In particular, we consider a Hilbert-space-valued version of cellular sheaves on graphs, as introduced in [44]. Intuitively, they can be understood as generalized graph structures with signals valued in Hilbert spaces along with edge-wise coupling rules. For a more thorough introduction to cellular sheaves, see Appendix G.5.
In this work, our primary interest is in Hilbert sheaves that represent discretizations of the structure of a Hilbert bundle over a manifold. In particular, we desire a spatial discretization such that we can recover an appropriate discrete analogue of the Hilbert bundle’s connection Laplacian. Formally, given an iid random sample from the uniform distribution (see Def. 23), we have the following.
Definition 3 (Hilbert Cellular Sheaf from a Hilbert Bundle).
For a given Hilbert bundle with sampled points , fix a geodesic between and , for all . Further, let denote the midpoint of this geodesic. Consider the graph with an undirected edge between and , for each . The associated Hilbert cellular sheaf on with bandwidth parameter is given by the following assignments:
• The Hilbert space for each , referred to as the node stalk over .
• The Hilbert space for each , referred to as the edge stalk over .
• For each edge with bounding vertices , a pair of bounded linear restriction maps
| (3) |
where , with the geodesic distance on , and denotes the unitary parallel transport map on between and .
For the sake of exposition, the choice of sample and corresponding geodesic paths will often be suppressed, so our parallel transports will be denoted as . Also, note that for and , we assume each additional point is again sampled iid from the uniform distribution on . For the categorically-minded reader, we remark that our sheaf is constructed such that refining our sample then leads to a subfunctor . For the Hilbert sheaf on the graph , a signal is an element of the Hilbert space
| (4) |
Example 4.
If encodes univariate spatiotemporal data, each node stalk can be chosen as . Then so a signal assigns a full time series to every node, recovering the usual notion of a node-time graph signal.
Example 5.
In one dimension, a probability distribution can be represented by its quantile function , and the Wasserstein distance becomes the distance between quantiles [99]. Thus, by choosing node stalks a signal assigns a full probability distribution to each graph node, recovering the distributional graph-signal setting of [57, 112].
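The quantile representation can be sketched numerically as follows. The grid of `d` quantile levels and the function names are illustrative assumptions, and the distance computed is the 2-Wasserstein distance, approximated by a discretized L2 distance between quantile functions.

```python
import numpy as np

def quantile_embedding(samples, d=64):
    """Evaluate the empirical quantile function on a midpoint grid in (0, 1)."""
    levels = (np.arange(d) + 0.5) / d
    return np.quantile(samples, levels)

def w2_from_quantiles(q1, q2):
    """Approximate 2-Wasserstein distance as the L2([0, 1]) distance of quantiles."""
    return np.sqrt(np.mean((q1 - q2) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)

# Node signal at two hypothetical graph nodes: a distribution and its shift by 2.
q_a = quantile_embedding(x)
q_b = quantile_embedding(x + 2.0)

# A pure translation moves every quantile by the same amount,
# so the distance equals the size of the shift.
dist = w2_from_quantiles(q_a, q_b)
```

Each vector `q_a`, `q_b` plays the role of a finite-dimensional sample of the stalk element attached to a node.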
Finally, analogous to the construction of the connection Laplacian , we may construct the Hilbert sheaf Laplacian. Further details, such as self-adjointness, are discussed in Appendix G.5.
Definition 4 (Hilbert Sheaf Laplacian).
Let be the Hilbert sheaf on the graph induced by Def. 3. Fix an orientation for each edge . The Hilbert sheaf Laplacian is the bounded linear operator
| (5) |
defined, for a signal and at a node , by
| (6) |
where denotes the other endpoint of , and is the adjoint of the restriction map .
Intuitively, measures how much a signal fails to be locally consistent across edges: before comparing and , both values are mapped into the common edge stalk by the restriction maps. Thus, it is a broad generalization of a graph Laplacian, with restriction maps replacing scalar edge weights. The Hilbert sheaf Laplacian is a self-adjoint bounded linear operator. Once the manifold is sampled and the induced sheaf Laplacian is computed, space-discretized HilbNets, which are still not implementable due to the infinite-dimensional signals, are simply given by Def. 2 with the connection Laplacian of replaced by the sheaf Laplacian of , i.e., by .
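To make the local-consistency reading concrete, the following sketch assembles a sheaf Laplacian from arbitrary restriction maps, using finite-dimensional stalks as a stand-in for Hilbert stalks. It follows the standard cellular sheaf construction, in which the value at a node sums, over incident edges, the adjoint restriction map applied to the disagreement in the edge stalk; the data layout (`edges`, `restrictions`) is a hypothetical choice.

```python
import numpy as np

def sheaf_laplacian(n_nodes, d, edges, restrictions):
    """Assemble the (n_nodes*d) x (n_nodes*d) sheaf Laplacian.

    edges:        list of (u, v) node pairs
    restrictions: dict mapping (node, edge_index) -> matrix from node stalk
                  to edge stalk
    """
    L = np.zeros((n_nodes * d, n_nodes * d))
    for e, (u, v) in enumerate(edges):
        Fu, Fv = restrictions[(u, e)], restrictions[(v, e)]
        su, sv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[su, su] += Fu.T @ Fu
        L[sv, sv] += Fv.T @ Fv
        L[su, sv] -= Fu.T @ Fv
        L[sv, su] -= Fv.T @ Fu
    return L

rng = np.random.default_rng(0)
n_nodes, d = 3, 2
edges = [(0, 1), (1, 2), (0, 2)]
restrictions = {
    (node, e): rng.normal(size=(d, d))
    for e, (u, v) in enumerate(edges)
    for node in (u, v)
}
L = sheaf_laplacian(n_nodes, d, edges, restrictions)

# The quadratic form equals the total disagreement measured in the edge stalks.
f = rng.normal(size=n_nodes * d)
energy = sum(
    np.sum((restrictions[(u, e)] @ f[u * d:(u + 1) * d]
            - restrictions[(v, e)] @ f[v * d:(v + 1) * d]) ** 2)
    for e, (u, v) in enumerate(edges)
)
```

With one-dimensional stalks and identity restriction maps this reduces to the ordinary graph Laplacian.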
Signal Sampling. Hilbert cellular sheaves are the structures that arise when we sample our base manifold but faithfully record the potentially infinite-dimensional signal in each fiber. In practice, we typically only have access to a sampled or compressed version of the signal as well. For instance, when considering a timeseries , we may use the orthogonal Fourier basis and then record a compressed representation with respect to this basis i.e. for some . We can consider fiber-wise orthogonal projections with respect to any chosen basis in the Hilbert bundle setting as a principled approach to discretizing Hilbert bundle signals.
Proposition 1.
Let be a Hilbert bundle, with strictly infinite-dimensional generic Hilbert-space fiber . Fix an orthogonal basis of and let . Then there exists a smooth map of bundles
| (7) |
where is a -dimensional vector bundle with generic fiber and at each , recovers the usual orthogonal projection map. See Appendix H.4 for details.
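A fiber-wise projection of this kind can be sketched for a single sampled time series. The discrete real Fourier basis on `T` equispaced samples stands in for the basis of the fiber, and `fourier_basis` is a hypothetical helper; the construction assumes `d < T` so that the selected vectors are orthonormal.

```python
import numpy as np

def fourier_basis(T, d):
    """First d vectors of the real orthonormal Fourier basis on T samples (d < T)."""
    t = np.arange(T)
    rows = [np.full(T, 1.0 / np.sqrt(T))]
    k = 1
    while len(rows) < d:
        rows.append(np.sqrt(2.0 / T) * np.cos(2.0 * np.pi * k * t / T))
        if len(rows) < d:
            rows.append(np.sqrt(2.0 / T) * np.sin(2.0 * np.pi * k * t / T))
        k += 1
    return np.stack(rows)  # shape (d, T)

T, d = 128, 9
B = fourier_basis(T, d)
t = np.arange(T)

# A band-limited series whose frequencies all lie within the first d basis vectors.
x = np.sin(2.0 * np.pi * 2 * t / T) + 0.5 * np.cos(2.0 * np.pi * 3 * t / T)

coeffs = B @ x        # d-dimensional compressed representation
x_hat = B.T @ coeffs  # orthogonal projection, expressed back on the time grid
```

For signals with energy outside the retained frequencies, `x - x_hat` is the part discarded by the projection, and it shrinks as `d` grows.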
Applying Proposition 1 to , we obtain , to which we may apply the spatial discretization of Def. 3 to construct the cellular sheaf with -dimensional stalks. We refer to as a network sheaf. The signals on this sheaf are then sampled Hilbert bundle signals, i.e., we can discretize as a -dimensional vector , stacking the -dimensional orthogonal projections over the sampled locations, with respect to the chosen basis . In this case, the restriction maps can be written as matrices, so the sheaf Laplacian becomes a block matrix whose -block maps the discretized stalk over to the discretized stalk over and is given by
| (8) |
where , and denotes the restriction of the parallel transport map from to to the corresponding -dimensional subbundles in the image of . We may thus use this Laplacian to build an implementable sheaf convolutional architecture as follows.
Definition 5 (-Hilbert bundle convolutional neural network).
Let be a Hilbert bundle, with generic Hilbert-space fiber and corresponding basis . A -Hilbert bundle convolutional neural network, or -HilbNet, is specified by a filter bank with , a Lipschitz continuous nonlinear activation , the choice of a -dimensional subbasis of and sample . Given input sampled signals , the -layer network output is obtained by the recursion
| (9) |
Discretized HilbNets are fully implementable and can be compactly written using Def. 2 with the connection Laplacian of replaced by the sheaf Laplacian of , i.e., by .
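The block structure of the network sheaf Laplacian described before Definition 5 can be sketched as follows. This is an illustration under explicit assumptions rather than the paper's exact formula: the heat-kernel weight exp(-dist^2 / 4t) is modeled on [14], Euclidean distance replaces geodesic distance, any normalization constant is omitted, and the transports are random orthogonal matrices.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix via QR (stand-in for a transport map)."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def connection_block_laplacian(points, transports, t):
    """Block Laplacian with scalar weights times orthogonal transports.

    transports[(i, j)] is a d x d orthogonal map from the stalk at j to the
    stalk at i, given for pairs i < j.
    """
    n = points.shape[0]
    d = next(iter(transports.values())).shape[0]
    L = np.zeros((n * d, n * d))
    for (i, j), O in transports.items():
        # Assumed heat-kernel weight; Euclidean distance replaces geodesic distance.
        w = np.exp(-np.sum((points[i] - points[j]) ** 2) / (4.0 * t))
        si, sj = slice(i * d, (i + 1) * d), slice(j * d, (j + 1) * d)
        L[si, si] += w * np.eye(d)
        L[sj, sj] += w * np.eye(d)
        L[si, sj] -= w * O
        L[sj, si] -= w * O.T
    return L

rng = np.random.default_rng(0)
n, d, t = 5, 3, 0.5
points = rng.normal(size=(n, 2))
transports = {(i, j): random_orthogonal(d, rng)
              for i in range(n) for j in range(i + 1, n)}
L = connection_block_laplacian(points, transports, t)
```

Freezing every transport to the identity collapses the operator to the Kronecker product of the weighted graph Laplacian with the identity, which is the sense in which a GCN-style baseline is a special case.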
Example 6.
In the case that we consider our filter bank to consist of order polynomials, the -HilbNet can be written as a novel variant of sheaf neural networks [50, 11] given by
| (10) |
where the matrices , and , with collect the sampled signals and the learnable filter weights at each layer, respectively.
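A polynomial layer of this kind can be sketched in a few lines. The form assumed here, an activation applied to a sum over powers of the sheaf Laplacian acting on the feature matrix and multiplied by per-order weight matrices, follows the usual polynomial sheaf filter and is an assumption about the precise shape of (10); all names are hypothetical.

```python
import numpy as np

def poly_sheaf_layer(L, X, weights, activation=np.tanh):
    """One layer: activation(sum_k L^k X W_k).

    L:       (N, N) sheaf Laplacian, with N = n * d
    X:       (N, F_in) sampled signals
    weights: list of (F_in, F_out) matrices, one per polynomial order k
    """
    out = np.zeros((X.shape[0], weights[0].shape[1]))
    Z = X
    for W in weights:
        out += Z @ W   # contribution of the current power L^k X
        Z = L @ Z      # advance to the next power
    return activation(out)

def poly_hilbnet(L, X, layers, activation=np.tanh):
    """Stack polynomial layers; `layers` is a list of weight lists."""
    for weights in layers:
        X = poly_sheaf_layer(L, X, weights, activation)
    return X

rng = np.random.default_rng(0)
N, F_in, F_hid, F_out, K = 6, 2, 4, 1, 3
M = rng.normal(size=(N, N))
L = M @ M.T / N  # symmetric positive semidefinite stand-in for a sheaf Laplacian
X = rng.normal(size=(N, F_in))
layers = [
    [0.1 * rng.normal(size=(F_in, F_hid)) for _ in range(K)],
    [0.1 * rng.normal(size=(F_hid, F_out)) for _ in range(K)],
]
Y = poly_hilbnet(L, X, layers)
```

Using repeated matrix products instead of an eigendecomposition keeps each layer local on the graph, since the k-th power only mixes nodes within k hops.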
5 Theoretical Convergence Guarantees
Our main result may be understood as a far-reaching generalization of the convergence result of Belkin and Niyogi [14]. Consider a random sample and the corresponding geometric graph . It is established in [14] that as sampling density increases, the weighted graph Laplacian converges to the manifold Laplace-Beltrami operator in probability. We analogously show that the Hilbert sheaf Laplacian over converges to , thus recovering the results of [14] as the special case . Our proof, presented in Appendix H, is inspired by the strategy of [14] but with the necessary non-trivial modifications to accommodate the simultaneous generalization to cellular sheaves instead of graphs and to infinite-dimensional Hilbert spaces. In order to state our results, we require the following intermediary operator.
Definition 6 (Point-Cloud Extension of Sheaf Laplacian).
Let be a Hilbert bundle and consider a sample . Then the corresponding Hilbert sheaf Laplacian may be extended to the point-cloud Laplacian , an operator on via
| (11) |
As such, we are able to consider the sheaf-level and bundle-level Laplacians as operators on the same space through this extension. In this setting, we then have the following convergence result.
Theorem 1 (Convergence of Hilbert Sheaf Laplacian).
Let be a -dimensional closed Riemannian manifold. Further, let be a Hilbert bundle and associated connection Laplacian . Fix a section . Consider a random sample . Let be the induced Hilbert cellular sheaf with bandwidth . Then we have, for any ,
| (A) |
with bandwidth , . Further, if , we have
| (B) |
Our framework may be seen as concurrently generalizing the convergence results for the weighted graph Laplacian of [14] as well as the graph connection Laplacian of [93] to allow for arbitrary bundles, with potentially infinite-dimensional fibers, equipped with an arbitrary choice of connection. Such convergence results have previously served as the basis of transferability and robustness results in geometric deep learning [101, 104, 67], as well as to justify the development of numerous Laplacian-based manifold learning techniques [13, 28, 108, 97, 85]. We may likewise develop generalizations of these results for implementable sheaf Laplacians by discretizing in the signal-domain.
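The scalar special case covered by [14] can be checked numerically on the unit circle. The sketch below uses the heat-kernel point-cloud Laplacian with the normalization of [14], a fixed bandwidth `t` rather than a bandwidth schedule, and the test function cos(theta) evaluated at theta = 0; these choices, and the function name, are assumptions for illustration. With uniform sampling the limit is the Laplace-Beltrami operator divided by the volume of the manifold, so the estimate is rescaled by 2*pi and should approach cos(0) = 1 up to a bias of order `t`.

```python
import numpy as np

def point_cloud_laplacian_at(x, samples, f_x, f_samples, t, k):
    """Heat-kernel point-cloud Laplacian in the style of [14], evaluated at x."""
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (4.0 * t))
    return np.mean(kernel * (f_x - f_samples)) / (t * (4.0 * np.pi * t) ** (k / 2.0))

rng = np.random.default_rng(0)
t = 0.01                    # fixed bandwidth (assumption, not a schedule)
x = np.array([1.0, 0.0])    # the point theta = 0 on the unit circle
f_x = 1.0                   # f(theta) = cos(theta) at theta = 0

estimates = {}
for n in (1_000, 20_000, 200_000):
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    samples = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    est = point_cloud_laplacian_at(x, samples, f_x, np.cos(theta), t, k=1)
    # Multiply by vol(S^1) = 2 pi to compare against the Laplace-Beltrami value.
    estimates[n] = 2.0 * np.pi * est
```

The fluctuation of `estimates[n]` around its limit shrinks as `n` grows, which is the in-probability behavior that the theorem extends from scalar functions to sections of a Hilbert bundle.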
Finite Rank Convergence. Consider a signal that has been sampled to as per Proposition 1. We then have the following theoretical guarantee, which formalizes the intuitive notion that the fully discretized sheaf Laplacian converges to the true connection Laplacian as we take an increasingly refined sample of both the underlying manifold and the signal.
Theorem 2 (Finite-Rank Approximation).
Consider the setting of Theorem 1 with a section , for a strictly infinite-dimensional Hilbert bundle. Then there exists a sequence of finite rank approximating sheaves such that
| (12) |
with bandwidth , .
While this result, as described in Appendix B, can pave the way for new developments of Laplacian-based manifold learning, we here restrict our focus to its consequences for -HilbNets.
Corollary 2.1 (Convergence in Architecture).
Under the hypotheses of Theorem 2, let be the required sequence. Fix a fiber-wise nonlinearity that is -Lipschitz in the corresponding fiber norms and choice of filter bank . Then, the output of the discrete -HilbNet converges to the output of the continuous HilbNet architecture in the sense that
| (13) |
as the sampling density .
Corollary 2.2 (Transferability).
Let and be independent sequences of random samples of . Let be a sequence such that the conclusion of Theorem 2 holds for both samplings. For any fiber-wise nonlinearity that is -Lipschitz in the corresponding fiber norms and any filter bank , we then have that,
| (14) |
Further, one may derive a sample-independent quantitative bound for the disagreement . See Appendix H for details.
By these results, we may understand the -HilbNets as the principled discretization of continuous HilbNets. These may also be understood as robustness results for -HilbNets, as they establish that the architecture is scale-consistent and is stable against resampling of the base manifold. Notably, by the generality of HilbNets, most existing geometric convolutional architectures can be understood as instances of HilbNets for a particular choice of bundle and connection. As such, our results may also be seen as extending the transferability guarantees of [101, 104, 67] to a larger class of architectures and data modalities. See Appendix C for further discussion.
| Free | | Circulant | | Frozen identity (GCN) | |
|---|---|---|---|---|---|
| Empirical | Theory | Empirical | Theory | Empirical | Theory |
6 Experimental Results
A key practical advantage of the -HilbNets architecture in comparison to existing approaches for processing graph signals is our use of parallel transport, which in practice can be known or learned. For instance, these transport operators allow us to incorporate principled signal-level geometric priors in concert with the spatial priors of existing spatiotemporal GCNs. This is well-aligned with the thesis of geometric deep learning that the principled incorporation of geometric priors improves performance, particularly in the low-data or small-model regimes. For further discussion on strategies for either hand-crafting or learning parallel transport operators, see Appendix D. Here, we first validate our setup for a synthetic dataset realized from discretizing a known Hilbert bundle in information geometry, and then consider performance on real-world spatiotemporal graph benchmarks based upon traffic forecasting. In all the experiments, we use the polynomial -HilbNet from (10) and learned transport maps.
Synthetic Experiments. We first consider a task where, for a known Hilbert bundle and connection, we train a discretized HilbNet to predict the true parallel transport operators. Following [72], the base manifold is equipped with the Otto-Wasserstein metric, each parameterizing a density . The ambient fiber is genuinely infinite-dimensional; the computational fiber is the Otto-velocity image of covariance perturbations, a sub-bundle whose fibers are already finite-dimensional with , and on which the Levi-Civita transports admit a closed form. We sample points, build a NN graph () under , and assemble the network sheaf and its Laplacian from Def. 3. We consider three transport parametrizations from Appendix D ( averaged over 3 seeds): free (Householder), circulant, and frozen identity (a standard GCN [59]), to recover the Levi-Civita transports in Cholesky-rescaled coordinates. As Table 4 shows, the free class recovers to numerical precision , while each restricted class converges to its analytical Frobenius-projection plateau to within . This confirms that the transport hypothesis class constrains the per-edge restriction maps of in a quantitatively predictable way. More experiments and details are in Appendix F.1.
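The three hypothesis classes can be illustrated on a single transport matrix. The sketch below builds an orthogonal matrix as a product of Householder reflections, and computes the Frobenius-nearest circulant matrix by averaging along wrapped diagonals. Reading the "Frobenius-projection plateau" as the residual of this projection is one plausible interpretation and not a statement of the paper's exact procedure; the function names are hypothetical.

```python
import numpy as np

def householder_orthogonal(vs):
    """Product of Householder reflections I - 2 v v^T / (v^T v); vs has shape (m, d)."""
    d = vs.shape[1]
    Q = np.eye(d)
    for v in vs:
        Q = Q @ (np.eye(d) - 2.0 * np.outer(v, v) / (v @ v))
    return Q

def circulant(c):
    """Circulant matrix with first column c: C[i, j] = c[(i - j) mod d]."""
    d = len(c)
    idx = (np.arange(d)[:, None] - np.arange(d)[None, :]) % d
    return c[idx]

def project_to_circulant(M):
    """Frobenius-nearest circulant matrix: average M along each wrapped diagonal."""
    d = M.shape[0]
    idx = (np.arange(d)[:, None] - np.arange(d)[None, :]) % d
    c = np.array([M[idx == k].mean() for k in range(d)])
    return circulant(c)

rng = np.random.default_rng(0)
d = 4
O = householder_orthogonal(rng.normal(size=(d, d)))  # "free" class target

P = project_to_circulant(O)
plateau_circulant = np.linalg.norm(O - P)            # best error within circulants
plateau_identity = np.linalg.norm(O - np.eye(d))     # error of the frozen identity
```

Under this reading, a restricted class cannot fit a generic transport below its projection residual, whereas the Householder class can represent any orthogonal target given enough reflections.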
Traffic Forecasting. We evaluate HilbNets on real-world spatiotemporal traffic-speed forecasting, where each node of a road-network graph carries a time-series fiber with observed time steps. This is a natural instance of the Hilbert-bundle framework: the base graph encodes spatial proximity among sensors, while the edgewise transports from Appendix D control how temporal fibers are aligned before filtering. We test on two standard benchmarks, METR-LA [69] and PEMS-BAY [69], predicting future speeds at horizons , , and . We compare the same three HilbNet transport classes, frozen identity (a GCN), circulant, and free , against a fiber-only MLP and a spatiotemporal graph baseline obtained by stacking GCN layers with one-dimensional convolutional layers processing the temporal dimension, all sharing the same polynomial sheaf filter order, readout, and forecasting loss. Full experimental details are given in Appendix F.2. Table 2 reports MAE, RMSE, and MAPE (mean ± std over five seeds). On both datasets, learning non-trivial transports consistently improves over frozen identity at all horizons, confirming that the sheaf structure helps beyond the usual graph structure. The free class achieves the best overall accuracy, but the circulant variant is competitive while using roughly one tenth of the transport parameters, supporting the use of structured, physically motivated transport priors for spatiotemporal data. This confirms that geometric inductive biases can lead to better performance in low-data regimes or comparable performance in normal regimes with substantially fewer parameters.
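For reference, the three reported error measures can be computed as in the sketch below. The masking of near-zero targets in MAPE is a common convention for traffic-speed data and is an assumption here, as are the function name and the toy values.

```python
import numpy as np

def forecasting_metrics(y_true, y_pred, eps=1e-8):
    """Return (MAE, RMSE, MAPE in percent) for arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Assumed convention: skip targets that are (numerically) zero.
    mask = np.abs(y_true) > eps
    mape = 100.0 * np.mean(np.abs(err[mask] / y_true[mask]))
    return mae, rmse, mape

# Toy speeds for three sensors at one horizon (made-up numbers).
mae, rmse, mape = forecasting_metrics([10.0, 20.0, 40.0], [12.0, 18.0, 40.0])
```

RMSE penalizes large deviations more strongly than MAE, while MAPE normalizes by the true speed, so the three measures can rank models differently.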
7 Conclusions
We introduced a novel convolutional learning framework for infinite-dimensional signals over a manifold using Hilbert bundles, a setting that simultaneously unifies and generalizes existing approaches. It allows us to consider arbitrary connection Laplacians and a more general class of filters via the Borel calculus, and thus to apply the resulting filters to potentially infinite-dimensional signals. We defined HilbNets as stacks of Hilbert bundle filters and pointwise nonlinearities. We then introduced a practically implementable (n,d)-HilbNet via the theory of Hilbert cellular sheaves, and proved that this discretized architecture converges to the continuous architecture in the limit. Notably, our convergence in architecture is derived from a novel extension of the Laplacian convergence result of [14] to the setting of Hilbert sheaves and Hilbert bundles, and we believe this result will be of independent interest to the broader machine learning community. Lastly, we verified the benefits of integrating domain-specific geometric priors through experiments with discretized HilbNets on synthetic and real-world data. Overall, we envision the prospective impact of our contributions as two-fold. First, the HilbNet framework allows for the principled development of domain-specific architectures through appropriate choices of connection, filter bank, and manifold- and signal-alignment measures. Second, our Hilbert Laplacian convergence theorem lays the theoretical groundwork for Laplacian-based manifold-theoretic techniques in the setting of infinite-dimensional signals, spanning from mechanistic interpretability to self-supervised learning methods. A more detailed discussion on broader impact and limitations is presented in Appendix B.
References
- [1] (2012) Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In 2012 International Conference on Acoustics, Speech and Signal Processing (ICASSP), External Links: Document Cited by: §1.
- [2] (2020) Machine learning in social networks: embedding nodes, edges, communities, and graphs. Springer Nature. Cited by: §1.
- [3] (1995) A primer of nonlinear analysis. Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, UK. External Links: ISBN 9780521454057 Cited by: §H.1.
- [4] (1991-05) Geometric quantization of chern–simons gauge theory. Journal of Differential Geometry 33 (3), pp. 787–902. External Links: Document Cited by: Example 1.
- [5] (2025) Bundle neural network for message diffusion on graphs. In The Thirteenth International Conference on Learning Representations, External Links: Link Cited by: Appendix A, §C.1.1.
- [6] (2020) Topological signal processing over simplicial complexes. IEEE Trans. on Signal Processing 68, pp. 2992–3007. Cited by: §1.
- [7] (2022) Sheaf neural networks with connection laplacians. arXiv. External Links: Document, Link Cited by: Appendix A, §C.1.1, §1.
- [8] (2011) Heat equations on vector bundles—application to color image regularization. Journal of Mathematical Imaging and Vision 41 (1-2), pp. 59–85. External Links: Document Cited by: Example 3.
- [9] (2023) Tangent bundle filters and neural networks: from manifolds to cellular sheaves and back. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. Cited by: Appendix A, §C.1.1, §1.
- [10] (2024) Generalized simplicial attention neural networks. IEEE Transactions on Signal and Information Processing over Networks 10, pp. 833–850. Cited by: §1.
- [11] (2024) Tangent bundle convolutional learning: from manifolds to cellular sheaves and back. IEEE Transactions on Signal Processing. Cited by: Appendix A, Appendix A, Appendix D, §1, §1, §1, §1, §1, Example 6.
- [12] (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision 61 (2), pp. 139–157. External Links: Document Cited by: Appendix D.
- [13] (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in neural information processing systems 14. Cited by: Appendix B, Appendix B, §5.
- [14] (2008) Towards a theoretical foundation for laplacian-based manifold methods. Journal of Computer and System Sciences 74 (8), pp. 1289–1308. Note: Learning Theory 2005 External Links: ISSN 0022-0000, Document, Link Cited by: Appendix A, Appendix A, Appendix B, Appendix B, Appendix B, §C.1.1, 2nd item, 3rd item, §G.6, §H.2, §1, §1, §1, §5, §5, §7.
- [15] (1992) Heat kernels and dirac operators. Springer Berlin, Heidelberg. External Links: Document Cited by: §G.3, §G.3, Remark 7.
- [16] (2021) Weisfeiler and lehman go cellular: cw networks. In Advances in Neural Information Processing Systems, Vol. 34, pp. 2625–2640. Cited by: §1.
- [17] (2021) Weisfeiler and Lehman go topological: message passing simplicial networks. In ICLR 2021 Workshop on Geometrical and Topological Representation Learning, Cited by: §1.
- [18] (2022) Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in gnns. arXiv. External Links: Document Cited by: Appendix A, §C.1.1, Appendix D, §1.
- [19] (2023) Spherical fourier neural operators: learning stable dynamics on the sphere. In International conference on machine learning, pp. 2806–2823. Cited by: Appendix A.
- [20] (2020) Matérn gaussian processes on Riemannian manifolds. In Advances in Neural Information Processing Systems, Vol. 33. External Links: Link, 2006.10160 Cited by: Appendix A, §1.
- [21] (2021) Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478. Cited by: §2.
- [22] (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: Appendix A.
- [23] (1992) Hilbert complexes. Journal of Functional Analysis 108, pp. 88–132. Cited by: item 2.
- [24] (2021) Asymptotics for spherical functional autoregressions. The Annals of Statistics 49 (1), pp. 346–369. External Links: Document Cited by: Appendix A.
- [25] (1967) Differential calculus. Princeton University Press, Princeton, NJ. Cited by: §H.2.
- [26] (2024) Learning neural operators on riemannian manifolds. National Science Open 3 (6), pp. 20240001. Cited by: Appendix A.
- [27] (2019) A general theory of equivariant cnns on homogeneous spaces. Advances in neural information processing systems 32. Cited by: §1.
- [28] (2006) Diffusion maps. Applied and Computational Harmonic Analysis 21 (1), pp. 5–30. External Links: Document, Link Cited by: Appendix B, Appendix B, §5.
- [29] (2014) Sheaves, cosheaves and applications. University of Pennsylvania. Cited by: Appendix A, §G.5, §1.
- [30] (2025) The relativity of causal knowledge. In The 41st Conference on Uncertainty in Artificial Intelligence, External Links: Link Cited by: Appendix A.
- [31] (2018) Principal component analysis for functional data on Riemannian manifolds and spheres. The Annals of Statistics 46 (6B), pp. 3309–3338. External Links: Document, Link Cited by: Appendix A.
- [32] (2021) Equivariant contrastive learning. arXiv preprint arXiv:2111.00899. Cited by: Appendix B.
- [33] (2022) Riemannian score-based generative modelling. Advances in neural information processing systems 35, pp. 2406–2422. Cited by: Appendix A.
- [34] (2020) Gauge equivariant mesh cnns: anisotropic convolutions on geometric graphs. arXiv preprint arXiv:2003.05425. Cited by: §1.
- [35] (2025) Learning the structure of connection graphs. arXiv preprint arXiv:2510.11245. Cited by: Appendix A.
- [36] (1992) Riemannian geometry. Mathematics: Theory & Applications, Birkhäuser, Boston. External Links: ISBN 978-0817634902 Cited by: 1st item, §H.2.
- [37] (2023) Sheaf hypergraph networks. Advances in Neural Information Processing Systems 36, pp. 12087–12099. Cited by: Appendix A, §1.
- [38] (2016) Testing the manifold hypothesis. Journal of the American Mathematical Society 29 (4), pp. 983–1049. Cited by: §1.
- [39] (2025) Sheaves reloaded: a directional awakening. arXiv preprint arXiv:2506.02842. Cited by: Appendix A, §1.
- [40] (2018) Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing 67 (4), pp. 1034–1049. Cited by: §1.
- [41] (2020-11) Graphs, convolutions, and neural networks: from graph filters to graph neural networks. IEEE Signal Processing Magazine 37, pp. 128–138. External Links: Document Cited by: §1, §3.
- [42] (2022) Cellular sheaves of lattices and the tarski laplacian. Homology, Homotopy and Applications 24 (1), pp. 325–345. Cited by: Appendix A.
- [43] (2023) Cell attention networks. In 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §1.
- [44] (2025) Cellular sheaves of hilbert spaces. Ph.D. Thesis, University of Pennsylvania. Cited by: §G.5, §G.5, §4.
- [45] (2017) A time-vertex signal processing framework: scalable processing and meaningful representations for time-series on graphs. IEEE Transactions on Signal Processing 66 (3), pp. 817–829. Cited by: §C.1.3.
- [46] (2009) Heat kernel and analysis on manifolds. AMS/IP Studies in Advanced Mathematics, Vol. 47, American Mathematical Society, Providence, RI. Cited by: §C.1.3.
- [47] (2025) Learning network sheaves for ai-native semantic communication. arXiv preprint arXiv:2512.03248. Cited by: Appendix A.
- [48] (1955) A general theory of fibre spaces with structure sheaf. University of Kansas, Department of Mathematics. Cited by: Appendix A.
- [49] (2025) Distributed multi-agent coordination over cellular sheaves. arXiv preprint arXiv:2504.02049. Cited by: Appendix A.
- [50] (2020) Sheaf neural networks. arXiv. External Links: Document, Link Cited by: Appendix A, §C.1.1, §1, Example 6.
- [51] (2019) Learning sheaf laplacians from smooth signals. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5446–5450. External Links: Document Cited by: Appendix A.
- [52] (2019-12-01) Toward a spectral theory of cellular sheaves. Journal of Applied and Computational Topology 3 (4), pp. 315–358. External Links: ISSN 2367-1734, Document, Link Cited by: Appendix A, §G.5.
- [53] (2021) Opinion dynamics on discourse sheaves. SIAM Journal on Applied Mathematics 81 (5), pp. 2033–2060. External Links: Document, https://doi.org/10.1137/20M1341088 Cited by: Appendix A, §G.5.
- [54] (2024) Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations, External Links: Link Cited by: §1.
- [55] (2026) Semantic tube prediction: beating llm data efficiency with jepa. External Links: 2602.22617, Link Cited by: Appendix B.
- [56] (2021) Vector-valued gaussian processes on riemannian manifolds via gauge independent projected kernels. Advances in Neural Information Processing Systems 34, pp. 17160–17169. Cited by: Appendix A, §1.
- [57] (2025-07) Graph distributional signals for regularization in graph neural networks. IEEE Transactions on Signal and Information Processing over Networks 11, pp. 670–682. External Links: Document Cited by: Example 5.
- [58] (2024) Solving forward and inverse pde problems on unknown manifolds via physics-informed neural operators. arXiv preprint arXiv:2407.05477. Cited by: Appendix A.
- [59] (2017) Semi-Supervised Classification with Graph Convolutional Networks. In Proc. of the 5th International Conference on Learning Representations (ICLR), External Links: Link Cited by: §C.1.1, §1, §6.
- [60] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Vol. 25. Cited by: §1.
- [61] (1965) The homotopy type of the unitary group of hilbert space. Topology 3 (1), pp. 19–30. External Links: Document Cited by: §H.4.
- [62] (1995) Differential and riemannian manifolds. 3 edition, Graduate Texts in Mathematics, Vol. 160, Springer, New York, NY. External Links: ISBN 978-0-387-94338-1, Document, ISSN 0072-5285 Cited by: §G.1.4.
- [63] (1998) Gradient-based learning applied to document recognition. Proc. of the IEEE 86 (11), pp. 2278–2324. Cited by: §1.
- [64] (1991) Probability in banach spaces: isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Vol. 23, Springer-Verlag, Berlin. Cited by: §H.1.
- [65] (1946) L’anneau d’homologie d’une représentation. Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences 222, pp. 1366–1368 (French). Cited by: Appendix A, §4.
- [66] (2023) Graph signal processing: history, development, impact, and outlook. IEEE Signal Processing Magazine 40 (4), pp. 49–60. Cited by: §2.
- [67] (2021) Transferability of spectral graph convolutional neural networks. Journal of Machine Learning Research 22 (272), pp. 1–59. Cited by: Appendix A, §1, §5, §5.
- [68] (2025) Learning from frustration: torsor cnns on graphs. In Proceedings of the Workshop on Symmetry and Geometry in Neural Representations at NeurIPS 2025, Note: Workshop paper External Links: Link Cited by: §C.1.2, Appendix D.
- [69] (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In International Conference on Learning Representations (ICLR), External Links: Link Cited by: §F.2.1, §F.2.1, §F.2.3, Table 2, Table 2, §6.
- [70] (2016) Smooth principal component analysis over two-dimensional manifolds with an application to neuroimaging. Cited by: Appendix A.
- [71] (2023) Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), External Links: Document Cited by: §F.2.3, Table 2, Table 2.
- [72] (2018-12) Wasserstein riemannian geometry of gaussian densities. Information Geometry 1 (2), pp. 137–179. External Links: ISSN 2511-249X, Document, Link Cited by: §6, Example 2.
- [73] (2026) Over-squashing in spatiotemporal graph neural networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, External Links: Link Cited by: §C.1.3.
- [74] (2024) The GeometricKernels package: heat and matérn kernels for geometric learning on manifolds, meshes, and graphs. arXiv:2407.08086. External Links: Link Cited by: Appendix A, §1.
- [75] (2009-09) Equivalences of smooth and continuous principal bundles with infinite-dimensional structure group. Advances in Geometry 9 (4), pp. 605–626. External Links: ISSN 1615-715X, Link, Document Cited by: item 3, §H.4.
- [76] (2007) Lectures on the geometry of manifolds. 2 edition, World Scientific, Singapore. External Links: ISBN 9789812708533 Cited by: §G.2.
- [77] (2025) TopoTune: a framework for generalized combinatorial complex neural networks. External Links: Link Cited by: §1.
- [78] (2026) Sheaf neural networks on spd manifolds: second-order geometric representation learning. arXiv preprint arXiv:2604.20308. Cited by: Appendix A, §1.
- [79] (2006) Riemannian geometry. 2 edition, Graduate Texts in Mathematics, Vol. 171, Springer, New York. External Links: ISBN 978-0-387-29403-2, Document Cited by: §G.2.
- [80] (1991) Inequalities for distributions of sums of independent random vectors and their application to estimating a density. Theory of Probability & Its Applications 35 (3), pp. 605–607. External Links: Document Cited by: §H.1, §H.2.
- [81] (2026) Size transferability of graph transformers with convolutional positional encodings. External Links: 2602.15239, Link Cited by: Appendix B.
- [82] (1972) Functional analysis. Methods of Modern Mathematical Physics, Vol. 1, Academic Press, New York. External Links: ISBN 0125850018 9780125850018, Link Cited by: §G.4, §H.6.
- [83] (2022) Diffusion of information on networked lattices by gossip. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 5946–5952. Cited by: Appendix A.
- [84] (2020) Multidimensional persistence module classification via lattice-theoretic convolutions. In NeurIPS Workshop: TDA & Beyond, Cited by: §1.
- [85] (2007) Laplace-beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Symposium on Geometry Processing (SGP), A. Belyaev and M. Garland (Eds.), pp. 225–233. External Links: Document, ISBN 978-3-905673-46-3, ISSN 1727-8384 Cited by: §5.
- [86] (2008) The graph neural network model. IEEE Trans. on neural networks 20 (1), pp. 61–80. Cited by: §1.
- [87] (2018) Parallel transport convolution: a new tool for convolutional neural networks on manifolds. arXiv preprint arXiv:1805.07857. Cited by: §1.
- [88] (1955) Faisceaux algébriques cohérents. Annals of Mathematics, pp. 197–278. Cited by: Appendix A.
- [89] (2022) Intrinsic riemannian functional data analysis for sparse longitudinal observations. The Annals of Statistics 50 (3), pp. 1696–1721. Cited by: Appendix A.
- [90] (1985) A cellular description of the derived category of a stratified space. Brown University. Cited by: Appendix A, §1.
- [91] (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 30 (3), pp. 83–98. Cited by: §1.
- [92] (2012) Vector diffusion maps and the connection laplacian. Communications on Pure and Applied Mathematics 65 (8), pp. 1067–1144. External Links: Document Cited by: Appendix A, Appendix B, §C.1.1, §1.
- [93] (2017) Spectral convergence of the connection laplacian from random samples. Information and Inference: A Journal of the IMA 6 (1), pp. 58–123. External Links: Document, Link Cited by: Appendix A, Appendix B, Appendix B, §1, §5.
- [94] (2025) Change point detection for functional autoregressive processes on the sphere. arXiv preprint arXiv:2512.03255. Cited by: Appendix A.
- [95] (2024) Roformer: enhanced transformer with rotary position embedding. Neurocomputing 568, pp. 127063. Cited by: Appendix B.
- [96] (2022) Large-scale representation learning on graphs via bootstrapping. In International Conference on Learning Representations, External Links: Link Cited by: Appendix B.
- [97] (2008) Spectral geometry processing with manifold harmonics. Computer Graphics Forum 27 (2), pp. 251–260. External Links: Document, ISSN 1467-8659 Cited by: §5.
- [98] (2017) Attention is all you need. In Conference on Neural Information Processing Systems (NeurIPS), pp. 6000–6010. External Links: Link Cited by: Appendix B.
- [99] (2003) Topics in optimal transportation. Graduate Studies in Mathematics, Vol. 58, American Mathematical Society, Providence, RI. External Links: ISBN 978-0821833124 Cited by: Example 5.
- [100] (2008) Consistency of spectral clustering. The Annals of Statistics, pp. 555–586. Cited by: Appendix B.
- [101] (2021) Stability of manifold neural networks to deformations. arXiv preprint arXiv:2106.03725. Cited by: §C.1.1, §5, §5.
- [102] (2021) Stability of neural networks on riemannian manifolds. In 2021 29th European Signal Processing Conference (EUSIPCO), pp. 1845–1849. Cited by: §1, §1.
- [103] (2022) Convolutional neural networks on manifolds: from graphs and back. arXiv:2210.00376. Cited by: Appendix A, §1, §1.
- [104] (2024) Geometric graph filters and neural networks: limit properties and discriminability trade-offs. IEEE Transactions on Signal Processing 72, pp. 2244–2259. External Links: Document Cited by: §5, §5.
- [105] (2026) Equivariant and coordinate independent convolutional networks: a gauge field theory of neural networks. Progress in Data Science, Vol. 1, World Scientific Publishing Company. Note: Monograph on equivariant and gauge-theoretic neural network architectures and their coordinate-independent generalizations External Links: ISBN 9789819806621 Cited by: §C.1.2.
- [106] (2024) Neural manifold operators for learning the evolution of physical dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3356–3366. Cited by: Appendix A.
- [107] (2020) Linking points with labels in 3d: a review of point cloud semantic segmentation. IEEE Geoscience and Remote Sensing Magazine 8 (4), pp. 38–59. Cited by: §1.
- [108] (2024) Spherical analysis of learning nonlinear functionals. arXiv preprint arXiv:2410.01047. Cited by: §5.
- [109] (2020) Graph contrastive learning with augmentations. Advances in neural information processing systems 33, pp. 5812–5823. Cited by: Appendix B.
- [110] (2024) Self-supervised transformation learning for equivariant representations. Advances in Neural Information Processing Systems 37, pp. 83068–83090. Cited by: Appendix B.
- [111] (2025) Towards variational flow matching on general geometries. In ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, External Links: Link Cited by: Appendix A.
- [112] (2026) Graph distribution-valued signals: a wasserstein space perspective. External Links: 2509.25802, Link Cited by: Example 5.
Appendix Contents
- A Extended Related Works
- B Broader Impact, Future Directions and Limitations
- C Existing Convolutional Architectures as Actualizations of HilbNets
  - C.1 Universality of HilbNets
    - C.1.1 CNNs, GNNs, and Sheaf NNs
    - C.1.2 Equivariant CNNs and GNNs
    - C.1.3 Spatio-Temporal GNNs
- D Practical Implementations of Parallel Transport
- E Parallel Transport Parametrizations
  - E.1 From Bundle Transports to Network-Sheaf Restrictions
  - E.2 Transport Hypothesis Classes
  - E.3 Parameter Counts
- F Additional Experimental Details
  - F.1 Synthetic Experiments: The Statistical Bundle over Centered Gaussians
    - F.1.1 The Bundle
    - F.1.2 Levi-Civita Connection and Closed-Form Parallel Transport
    - F.1.3 Cholesky Rescaling
    - F.1.4 Sample Construction
    - F.1.5 Spectral Stability under Sampling Density Increase
    - F.1.6 Hyperparameters
  - F.2 Traffic Forecasting: Experimental Details
    - F.2.1 Datasets
    - F.2.2 Task Formulation
    - F.2.3 Model Variants and Baselines
    - F.2.4 Detailed Results
    - F.2.5 Hyperparameters
- G Mathematical Background
  - G.1 Hilbert Bundles
  - G.2 Connection Laplacian
  - G.3 Heat Flow on a Hilbert Bundle
  - G.4 Borel Functional Calculus
  - G.5 Cellular Sheaves and Sheaf Laplacians
  - G.6 Empirical Laplacians
- H Proofs of Results
  - H.1 Auxiliary Lemmas for Theorem 1
  - H.2 Key Lemmas for Theorem 1
  - H.3 Proof of Theorem 1
  - H.4 Key Lemmas for Theorem 2
  - H.5 Proof of Theorem 2
  - H.6 Key Lemmas for Corollary 1 (Convergence in Architecture)
  - H.7 Proof of Corollary 1 (Convergence in Architecture)
  - H.8 Proof of Corollary 2 (Transferability)
Appendix A Extended Related Works
The connection between (possibly) continuous domains (manifolds and bundles) and discrete structures (graphs and cellular sheaves) first emerged in pioneering investigations on the so-called manifold hypothesis. This hypothesis posits that, although data may live in a high-dimensional ambient space, they are effectively generated by sampling from one or several low-dimensional (Riemannian) manifolds [22]. The manifold hypothesis underpins several modern spectral graph methods, i.e., nonlinear dimensionality-reduction, clustering, and (deep) learning techniques that exploit latent geometric structures. The renowned work [14] by Belkin & Niyogi proved that, given a finite point cloud (the signals) sampled from the underlying manifold, one can build a weighted undirected graph whose Laplacian converges in probability to the Laplace-Beltrami operator of the underlying manifold as the number of samples goes to infinity.
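The convergence statement above can be illustrated numerically. The sketch below is a hedged example rather than the construction analyzed in this paper: it assumes the Gaussian-kernel point-cloud Laplacian with the heat-kernel normalization 1/(n t (4πt)^{k/2}), which for uniformly distributed samples is expected to approximate the positive semidefinite Laplace-Beltrami operator divided by the manifold volume. For reproducibility it uses equispaced points on the unit circle instead of i.i.d. samples, which removes the sampling noise that the probabilistic result controls. The function name `point_cloud_laplacian` is hypothetical.

```python
import numpy as np

def point_cloud_laplacian(X, f, t, k):
    """Evaluate (L f)(x_i) = 1/(n t (4 pi t)^(k/2)) * sum_j w_ij (f_i - f_j).

    X : (n, D) sample points in the ambient space.
    f : (n,) function values at the samples.
    t : kernel bandwidth (heat-kernel time).
    k : intrinsic dimension of the manifold.
    Weights are w_ij = exp(-||x_i - x_j||^2 / (4 t)).
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (4.0 * t))
    scale = n * t * (4.0 * np.pi * t) ** (k / 2.0)
    return (W.sum(axis=1) * f - W @ f) / scale

# Unit circle (k = 1, volume 2*pi) with f = cos(theta), for which -f'' = cos(theta).
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
f = np.cos(theta)
Lf = point_cloud_laplacian(X, f, t=0.01, k=1)
target = f / (2.0 * np.pi)           # (1 / vol) * (-f'')
max_error = np.max(np.abs(Lf - target))
```

Shrinking the bandwidth t while increasing the number of samples reduces the remaining bias, which mirrors the joint limit taken in convergence results of this type.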
The work in [14] and related results, such as [92, 93], have been used (directly or indirectly) to design principled learning systems over manifolds. A substantial fraction of this literature focused on scalar manifold signals, i.e., the case in which one or more scalar values are attached to each point of a manifold. Notable examples are manifold convolutional neural networks [103, 67], kernel methods and Gaussian processes on manifolds [20, 74], as well as a growing literature on generative manifold models [111, 33]. In a complementary direction, operator-learning methods on manifolds extend neural operators beyond Euclidean domains; these methods handle an infinite-dimensional object globally, but they still assign a finite vector to each point of a manifold. Most of the works in this class are instances of neural manifold operators [26, 19, 106, 58], which aim at resolution-independent learning of PDE solution operators. Some works explored vector-valued manifold signals, i.e., multivariate real-valued functions supported on manifolds; in this case, one or more finite vectors are attached to each point of a manifold. Examples are tangent bundle convolutional neural networks [11] and vector-field Gaussian processes on manifolds built via gauge-independent projected kernels [56]. Moreover, especially in the statistics community, functional observations with manifold structure, i.e., manifold-valued functions supported on the real line, have long been studied [70, 31, 89], and recent works have started analyzing autoregressive processes on the sphere [24, 94]. Finally, learning systems acting on discrete bundles, i.e., bundles whose base space is a finite set/discrete manifold, have been recently investigated [5]. Despite their diversity, all the models cited in this paragraph use finite-dimensional fibers and implicitly assume the Levi-Civita connection. As such, they do not allow for the arbitrary connections or the potentially infinite-dimensional signals considered here. One of the main reasons behind this gap is the lack of a rigorous generalization of the convergence result of [14] to these settings, which is our main contribution.
Pioneering works on sheaf theory can be found in [65, 88, 48]. Cellular sheaves are combinatorial instances of sheaves that were introduced in [90] and later rediscovered in [29]. In [90, 29], these sheaves were first defined over regular cell complexes, hence the term “cellular” sheaves. However, as in this work, cellular sheaves are often defined over tamer objects, here graphs. In [51, 35], the authors studied the problem of learning vector cellular sheaves, i.e., cellular sheaves over undirected graphs with finite-dimensional node signals. The works in [53, 52, 42, 83] introduced a novel class of diffusion dynamics on vector cellular sheaves. In [18, 50, 7, 39, 37, 78, 9, 11], neural networks operating on vector cellular sheaves over (undirected, directed, hyper) graphs with finite-dimensional signals are presented, generalizing graph neural networks. We again note, however, that all these works implicitly or explicitly restrict themselves to either the Levi-Civita or flat connections. Additionally, the work in [18] exploited vector cellular sheaf theory to show that the underlying geometry of the graph gives rise to the oversmoothing behavior of GCNs. Also, (vector and general) cellular sheaves recently appeared in causal theory [30], control [49], and telecommunications [47]. Finally, the works in [9, 11] showed that neural networks for tangent bundle signals can be implemented as certain sheaf neural networks operating on vector cellular sheaves built from manifold samples.
Appendix B Broader Impact, Future Directions and Limitations
The potential impact of this work extends well beyond the effectiveness of the HilbNet architecture. Our convergence result unifies and extends the graph- and vector-diffusion convergence theories of [14, 93], thereby enabling novel geometric learning systems for genuinely infinite-dimensional, manifold-supported signals on bundles equipped with arbitrary connections. HilbNets are just a first (principled, transferable) instance of such systems, but our result opens several new avenues.
Clustering and Dimensionality Reduction. Classical Laplacian-based methods for clustering [100] and nonlinear dimensionality reduction [13, 28] all rely, either explicitly or implicitly, on the convergence of the graph Laplacian to the Laplace–Beltrami operator. Our Theorem 1 provides the analogous foundation in the Hilbert bundle setting, immediately suggesting sheaf-spectral generalizations of these techniques. For instance, one may define Hilbert sheaf eigenmaps by computing the leading eigensections of the Hilbert sheaf Laplacian and using them as coordinates, yielding embeddings that are aware not only of the base manifold geometry but also of the fiber-wise coupling encoded by the connection. This is particularly promising for data such as spatiotemporal fields or distributional signals, where standard spectral methods discard the internal structure of each observation. Similarly, sheaf-spectral clustering would partition data by jointly considering geometric proximity on the base manifold and coherence of the infinite-dimensional signals across fibers, a strictly richer criterion than what scalar graph Laplacians can capture. The finite-rank convergence guarantee of Theorem 2 ensures that such methods can be implemented with truncated signals while remaining provably consistent with the underlying continuous geometry.
Structured Self-Supervised Learning. Self-supervised learning (SSL) has largely been built around objectives that encourage invariance or equivariance with respect to augmentations of the base domain [109, 96, 32, 110]. Our framework suggests a more structured family of SSL methods. Because the Hilbert sheaf Laplacian encodes both spatial geometry and fiber-wise transport, one can design contrastive or non-contrastive objectives that encourage learned representations to be sections of an appropriate bundle, i.e., to satisfy local consistency constraints dictated by the restriction maps. Such objectives would yield representations that are not merely invariant to domain augmentations but are geometrically coherent across fibers, a property that is especially desirable when downstream tasks depend on the relational structure between signals at different manifold points, as in multi-sensor forecasting or multi-agent coordination.
Generalizability Theory and Mechanistic Interpretability for Transformers. Another promising direction concerns the connection between our framework and transformer architectures [98]. In a standard transformer, each token is equipped with a positional encoding, either fixed (e.g., sinusoidal) or learned, that situates it in a continuous geometric space [95]. These positional encodings can be viewed as sampled points on a base manifold, with the encoding scheme implicitly defining the metric structure of the domain. The residual-stream representation at each position, or, in the infinite-width or infinite-context limit, the full distribution over possible activations, then lives in a Hilbert space fibered over this base point, so that the collection of representations across positions constitutes a section of a Hilbert bundle over that manifold. The attention mechanism then defines a data-dependent transport between these fibers. In this view, a self-attention layer is an instance of a single-step diffusion under a learned sheaf Laplacian whose base graph is determined by the sampled positional encodings. This may in some sense be viewed as a more precise incarnation of the recently introduced geodesic hypothesis [55], where the autoregressive output of transformers is modeled by a stochastic diffusion PDE in Euclidean space, rather than the proposed manifold-theoretic treatment. Making this correspondence precise would allow one to import the convergence and transferability machinery of Theorems 1–2 into the transformer setting. We note that a recent line of work attempts to adapt Laplacian-based GNN generalization and stability results to transformers [81] but, owing to its fundamental reliance on the convergence result of [14], must work in a somewhat simplified setting.
In conjunction with our convergence result, the generality of the Hilbert sheaf Laplacian is potentially well-suited for establishing extensions of these generalization theorems for a broader class of transformers. On the interpretability side, decomposing attention into a positional-affinity component and a fiber-transport component offers a principled lens through which to study what information each head moves and how it is transformed in transit. One could, for example, measure the holonomy of the learned connection around closed loops of attention to detect whether a head implements a nontrivial geometric transformation. While formalizing these connections requires treatment of the data-dependence of the connection and the interplay between positional and content-based attention, the mathematical infrastructure developed in this work provides a strong starting point.
Limitations. Our theoretical guarantees rest on some assumptions that may not hold exactly in practice, as is usually the case. The convergence results (Theorems 1–2) require the base manifold to be closed (compact without boundary), sections to be or smooth, and samples to be drawn i.i.d. from the uniform distribution. Real-world sensor networks, such as those in our traffic experiments, are neither uniformly sampled nor necessarily supported on compact manifolds, and measured signals are typically noisy rather than smooth. These gaps between theory and practice are absolutely standard in the Laplacian convergence literature: the foundational results of [14], as well as subsequent works on vector diffusion maps [92, 93] and manifold-based learning [28, 13], all assume compact manifolds and uniform or smooth sampling densities, yet are routinely and successfully applied under weaker conditions. Our numerical results confirm that HilbNets likewise remain effective under these standard approximations, consistently outperforming baselines that lack the principled bundle-geometric structure. On the computational side, the network sheaf Laplacian scales quadratically in the product of spatial and fiber dimensions, which may become prohibitive for very large graphs or high-dimensional signal discretizations without further sparsification or approximation strategies. Finally, broader and tailored empirical validation on other infinite-dimensional signal types, such as distributional or functional data on manifolds, remains an important direction for future work.
Appendix C Existing Convolutional Architectures as Actualizations of HilbNets
C.1 Universality of HilbNets
Due to the first-principles approach to the construction of HilbNets, we note that they serve as a sort of universal architecture. That is, several popular variations of convolutional architectures in geometric deep learning, across domains and modalities, can be derived as particular instantiations of HilbNets — even when their construction does not explicitly invoke cellular sheaves. We consider a few concrete examples of this philosophy.
C.1.1 CNNs, GNNs, and Sheaf NNs
Convolutional neural networks (CNNs) and graph neural networks (GNNs) are both often formalized as acting on signals . In particular, by the consistency of the discrete Fourier transform, CNNs operating on a fixed grid can be viewed as operating on a principled discretization of . We may more generally view the uniform grid on which CNNs act as a particular instantiation of a graph, and thus view CNNs as a special case of GNNs [59], with the graph Laplacian as the relevant convolutional operator. GNNs can likewise be viewed as principled discretizations of manifold neural networks [101], precisely by the convergence result of [14]. Sheaf neural networks [50, 18] may then be understood as an enrichment that allows for matrix-valued, rather than scalar, edge weights during convolution. Further, [9] made precise that sheaf neural networks may, in particular, be viewed as acting upon tangent bundle signals , via the convergence result of [92]. Notably, the convergence result of [92] applies strictly to the tangent bundle setting, which explains why existing works on sheaf neural networks either explicitly or implicitly restrict to discretizing the tangent bundle with either flat or Levi-Civita connections [7, 5]. As such, existing sheaf neural network architectures can typically be recovered as HilbNets under the paradigm that .
C.1.2 Equivariant CNNs and GNNs
The equivariant case of CNNs and GNNs arises when we wish for our architecture to respect some underlying symmetry group . More generally, there may not exist a global representation of the symmetry, but rather only a local representation. In physics, this is known as a gauge symmetry, and is formalized by considering our signal as a section of a bundle with connection , where the group action is then encoded as a symmetry of the connection . As such, gauge-equivariant CNNs and GNNs constitute perhaps the most general equivariant architectures in the literature (see Weiler et al. [105] for a thorough introduction in the CNN case), and are formulated precisely as convolutions at the level of sections of a frame bundle (although the necessary datum of a connection is often suppressed in the literature). From this perspective, it is natural that gauge-equivariant CNNs and GNNs can be derived as particular cases of cellular sheaf networks (see Li et al. [68] for a more in-depth exploration of this perspective). Thus, by noting that these architectures can be equivalently reformulated in the language of sheaves, our Theorem 1, as well as the consequent transferability result, can be seen to apply to these architectures. In particular, this may be understood as establishing the theoretical bedrock for the intuitive idea that, as the underlying mesh or graph is increasingly refined, these architectures indeed approach continuous operators on sections of the underlying bundle, while maintaining equivariance across scales.
C.1.3 Spatio-Temporal GNNs
A formal treatment of signal processing on graphs whose node signals are time series is still emerging and remains an active area of research; the resulting filtering techniques serve as the basis for spatiotemporal graph neural networks (STGNNs). In this literature, it is common to consider convolutional operators built via the joint Laplacian , where is the graph-domain Laplacian and is the ‘time-domain’ Laplacian [45]. Due to this decomposition, the resulting ‘time’ and ‘space’ filters commute, allowing for the development of both time-and-space or time-then-space STGNNs [73]. Consider now the continuous setting. Convolutions via intertwine the spatial and temporal domains, and this is precisely encoded by our parallel transport maps. So suppose our bundle is trivial with trivial connection . In this case, our parallel transport maps are simply , and the connection Laplacian collapses to the Laplace-Beltrami operator. On product manifolds , the Laplace-Beltrami operator decomposes as , implying that heat flow is given by (see [46] for a formal derivation). Expressing as an integral operator via the heat kernel, the fact that spatial and temporal filters commute in this case is then simply an application of Fubini’s theorem. As such, we see that the type of filtering commonly considered in STGNNs is recovered precisely as the ‘base case’ of HilbNets, and in particular, our robustness guarantees can also be applied to these STGNN architectures.
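The commutation argument above has a simple finite-dimensional analogue, sketched below in NumPy. The sketch assumes that the joint Laplacian has the Kronecker-sum form (space Laplacian tensored with the identity, plus the identity tensored with the time Laplacian); the path-graph Laplacians and the helper names `path_laplacian` and `heat` are illustrative choices, not taken from the paper.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of a path graph with n nodes (toy example)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def heat(L, t):
    """exp(-t L) for a symmetric matrix L, via its eigendecomposition."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-t * lam)) @ U.T

n_space, n_time, t = 4, 5, 0.3
L_G = path_laplacian(n_space)              # toy 'space' Laplacian
L_T = path_laplacian(n_time)               # toy 'time' Laplacian

space_part = np.kron(L_G, np.eye(n_time))
time_part = np.kron(np.eye(n_space), L_T)
L_J = space_part + time_part               # assumed Kronecker-sum joint Laplacian

# The two summands commute, so the joint heat operator factorizes.
commutator = space_part @ time_part - time_part @ space_part
joint_heat = heat(L_J, t)
factored_heat = np.kron(heat(L_G, t), heat(L_T, t))
print(np.abs(commutator).max(), np.abs(joint_heat - factored_heat).max())
```

In this toy setting the factorization of the heat operator plays the role that Fubini's theorem plays for the heat kernel on a product manifold.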
Appendix D Practical Implementations of Parallel Transport
As we have established, a key strength of the HilbNets architecture is the ability to encode signal-level geometric priors through the principled incorporation of relevant parallel transport operators. This naturally raises the question of how these transport operators should be implemented in practice. We may consider three general classes of use cases.
Task-Inherent Priors. The most theoretically well-grounded case is when knowledge of the geometry of the task itself may be utilized to build our parallel transport operators. For instance, suppose the nodes of our base graph represent cameras and the task is multi-view 3D recognition. Then the relevant transport operators should record the rotation that aligns views as in [68], resulting in an appropriately equivariant sheaf Laplacian operator. More generally, whenever the data modality lacks a ‘global’ reference frame or coordinate system, the appropriate alignments between local reference frames precisely give rise to a connection and the associated parallel transport. For instance, biomedical time-series analysis often utilizes algorithms based upon the large deformation diffeomorphic metric mapping (LDDMM) [12], a core part of which may be understood as extracting the necessary parallel transport from the data using the first-order ODE definition. We may also consider the tangent bundle networks of [11] in this category, as they use explicit vector-field data from which they may then compute the necessary sheaf transition maps via local PCA. As such, we see that HilbNets may be applied to any of these settings, where the relevant parallel transport would be completely determined by the task itself and thus can typically be explicitly pre-computed.
Domain-Inherent Priors. Alternatively, it is often the case that we do not have access to task-specific priors, but rather to general knowledge of the structure of the signal domain. For instance, in many domains, our generic stalks may be equipped with the additional structure of a reproducing kernel Hilbert space (RKHS), i.e. . Analogously to the previous case, we may then view parallel transport as operators that maximize alignment, but now with respect to our kernel. For instance, given a choice of similarity kernel between time series or distributions and our initial section data , we may define our parallel transport operator matrices via
| (15) |
for some suitable class of operators , and force the diagonal blocks of the sheaf Laplacian to be the sum of the scalar edge weights given by the kernel. This is exactly the discretized sheaf Laplacian from (8). In practice, given a choice of similarity kernel on our fibers, we may then either precompute these parallel transport operators using the above optimization objective or learn them end-to-end with the model’s learned filters. In the latter case, (15) is added to the task loss as a regularizer. The special case in which (15) is not employed at all, , in (10), recovers the sheaf diffusion neural network from [18]. As such, we see that the greater generality of the Hilbert sheaf Laplacian lends itself to more flexible and perhaps more broadly applicable design choices than existing sheaf neural networks.
Appendix E Parallel Transport Parametrizations
This appendix details the finite-dimensional transport parametrizations used to instantiate the network sheaf Laplacian in the experiments. In particular, this instantiation of HilbNets may be considered a particular case of the end-to-end learning paradigm introduced in Appendix D for polynomial filters and a few demonstrative classes of and . The discussion should be read as a continuation of the two-stage discretization in Section 5: after sampling the manifold, we obtain a Hilbert cellular sheaf ; after sampling or projecting the fibers, we obtain a finite-dimensional network sheaf with -dimensional stalks.
E.1 From bundle transports to network-sheaf restrictions
Recall that, before signal discretization, the Hilbert cellular sheaf induced by a sample assigns the node stalk
| (16) |
and, for an edge , the edge stalk
| (17) |
where is the midpoint of the chosen geodesic between and . Its restriction maps are weighted parallel transports of the form
| (18) |
After fiber discretization, the network sheaf has finite-dimensional stalks, which we identify with after choosing the first basis elements of the fiber Hilbert space. The corresponding restriction maps are matrices
| (19) |
For the pragmatic parametrizations used in the experiments, it is useful to express the same sheaf Laplacian in node-to-node transport coordinates. Fix an orientation convention for each edge . After identifying the edge stalk with the coordinate system of one endpoint, we write the restrictions as
| (20) |
where
| (21) |
is the finite-dimensional transport carrying the discretized fiber over into the discretized fiber over along edge . When the transport comes from the continuous Hilbert bundle, this matrix is the finite-dimensional representation of the corresponding parallel transport, after the chosen fiber projection and coordinate identification. When the continuous connection is unknown, is instead chosen from a transport hypothesis class.
With the shorthand
| (22) |
the action of the network sheaf Laplacian on a sampled signal
| (23) |
takes the concrete form
| (24) |
Equivalently, is the block matrix with blocks
| (25) |
This is the same sheaf Laplacian defined in Section 4, specialized to the coordinate convention in (20). The scalar weight controls how strongly the two sampled base points interact, while the transport matrix controls how the two discretized fibers are aligned before their signals are compared.
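To make the block structure concrete, the following minimal NumPy sketch assembles a dense sheaf Laplacian from scalar edge weights and node-to-node transports. It assumes the standard connection-Laplacian convention, in which each edge contributes a weighted squared mismatch between one endpoint signal and the transported signal of the other endpoint; the exact normalization of (24) and (25) may differ, and the function name `sheaf_laplacian` and the toy graph are hypothetical.

```python
import numpy as np

def sheaf_laplacian(n, d, edges, weights, transports):
    """Dense (n*d x n*d) sheaf Laplacian in node-to-node transport coordinates.

    Assumed convention: an edge (i, j) with weight w and transport T (fiber j
    into fiber i) contributes w * ||x_i - T x_j||^2 to the form x^T L x.
    """
    L = np.zeros((n * d, n * d))
    I = np.eye(d)
    for (i, j), w, T in zip(edges, weights, transports):
        bi = slice(i * d, (i + 1) * d)
        bj = slice(j * d, (j + 1) * d)
        L[bi, bi] += w * I
        L[bj, bj] += w * (T.T @ T)     # equals w * I when T is orthogonal
        L[bi, bj] -= w * T
        L[bj, bi] -= w * T.T
    return L

rng = np.random.default_rng(0)
n, d = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
weights = [0.5, 1.0, 0.7, 0.2]
# Random orthogonal transports from QR factorizations (illustrative only).
transports = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in edges]

L = sheaf_laplacian(n, d, edges, weights, transports)

# The quadratic form equals the weighted sum of edgewise transport mismatches.
x = rng.standard_normal(n * d)
X = x.reshape(n, d)
energy = sum(w * np.sum((X[i] - T @ X[j]) ** 2)
             for (i, j), w, T in zip(edges, weights, transports))
print(x @ L @ x, energy)
```

With orthogonal transports, the diagonal blocks reduce to the sum of incident edge weights times the identity, and replacing every transport by the identity yields the Kronecker product of the weighted graph Laplacian with the identity on the fiber.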
E.2 Transport hypothesis classes
As mentioned in Appendix D, the true connection is often unknown and parallel transport maps must be learned. In the experimental results of this work, we always learn the parallel transport maps end-to-end using the task loss regularized with (15), and we restrict each edgewise transport to hypothesis classes such that
| (26) |
We use three transport classes in the experiments: frozen identity, free orthogonal transports, and circulant or time-stationary transports.
Frozen identity.
The simplest class is
| (27) |
This recovers the usual assumption that neighboring fibers are canonically identified and that no non-trivial alignment is needed. In this case,
| (28) |
for every edge, and (24) becomes
| (29) |
Thus, frozen identity reduces the sheaf Laplacian to a standard weighted graph Laplacian applied independently to each fiber coordinate, and, therefore, HilbNets to standard GCNs. It is a useful baseline: any improvement over frozen identity quantifies the value of learning or imposing non-trivial transports.
Free orthogonal transports.
The most expressive finite-dimensional class is
| (30) |
or, when the target transports are known to lie in the identity component,
| (31) |
In the experiments, we parameterize free orthogonal transports by products of Householder reflections.
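For a nonzero vector $v$, define the Householder reflection
| $H(v) = I - 2\,\dfrac{v v^{\top}}{\|v\|^{2}}$ | (32) |
Each $H(v)$ is orthogonal and symmetric:
| $H(v)^{\top} H(v) = I, \qquad H(v)^{\top} = H(v)$ | (33) |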
For each oriented edge , we store Householder vectors
| (34) |
and define
| (35) |
Therefore is exactly orthogonal for every parameter value.
By the Cartan–Dieudonné theorem, every matrix in can be represented as a product of at most Householder reflections. In practice, the choice of is dataset-dependent: in our synthetic experiments we use for , a modest over-parametrization that aids optimization; in our traffic experiments we use for , which parameterizes a strict Householder subset of rather than the full orthogonal group, and which we find sufficient for the alignment patterns observed in the data. If one fixes exactly non-degenerate reflections, the determinant parity is fixed:
| (36) |
Thus, an even number of reflections parameterizes the identity component , while an odd number parameterizes the other component. In the synthetic Gaussian experiment, the ground-truth Levi-Civita transports are obtained continuously from the identity along geodesics and, after Cholesky rescaling, lie in the identity component. Hence an even number of reflections is appropriate. If both connected components of are needed, one may add a fixed final reflection or a discrete sign component. For numerical stability, the implementation uses
| (37) |
with a small to avoid division by zero at degenerate . This recovers the exact Householder reflection with in the limit for , and yields a matrix orthogonal up to numerical precision.
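The Householder parametrization can be sketched in a few lines of NumPy. The sketch below is illustrative rather than the implementation used in the experiments: the function names, the fiber dimension, the numbers of reflections, and the value of the regularizer `eps` are all assumptions.

```python
import numpy as np

def householder(v, eps=0.0):
    """Reflection I - 2 v v^T / (||v||^2 + eps); exact when eps == 0."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / (v @ v + eps)

def householder_product(V, eps=0.0):
    """Product H(v_1) ... H(v_k) of the reflections given by the rows of V."""
    T = np.eye(V.shape[1])
    for v in V:
        T = T @ householder(v, eps)
    return T

rng = np.random.default_rng(0)
d = 6                                     # hypothetical fiber dimension
V_even = rng.standard_normal((4, d))      # even number of reflections
V_odd = rng.standard_normal((3, d))       # odd number of reflections

T_even = householder_product(V_even)
T_odd = householder_product(V_odd)
T_reg = householder_product(V_even, eps=1e-8)   # regularized variant

# Orthogonality holds for every parameter value; the determinant sign is
# fixed by the parity of the number of reflections.
print(np.abs(T_even.T @ T_even - np.eye(d)).max())
print(np.linalg.det(T_even), np.linalg.det(T_odd))
```

The regularized variant departs from exact orthogonality only by a term of order `eps` divided by the squared norm of each Householder vector, which is why it is harmless away from degenerate vectors.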
The free class is useful as an expressivity test. If the target transport belongs to in the chosen coordinates, then the Householder class can represent it. This is precisely the role it plays in the synthetic statistical-bundle experiment, where Cholesky rescaling converts the intrinsic Wasserstein-unitary Levi-Civita transports into Euclidean-orthogonal matrices. In real-data experiments, the free class serves as a high-capacity transport baseline.
Circulant or time-stationary transports.
For time-series fibers, the discretized fiber dimension is the number of retained time samples, so we write . A natural prior is that inter-fiber transport should commute with time shifts. Let
| (38) |
be the cyclic shift operator. A time-stationary transport is one satisfying
| (39) |
The commutant of the cyclic shift is the algebra of circulant matrices. Requiring in addition that be orthogonal gives the class of orthogonal circulant transports.
Let denote the unitary discrete Fourier transform matrix. Then every orthogonal circulant transport has the form
| (40) |
For real-valued time-domain signals, the Fourier multipliers must satisfy conjugate symmetry:
| (41) |
We therefore store only the independent positive-frequency phases. Let
| (42) |
The learnable parameter for edge is
| (43) |
We define
| (44) |
If is even, the Nyquist frequency is self-conjugate and is fixed to
| (45) |
for the identity-component parametrization. This yields a real orthogonal circulant matrix through (40). Equivalently, can be constructed in real arithmetic from its first column. For , define
| (46) |
The full circulant matrix is then
| (47) |
This form is convenient for implementation because it avoids explicitly manipulating complex-valued matrices. The circulant class has only
| (48) |
parameters per edge, compared with degrees of freedom for a general orthogonal matrix. Each phase has a direct interpretation as the phase lag at frequency between the two endpoint fibers. Thus, the transport may advance or delay oscillatory components across an edge, but it cannot arbitrarily mix frequencies or reshape the waveform. This is the intended inductive bias for spatiotemporal signals such as traffic or sensor time series, where neighboring sensors may observe delayed or phase-shifted versions of related temporal patterns. In the synthetic experiment, the same class is used more abstractly as a structured subgroup of against which the ground-truth transports can be projected.
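As an illustration of the circulant class, the following NumPy sketch builds a real orthogonal circulant matrix from positive-frequency phases. It assumes that the zero frequency and, for an even number of time samples, the Nyquist frequency are pinned to multiplier one, and that there is one free phase for each of the remaining positive frequencies; the function name and the sign convention of the phases are illustrative.

```python
import numpy as np

def circulant_from_phases(theta, T):
    """Real orthogonal circulant T x T matrix from positive-frequency phases.

    theta has length (T - 1) // 2. The zero frequency and, for even T, the
    Nyquist frequency keep multiplier +1 (assumed convention).
    """
    theta = np.asarray(theta, dtype=float)
    m = (T - 1) // 2
    if theta.shape != (m,):
        raise ValueError("theta must have length (T - 1) // 2")
    lam = np.ones(T, dtype=complex)
    lam[1:m + 1] = np.exp(1j * theta)
    lam[T - m:] = np.conj(lam[1:m + 1][::-1])    # conjugate symmetry
    first_col = np.fft.ifft(lam).real            # imaginary part vanishes
    idx = (np.arange(T)[:, None] - np.arange(T)[None, :]) % T
    return first_col[idx]

T = 8
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=(T - 1) // 2)
P = circulant_from_phases(theta, T)
S = np.roll(np.eye(T), 1, axis=0)                # cyclic shift operator

orth_err = np.abs(P.T @ P - np.eye(T)).max()     # orthogonality
comm_err = np.abs(P @ S - S @ P).max()           # time-stationarity
print(orth_err, comm_err)
```

Working from the first column keeps the construction in real arithmetic after a single inverse FFT, which mirrors the implementation remark above.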
E.3 Parameter counts
For a graph with undirected edges and discretized fiber dimension , the transport parameter counts are summarized in Table 3.
| Transport class | Parameters per edge | Interpretation |
|---|---|---|
| Frozen identity | 0 | no learned alignment |
| Free Householder | | product of reflections in |
| Full circulant | | one phase per positive frequency |
Thus, the free class is maximally expressive but parameter-heavy, while the circulant classes encode a strong time-stationary prior and scale linearly with the number of frequencies or bands.
Appendix F Additional Experimental Details
F.1 Synthetic experiments: the statistical bundle over centered Gaussians
F.1.1 The bundle
The base manifold is , the open cone of symmetric positive-definite matrices. Each parameterizes a centered Gaussian on with density . We equip with the Otto-Wasserstein metric, namely the Riemannian metric induced on by the optimal-transport distance between centered Gaussian measures.
Concretely, the tangent space is naturally identified with symmetric matrices,
| (49) |
and the Otto-Wasserstein inner product between is
| (50) |
where is the unique symmetric solution of the Lyapunov equation.
Above each , the ambient Hilbert fiber is the vector-field space
| (51) |
equipped with the inner product
| (52) |
This fiber is genuinely infinite-dimensional. The finite-rank fiber used in the synthetic experiments is the Otto-velocity image of covariance perturbations:
| (53) |
Thus, the computational fiber is a finite-dimensional statistical subspace of the ambient Hilbert fiber. The map is an isometry between with the Otto-Wasserstein metric and with the inner product. Indeed, for ,
| (54) |
Using and the fact that , , and are symmetric, this equals
| (55) |
Therefore,
| (56) |
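The isometry computation above can be checked numerically. The sketch below assumes the convention in which the Lyapunov solution A for a perturbation U satisfies A Σ + Σ A = U and the associated velocity field is the linear map x ↦ A x, so that the inner product reads tr(A_U Σ A_V); constant factors in (50) may differ under other conventions, and the function names are hypothetical.

```python
import numpy as np

def lyapunov_solve(Sigma, U):
    """Symmetric A solving A @ Sigma + Sigma @ A = U for SPD Sigma."""
    lam, Q = np.linalg.eigh(Sigma)
    U_hat = Q.T @ U @ Q
    A_hat = U_hat / (lam[:, None] + lam[None, :])   # entrywise in the eigenbasis
    return Q @ A_hat @ Q.T

def otto_inner(Sigma, U, V):
    """tr(A_U Sigma A_V), i.e. the L2(N(0, Sigma)) product of x -> A_U x and x -> A_V x."""
    A_U = lyapunov_solve(Sigma, U)
    A_V = lyapunov_solve(Sigma, V)
    return np.trace(A_U @ Sigma @ A_V)

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)                      # a generic SPD base point
U = rng.standard_normal((n, n)); U = U + U.T         # symmetric tangent vectors
V = rng.standard_normal((n, n)); V = V + V.T

A_U = lyapunov_solve(Sigma, U)
residual = np.abs(A_U @ Sigma + Sigma @ A_U - U).max()
print(residual, otto_inner(Sigma, U, V), otto_inner(Sigma, V, U))
```

At the identity covariance the Lyapunov solution is simply half the perturbation, so under this convention the metric reduces to one quarter of the Frobenius inner product, which gives a convenient sanity check.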
Since the fibers are already -dimensional, the fiber discretization of Proposition 1 is exact with . Throughout this appendix, we therefore write and use the network sheaf notation and Laplacian from Section 5.
This construction is useful because it gives a faithful but tractable proxy for the Hilbert-bundle settings that motivate HilbNets. It is faithful in the sense that the ambient fibers are infinite-dimensional vector-field Hilbert spaces, and the Levi-Civita connection of yields non-trivial, metric-compatible parallel transports. It is tractable because the Otto-velocity map selects a finite-rank statistical sub-bundle on which the metric, parallel-transport ODE, and projection of ground-truth transports onto restricted transport classes admit closed-form numerical evaluation. Thus, the experiments probe the finite-rank computational slice used by the implementation.
F.1.2 Levi-Civita connection and closed-form parallel transport
The Levi-Civita connection on is the canonical metric-compatible torsion-free connection associated with the Otto-Wasserstein metric. In covariance coordinates, its Christoffel symbol is
| (57) |
which is symmetric in , as required for a torsion-free connection.
Let denote the Wasserstein geodesic from to . Parallel transport of a tangent vector along is governed by
| (58) |
We solve this ODE numerically by Euler integration with steps. The resulting linear map on the finite-rank fiber is the ground-truth Levi-Civita transport
| (59) |
In the notation of Def. 3, the restriction maps of the induced sheaf use the midpoint transports , which play the role of the discretized parallel transport with . These midpoint transports define the restriction maps used in the spectral-stability experiments. The transport-recovery experiments instead regress against the Cholesky-rescaled node-to-node transport , which adopts the orientation convention of Appendix E.
Because the connection is metric-compatible, is unitary with respect to the Otto-Wasserstein inner products on the source and target fibers:
| (60) |
We verify this numerically to within using Euler steps. This Wasserstein-unitarity is the geometric invariant that justifies comparing the ground-truth transports to orthogonal parametrizations after metric rescaling.
F.1.3 Cholesky rescaling
The free- transport class used in the implementation is Euclidean-orthogonal in vectorized fiber coordinates (cf. the hypothesis classes in Appendix E with ). However, the intrinsic fiber metric is , not the raw Frobenius metric on . Therefore, is not generally orthogonal in raw coordinates.
Let be the Gram matrix of in a fixed basis of . We factor
| (61) |
by Cholesky decomposition and represent a fiber coordinate vector in the rescaled frame as
| (62) |
In this frame, the rescaled Levi-Civita transport is
| (63) |
By Wasserstein-unitarity, it satisfies
| (64) |
Hence,
| (65) |
This is the coordinate system used in the transport-recovery experiments. In these coordinates, the free- Householder class described in Appendix E contains the ground-truth transports. The spectral-stability metrics are computed from the assembled sheaf Laplacian and are invariant to this coordinate choice up to similarity transformation.
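The effect of the rescaling can be illustrated with a small NumPy sketch. It assumes the lower-triangular convention G = L Lᵀ for the Cholesky factor and rescaled coordinates obtained by multiplying with Lᵀ; the paper's factorization convention in (61) to (63) may be the transposed one. The Gram matrices and the metric-preserving transport are synthetic stand-ins, and `rescale_transport` is a hypothetical helper.

```python
import numpy as np

def rescale_transport(P, G_src, G_dst):
    """Express P (source fiber -> target fiber) in Cholesky-rescaled frames.

    With G = L @ L.T and rescaled coordinates L.T @ x, the transport becomes
    L_dst.T @ P @ inv(L_src.T).
    """
    L_src = np.linalg.cholesky(G_src)
    L_dst = np.linalg.cholesky(G_dst)
    return np.linalg.solve(L_src, (L_dst.T @ P).T).T

rng = np.random.default_rng(0)
m = 5

def random_spd(size):
    B = rng.standard_normal((size, size))
    return B @ B.T + size * np.eye(size)

G_src, G_dst = random_spd(m), random_spd(m)          # stand-in Gram matrices
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]     # hidden orthogonal map
L_src = np.linalg.cholesky(G_src)
L_dst = np.linalg.cholesky(G_dst)
# By construction P preserves the fiber metrics: P.T @ G_dst @ P == G_src.
P = np.linalg.solve(L_dst.T, Q @ L_src.T)

T_tilde = rescale_transport(P, G_src, G_dst)
print(np.abs(P.T @ P - np.eye(m)).max())              # generally far from zero
print(np.abs(T_tilde.T @ T_tilde - np.eye(m)).max())  # zero up to round-off
```

The raw transport is unitary only with respect to the fiber metrics, whereas its rescaled version is orthogonal in the Euclidean sense, which is what makes the comparison with Householder products meaningful.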
F.1.4 Sample construction
We draw samples
| (66) |
where is a Haar-random orthogonal matrix, obtained by QR decomposition of a standard-normal matrix, and is diagonal with log-uniform spectrum on . Equivalently, the eigenvalues of lie in on a log-uniform scale.
Although is non-compact, this procedure samples a bounded subset of it. This is appropriate for the finite deployment-regime stability tests reported here, but it is not the normalized-volume sampling assumption used in the asymptotic convergence theorem.
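A minimal NumPy sketch of this sampling procedure is given below. The log-spectrum bounds are placeholder values, and the sign correction applied to the QR factor is an extra step, not stated in the text, that makes the orthogonal factor exactly Haar-distributed; the function name `sample_spd` is hypothetical.

```python
import numpy as np

def sample_spd(n, rng, log_lo=-1.0, log_hi=1.0):
    """Sigma = Q diag(lam) Q^T with Haar-random Q and log-uniform eigenvalues.

    log_lo and log_hi are placeholder natural-log bounds (hypothetical values).
    """
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    Q = Q * np.sign(np.diag(R))        # column sign fix so that Q is Haar
    lam = np.exp(rng.uniform(log_lo, log_hi, size=n))
    return (Q * lam) @ Q.T             # Q @ diag(lam) @ Q.T

rng = np.random.default_rng(0)
Sigmas = [sample_spd(3, rng) for _ in range(10)]
spectra = np.array([np.linalg.eigvalsh(S) for S in Sigmas])
print(spectra.min(), spectra.max())
```

Because the eigenvalues are drawn from a bounded interval, the samples stay in a bounded region of the cone, matching the remark above.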
We build a kNN graph with under the Gaussian Wasserstein distance and assign Gaussian-kernel weights
| (67) |
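The graph construction can be sketched as follows. The distance is the standard closed-form 2-Wasserstein distance between centered Gaussians; the kernel form exp(-d²/σ²), the bandwidth, the number of neighbors, and the symmetrization by elementwise maximum are assumptions, since the exact form of (67) is not reproduced here.

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric square root of a positive semi-definite matrix."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(np.clip(lam, 0.0, None))) @ Q.T

def bures_wasserstein(S1, S2):
    """2-Wasserstein distance between N(0, S1) and N(0, S2)."""
    R = sqrtm_psd(S1)
    M = R @ S2 @ R
    cross = sqrtm_psd((M + M.T) / 2.0)
    d2 = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)
    return np.sqrt(max(d2, 0.0))

def knn_gaussian_graph(Sigmas, k, sigma):
    """Distances and symmetric kNN weights exp(-d^2 / sigma^2) (assumed form)."""
    N = len(Sigmas)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = bures_wasserstein(Sigmas[i], Sigmas[j])
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]            # skip the node itself
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / sigma ** 2)
    return D, np.maximum(W, W.T)                    # keep an edge if either end selects it

rng = np.random.default_rng(0)
Sigmas = []
for _ in range(8):
    B = rng.standard_normal((3, 3))
    Sigmas.append(B @ B.T + np.eye(3))              # toy SPD samples
D, W = knn_gaussian_graph(Sigmas, k=3, sigma=1.0)
print(D.round(2))
```

For commuting covariances the distance reduces to the Euclidean distance between the square roots of the eigenvalues, which provides a quick check of the implementation.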
The induced network sheaf has per-edge restriction maps
| (68) |
as in Def. 3, where is the Wasserstein-geodesic midpoint between and .
F.1.5 Spectral stability under sampling density increase
For each Gaussian dimension , sample size , and random seed, we sample , build using the closed-form Levi-Civita transports, and assemble the sheaf Laplacian
| (69) |
as a sparse block matrix. Since the sheaf Laplacian has size and we require a dense eigendecomposition, we choose so that remains tractable on a single CPU node, yielding for (i.e., ), respectively.
Let
| (70) |
denote the bottom- eigenvalues of , with , and define analogously for the reference operator . We measure both the aggregate and worst-case relative spectral discrepancy of the bottom- eigenvalues of against a high-resolution reference, sweeping and , and averaging over 5 sampling realizations.
Aggregate discrepancy.
Fig. 3 (Left) reports the low-frequency spectral discrepancy
| (71) |
We average across three seeds and report as the shaded band.
Worst-case discrepancy.
Fig. 3 (Right) reports the relative max error over the bottom- eigenvalues:
| (72) |
This complements the aggregate metric by capturing worst-case low-frequency spectral error.
Overall, Fig. 3 shows a monotone decrease for both metrics across all dimensions, demonstrating that the sheaf Laplacian stabilizes as manifold sampling density increases, and faster for higher signal sampling densities.
We train three transport parametrizations from Appendix E, free (Householder), circulant, and frozen identity, to recover the Levi-Civita transports in Cholesky-rescaled coordinates by minimizing the per-edge transport-MSE loss
| (73) |
Here is the rescaled transport produced by the model for edge , is the ground-truth Levi-Civita transport from to in Cholesky-rescaled coordinates, and is a fresh isotropic Gaussian test vector.
By the trace identity
| (74) |
the population loss equals the mean per-edge squared Frobenius distance
| (75) |
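The trace identity can be checked numerically with the short NumPy sketch below. The two matrices are random stand-ins for a model transport and a ground-truth transport, and any normalization constant of the actual loss (for example, a division by the fiber dimension) is omitted here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6
T_model = rng.standard_normal((m, m))   # stand-in for a model transport
T_true = rng.standard_normal((m, m))    # stand-in for a ground-truth transport
D = T_model - T_true

frob_sq = np.sum(D ** 2)

# Exact check: summing ||D e||^2 over an orthonormal basis gives ||D||_F^2.
basis_sum = sum(np.sum((D @ e) ** 2) for e in np.eye(m))

# Monte Carlo check: for z ~ N(0, I), E[z z^T] = I, so E||D z||^2 = tr(D^T D).
Z = rng.standard_normal((200_000, m))
mc_estimate = np.mean(np.sum((Z @ D.T) ** 2, axis=1))

print(frob_sq, basis_sum, mc_estimate)
```

This is why drawing a fresh isotropic test vector at every step yields an unbiased stochastic estimate of the squared Frobenius distance between the two transports.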
Free .
The free transport is parameterized by products of Householder reflections, as detailed in Appendix E. Since in the Cholesky-rescaled frame, the free- class contains the target transport and the minimum population loss is zero. Empirically, we observe at convergence, confirming recovery to numerical precision.
Restricted classes and analytical projections.
For each restricted transport hypothesis class , the population loss minimum is the mean Frobenius distance from each ground-truth transport to its best approximation inside :
| (76) |
This is the Theory column in Table 1.
Frozen identity. For , the projection is trivial: , so
| (77) |
Circulant. For the circulant class (Appendix E with ), let be the DFT matrix and define
| (78) |
The Frobenius-best projection of any onto this class is the diagonal-phase Procrustes solution
| (79) |
The zero-frequency phase is pinned to zero to preserve the constant mode, and the self-conjugate Nyquist frequency, when present, is handled according to the real-valued convention of Appendix E. In the synthetic experiment, the circulant class is used as a structured subgroup of for testing transport-class projection, rather than as a time-series prior.
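The diagonal-phase Procrustes projection can be sketched in NumPy as follows. The sketch assumes the unitary DFT convention in which a circulant matrix is diagonalized as F C Fᴴ, pins both the zero frequency and (for even size) the Nyquist frequency to multiplier one, and sets an undefined phase to one when a diagonal entry vanishes; these conventions and the function name are assumptions.

```python
import numpy as np

def project_to_orthogonal_circulant(P):
    """Frobenius-nearest real orthogonal circulant matrix to a real square P,
    with zero-frequency (and Nyquist) multipliers pinned to +1."""
    T = P.shape[0]
    F = np.fft.fft(np.eye(T), norm="ortho")           # unitary DFT matrix
    d = np.einsum("ki,ij,kj->k", F, P, F.conj())      # diagonal of F P F^H
    mag = np.abs(d)
    safe = np.where(mag > 1e-12, mag, 1.0)
    lam = np.where(mag > 1e-12, d / safe, 1.0)        # keep only the phases
    lam[0] = 1.0                                      # preserve the constant mode
    if T % 2 == 0:
        lam[T // 2] = 1.0                             # assumed Nyquist convention
    first_col = np.fft.ifft(lam).real
    idx = (np.arange(T)[:, None] - np.arange(T)[None, :]) % T
    return first_col[idx]

rng = np.random.default_rng(0)
T = 8
P = np.linalg.qr(rng.standard_normal((T, T)))[0]      # generic orthogonal target
C = project_to_orthogonal_circulant(P)
S = np.roll(np.eye(T), 1, axis=0)                     # cyclic shift operator

print(np.linalg.norm(P - C), np.linalg.norm(P - np.eye(T)))
```

Since the identity belongs to the pinned circulant class, the projection is never farther from the target than the frozen-identity transport, which is consistent with the ordering of the plateaus discussed below.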
Empirical vs. theoretical plateaus.
In Table 1, the Empirical column is the lowest training loss observed over the -epoch budget, while the Theory column is computed on the same edge set used during training. Both columns report means ± standard deviations across three seeds. The empirical and theoretical plateaus track each other to within for the circulant class and within for frozen identity, with closer agreement (within ) at large . This confirms that restricting the transport class to constrains the per-edge restriction maps of in a quantitatively predictable way. We do not claim that such an arbitrary edgewise subgroup constraint automatically lifts to a smooth global connection class on the continuous bundle.
F.1.6 Hyperparameters
See Table 4.
| | Spectral stability | Transport recovery |
|---|---|---|
| Gaussian dimension | () | () |
| sample-size grid | ||
| reference | at | — |
| graph | kNN, , distance | kNN, , distance |
| top- eigenvalues | — | |
| seeds | ||
| epochs / patience | — | / |
| optimizer | — | Adam, lr , batch |
| Householder reflections | — | |
| Euler steps for |
F.2 Traffic forecasting: experimental details
F.2.1 Datasets
We use two standard traffic-speed benchmarks from [69].
METR-LA. loop-detector sensors on the Los Angeles highway network, recording average traffic speed at -minute intervals. The spatial graph follows the DCRNN convention: edge weights with set to the standard deviation of the pairwise road-network distances are thresholded to retain (with ) and symmetrized via .
PEMS-BAY. sensors in the San Francisco Bay Area, with the same temporal resolution and graph construction.
For both datasets, we use observed time steps as input and forecast at horizons . Train/validation/test splits follow the standard chronological partition of [69].
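The graph construction described for METR-LA can be sketched as follows. This is a minimal illustration rather than the preprocessing code used in the experiments: the Gaussian kernel form, the elementwise-maximum symmetrization, the placeholder threshold value, and the toy distance matrix are all assumptions made for the example.

```python
import math

def gaussian_kernel_adjacency(dist, threshold=0.1):
    """Thresholded Gaussian-kernel weights from (possibly asymmetric) road distances.

    dist[i][j] is the road-network distance from sensor i to sensor j,
    with math.inf marking unreachable pairs. `threshold` is a placeholder value.
    """
    n = len(dist)
    finite = [dist[i][j] for i in range(n) for j in range(n)
              if math.isfinite(dist[i][j])]
    mean = sum(finite) / len(finite)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in finite) / len(finite))
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.isfinite(dist[i][j]):
                w = math.exp(-(dist[i][j] / sigma) ** 2)
                W[i][j] = w if w >= threshold else 0.0  # drop weak edges
    # Symmetrize (assumed convention): keep the larger of the two directed weights.
    return [[max(W[i][j], W[j][i]) for j in range(n)] for i in range(n)]

# Toy directed distances between three hypothetical sensors.
inf = math.inf
dist = [[0.0, 1.0, inf],
        [2.0, 0.0, 1.0],
        [inf, 5.0, 0.0]]
W_sym = gaussian_kernel_adjacency(dist)
```

In this toy example the long 2 → 1 link falls below the threshold on its own but survives through symmetrization with the short 1 → 2 link, while sensors 0 and 2 remain disconnected.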
F.2.2 Task formulation
At each forecasting instance, the input signal is
| (80) |
where and , so the network sheaf has -dimensional stalks and Laplacian . The goal is to predict future speed vectors at each horizon . The prediction loss is the mean absolute forecasting error
| (81) |
For HilbNet variants with learned transports, we add the kernel regularizer of Appendix D with weight , giving the full training objective .
F.2.3 Model variants and baselines
All HilbNet variants are discretized HilbNets (Def. 5) with polynomial filters of order and the same per-node linear readout (DCRNN convention) mapping sheaf-filtered features to per-node horizon predictions. The only architectural difference is the admissible class of edgewise transports , as described in Appendix D with . The transport hypothesis classes are the same as in the previous section; they are briefly summarized and contextualized below.
Frozen identity. . Neighboring sensors exchange time windows without temporal alignment. The sheaf Laplacian reduces to a standard weighted graph Laplacian applied independently to each temporal coordinate, recovering a graph convolutional network.
Circulant. Each transport is a real orthogonal circulant matrix parameterized by frequency-wise phases per edge. This encodes a time-stationary prior: the transport can advance or delay oscillatory components across an edge but cannot arbitrarily mix temporal coordinates. This is a natural inductive bias for traffic data, where congestion patterns propagate through the road network with local delays and phase shifts.
Free . Each transport is parameterized by a product of Householder reflections, yielding an orthogonal matrix in a strict Householder-defined subset of (see Appendix D). This is the most expressive transport class but uses more parameters in total than the circulant variant (e.g., vs on METR-LA; per edge the ratio is , since circulant uses only phases per edge at ).
We also compare against two non-transport baselines. A fiber-only MLP processes each sensor’s time window independently, ignoring graph structure. A spatiotemporal graph baseline applies standard graph convolution to the temporally-augmented node features but does not learn sheaf transports. Finally, we include two external baselines from the literature: FC-LSTM [69] and STAEformer [71], reported as in the cited papers.
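To make the frozen-identity case concrete, the sketch below applies a polynomial filter in a weighted graph Laplacian to a signal whose node features are short time windows. With identity transports the Laplacian acts on each temporal coordinate separately, which is what the column-wise products implement. The function names, filter coefficients, and the three-node path graph are hypothetical, and the learned readout is omitted.

```python
def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix (diagonal ignored)."""
    n = len(W)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = -W[i][j]
                L[i][i] += W[i][j]
    return L

def apply_laplacian(L, X):
    """Apply L to an n x T signal; each temporal coordinate (column) is treated independently."""
    n, T = len(X), len(X[0])
    return [[sum(L[i][k] * X[k][t] for k in range(n)) for t in range(T)]
            for i in range(n)]

def polynomial_filter(L, X, coeffs):
    """Return sum_k coeffs[k] * L^k X."""
    Y = [[0.0] * len(X[0]) for _ in X]
    Z = [row[:] for row in X]           # Z holds L^k X
    for h in coeffs:
        Y = [[y + h * z for y, z in zip(ry, rz)] for ry, rz in zip(Y, Z)]
        Z = apply_laplacian(L, Z)
    return Y

# Path graph on three sensors, each carrying a window of two time steps.
W = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
L = graph_laplacian(W)
X_const = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]   # same window at every sensor
X_spike = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # energy at one sensor, one time step
Y_const = polynomial_filter(L, X_const, [1.0, 0.5])
Y_spike = polynomial_filter(L, X_spike, [1.0, 0.5])
```

A window shared by all sensors lies in the kernel of the Laplacian, so only the zeroth-order coefficient acts on it; learned non-identity transports replace the scalar edge weights with matrices that can mix or shift temporal coordinates across an edge.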
F.2.4 Detailed results
Table 2 reports MAE, RMSE, and MAPE at horizons , , and (mean ± std over five seeds for our experiments).
Value of graph structure.
On both datasets, the frozen-identity HilbNet improves over the fiber-only MLP baseline, confirming that sheaf diffusion over the spatial graph is beneficial even without non-trivial transports.
Value of learned transports.
Both the free and circulant variants consistently outperform frozen identity at all horizons on both datasets. This confirms that the sheaf structure helps beyond the usual graph structure.
Free vs. circulant.
On METR-LA, the free- model achieves the best absolute accuracy at all horizons (e.g., MAE vs. for circulant at horizon ), as expected from its larger hypothesis class. On PEMS-BAY, the two variants are nearly tied: free wins MAE and MAPE at horizon by mph (a – effect over five seeds), circulant wins MAE/MAPE at horizon , and RMSE is statistically indistinguishable at horizons and . In both cases, the circulant model uses roughly one tenth of the transport parameters (e.g., vs. on METR-LA), making it the most parameter-efficient HilbNet variant. This supports the central pragmatic-transport message: a structured transport class encoding a physically motivated alignment prior can recover most of the benefit of unconstrained learned transports with substantially fewer degrees of freedom.
Comparison with external baselines.
Our HilbNet variants are lightweight models designed to test the value of Hilbert-sheaf structure, not to compete with large-scale spatiotemporal transformers. The external baselines (FC-LSTM, STAEformer) are included for reference and use substantially more parameters and architectural components. Nevertheless, the circulant and free HilbNets outperform FC-LSTM at all horizons on both datasets while using far fewer parameters.
F.2.5 Hyperparameters
See Table 5. All experiments are run on a single H200 GPU. Hyperparameters are chosen with a sweep. All presented variants are computed using the same codebase, and we made sure they differ only in their transport parametrization.
| | METR-LA | PEMS-BAY |
|---|---|---|
| sensors | ||
| input window | ||
| horizons | ||
| input feature dim | (speed + time-of-day) | (speed + time-of-day) |
| graph | thresh. kernel , | thresh. kernel , |
| sheaf-conv layers | , channel widths | , channel widths |
| polynomial order per layer | (HilbNet), (MLP fiber baseline) | (HilbNet), (MLP fiber baseline) |
| Householder reflections | ||
| readout | per-node linear () | per-node linear () |
| epochs / patience | / | / |
| batch size | ||
| optimizer | Adam, | Adam, |
| learning rate | (HilbNet); (STGNN baseline) | (HilbNet); (STGNN baseline) |
| LR schedule | cosine annealing, | cosine annealing, |
| weight decay / dropout | / | / |
| gradient clipping | (DCRNN convention) | (DCRNN convention) |
| kernel regularizer | ||
| seeds |
Appendix G Mathematical Background
G.1 Hilbert Bundles
In this section, we provide relevant background on the theory of Hilbert bundles. In particular, we define the notions of Banach and Hilbert manifolds, as well as introduce the appropriate notions of connection, parallel transport, and heat flow for bundles in this setting.
G.1.1 Banach and Hilbert manifolds
To study heat kernels for smooth Hilbert bundles, we must examine manifolds modeled on generic Banach spaces. We will assume all Banach spaces and Hilbert spaces are defined over the field of real numbers , unless otherwise stated.
Definition 7.
A second-countable topological space is a topological Banach manifold if there is a Banach space and an atlas such that the following conditions hold:
1. each is an open subset of ;
2. each is a homeomorphism onto an open subset of ;
3. for all , is an open subset of ;
4. the transition map is a homeomorphism.
When the Banach space is specified, we say that is an -manifold, or a Banach manifold modeled on . If each map and transition map is -times Fréchet differentiable, we say that is a -Banach manifold. If these maps are smooth i.e. , we say that is a smooth Banach manifold.
Definition 8.
A topological (resp. / smooth) Banach manifold is a topological (resp. / smooth) Hilbert manifold if it can be modeled on a Banach space which admits the structure of a Hilbert space.
Remark 1.
We make a few observations about this definition.
1. Since every -dimensional real Banach space is isomorphic to , a finite-dimensional Banach manifold is exactly a real manifold in the usual sense.
2. Like ordinary manifolds, we require Banach manifolds to be second countable, and hence to have a countable dense subset. It follows that if is a manifold modeled on a Banach space , then itself must be separable. This condition could be removed, but it will generally make our lives easier.
3. The definition of a Hilbert manifold does not directly require the transition maps to respect inner product structure on the modeling Hilbert space . Hence, it is often better to think of a Hilbert manifold as a special case of a Banach manifold, instead of as a manifold that respects the Hilbert space structure per se.
The usual differential geometric constructions on manifolds extend naturally to Hilbert and Banach manifolds. For example, tangent spaces generalize naturally. Given a -Banach manifold with , for each , one may form the tangent space at a point as equivalence classes of triples of a chart and a vector , under the relation:
Such equivalence classes are easily seen to form a real vector space isomorphic to .
G.1.2 Smooth bundles
Definition 9 (Smooth Banach and Hilbert bundles).
Let be a smooth finite–dimensional manifold and let be a fixed separable Banach space. A smooth Banach bundle with model space consists of a smooth Banach manifold equipped with a smooth surjective submersion
that satisfies the following conditions.
1. Local triviality. For every there exists an open neighborhood and a diffeomorphism
satisfying , where is the canonical projection, and such that, for each , the restriction is a bounded linear isomorphism. We call the pair a trivializing chart.
2. Smooth transition functions. Whenever and are trivializing charts, the transition map
is a bounded isomorphism and depends smoothly on ; that is, is a smooth map, where denotes the Banach–Lie group of bounded invertible operators on with the operator-norm topology.
3. Smooth norm. There is a smooth map such that the trivializing charts can be chosen with the additional property that for each ,
4. Smooth fiberwise operations. The fiberwise addition and scalar-multiplication maps
are smooth Banach-manifold maps.
When the Banach space is a separable Hilbert space, we say is a Hilbert bundle. We denote the inner product on the fiber by .
Remark 2.
We make a few remarks about the definition of a Hilbert bundle above.
1. For convenience, we restrict our attention to Hilbert bundles over closed finite-dimensional manifolds with separable fibers. None of these restrictions are essential for the general theory of Banach and Hilbert bundles. However, these restrictions are necessary for our approach to constructing heat kernels in this setting.
2. The intuitive idea is the following: a smooth Banach bundle is a smooth vector bundle where the fibers are allowed to be infinite-dimensional and come equipped with a complete norm. The smooth norm condition enforces that the Banach space fibers are stitched together in such a way that the fiber-wise norm varies smoothly. In the case of a Hilbert bundle, the smooth norm condition also enforces that the fiber-wise inner product varies smoothly.
3. In light of the previous remarks, the definition presented here is not minimal. We make the choice to include redundant information in our definition for clarity, with the understanding that some conditions are superfluous [75]. We also make the choice to include the smooth norm condition, often called a smooth orthogonal/Hermitian metric, in the definition of the bundle itself.
4. In the case of a finite-dimensional model space , this definition recovers the usual smooth vector bundle, with the additional data of a smooth orthogonal/Hermitian metric.
5. Suppose is a smooth Hilbert bundle modeled on a Hilbert space . While the transition maps must respect the topological structure of the Hilbert space , they need not respect the inner product structure. When each transition map is a unitary isomorphism, we say the bundle is a smooth unitary Hilbert bundle.
Definition 10 (Smooth sections of a Banach bundle).
Let be a smooth Banach bundle (resp. Hilbert bundle) over a finite–dimensional manifold with model Banach space (resp. Hilbert space) . A section of is a map such that . We denote the collection of all smooth sections by . Note that this is a module over the commutative algebra with point-wise addition and multiplication. If the section is only -times continuously differentiable, we write .
Definition 11 (-Sections of a Banach bundle).
Suppose that the manifold is endowed with a measure . We say that is an -section if . We may similarly form a space of -sections, denoted , or simply when the measure is implied by context. When is modeled on a separable Hilbert space , the space of -sections is a real separable Hilbert space with inner product .
Remark 3.
In the Riemannian setting, we have a natural candidate for via the (pseudo) volume form on , or its normalized variant.
G.1.3 Connections
We now introduce connections on smooth Banach and Hilbert bundles. The smooth Banach manifold structure on the bundle provides no way to directly compare vectors in different fibers and with respect to their Banach space structures. Instead, as in the finite-dimensional case, we use a connection to link the fibers through the geometry of the base manifold .
Definition 12.
Let be a smooth Banach bundle over a compact manifold , and let denote the cotangent bundle on . A connection on is any of the following three equivalent structures:
1. A connection is an -linear map:
such that the product rule:
holds for all smooth functions and smooth sections .
2. A connection is a map which is -linear in its vector-field input, and satisfies the Leibniz rule:
where and is the directional derivative of along the vector field .
3. A connection is the data of an -linear map for each and that satisfies the following conditions:
   (a) depends smoothly on ;
   (b) for all , , and ;
   (c) for every smooth and section , let denote the section . For each , the maps and are related by
   for all .
The following proposition provides a standard representation theorem for a smooth connection .
Proposition 2.
Let be a smooth connection on a trivial Hilbert bundle . There is a map
such that for every section , we have:
where is the Fréchet derivative. Equivalently, for each , and , there is a bounded linear operator , varying smoothly in , such that:
Moreover, the assignment is linear for each .
Remark 4.
While this proposition is stated for trivial bundles, it can be applied to any Hilbert bundle through the choice of a trivialization, or through Kuiper’s theorem.
G.1.4 Parallel Transport
Connections allow us to relate the geometries of the fibers over nearby points in . For example, we may define parallel transport.
Given a smooth curve , say that is a section over if .
Definition 13.
Let be a smooth Banach bundle with model space , let be a connection on , and let be a smooth path. A map is a section over if . A section over is parallel if
Proposition 3.
Let be a smooth path in . For every vector , there is a unique parallel section over such that . Moreover, the dependence on is smooth and linear.
Proof.
The existence of the parallel section can be restated as an initial value problem for a linear ordinary differential equation in a Banach space. Standard existence and uniqueness theorems apply. For details, see [62]. ∎
By the existence and uniqueness of parallel sections, we may define corresponding parallel transport maps. Given a path , there is an induced parallel transport operator defined by
It is straightforward to see that is a linear bijection, with inverse given by , where is the path obtained by reversing . By the closed graph theorem, it follows that is a bounded linear isomorphism.
Definition 14.
Let be a Hilbert bundle equipped with a connection . We say the connection is compatible with the Hilbert bundle structure if it satisfies:
for every smooth vector field , and sections .
Proposition 4.
Let be a Hilbert bundle with connection . The following are equivalent:
1. is compatible with the Hilbert bundle structure;
2. every parallel transport map is unitary.
Proof.
First suppose that is compatible with the Hilbert bundle structure, and a smooth path in . Let . By compatibility, we may check that:
It immediately follows that is unitary.
Conversely, suppose that every parallel transport map is unitary. Let be a smooth vector field, and smooth sections of . Let , and let be a smooth path such that . For let be a parallel section over such that . For , let . Finally, let . Since is parallel over , we have that Moreover, . We may use these facts to compute:
Since parallel transport maps are unitary, it follows that the quantity is constant in . Hence
proving that is a compatible connection. ∎
Remark 5.
By Proposition 4, the assumption in the main text that parallel transport operators are unitary may equivalently be understood as a metric-compatibility condition.
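The link between compatibility and unitary transport can be illustrated on a toy example. The sketch below assumes a trivial rank-2 bundle over the interval [0, 1] with connection d/dt + a(t)J, where J is the generator of planar rotations, so that the connection form is skew-adjoint and hence metric-compatible. The sign convention, the function names, the coefficient a(t) = 1 + t, and the use of a classical Runge–Kutta integrator are assumptions made for the illustration.

```python
import math

def parallel_transport(v0, a, n_steps=200):
    """Integrate the parallel-section ODE s'(t) = -a(t) J s(t) on [0, 1] with RK4.

    J = [[0, -1], [1, 0]], so J v = (-v[1], v[0]) and the connection form a(t) J
    is skew-adjoint.
    """
    def rhs(t, v):
        return (a(t) * v[1], -a(t) * v[0])   # equals -a(t) * J v

    h = 1.0 / n_steps
    v = tuple(v0)
    for k in range(n_steps):
        t = k * h
        k1 = rhs(t, v)
        k2 = rhs(t + h / 2, (v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]))
        k3 = rhs(t + h / 2, (v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]))
        k4 = rhs(t + h, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def coeff(t):
    return 1.0 + t          # hypothetical connection coefficient; its integral over [0, 1] is 1.5

u0, w0 = (1.0, 0.0), (0.3, -2.0)
u1 = parallel_transport(u0, coeff)
w1 = parallel_transport(w0, coeff)
inner_before = u0[0] * w0[0] + u0[1] * w0[1]
inner_after = u1[0] * w1[0] + u1[1] * w1[1]   # agrees with inner_before up to integration error
```

Here the exact transport is a rotation by the integrated coefficient, so inner products between transported vectors are preserved; replacing J by a symmetric matrix would break this, in line with the equivalence stated in Proposition 4.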
G.2 Connection Laplacian
Let be a Hilbert bundle on a closed Riemannian manifold, equipped with a compatible connection . Moreover, inherits the structure of a Hilbert bundle, with fiber-wise inner products induced from the metric and the fiber-wise inner products of . Since is a linear differential operator, it has a formal adjoint , defined implicitly by the formula:
where , , and is the pseudo-volume form on . Using this adjoint, we may define the connection Laplacian.
Definition 15.
The connection Laplacian is the linear operator:
Remark 6.
The connection can be extended to a closed, densely-defined unbounded operator . This extended operator has an adjoint as a Hilbert space operator, from which we may define a composite . The formal adjoint and connection Laplacian can be found by restricting the domains of and to linear subspaces of smooth sections. From this perspective, it becomes clear that the connection Laplacian is well defined for all sections, as is contained in the Sobolev space .
The connection Laplacian also admits a characterization in terms of covariant derivatives. Let denote the second covariant derivative with respect to vector fields .
Lemma 1.
(Connection Laplacian in Coordinates) Let be a Hilbert bundle over a closed Riemannian manifold equipped with a compatible Fréchet connection. As operators,
Moreover, with respect to a local orthonormal frame that is synchronous at , we have equality:
Proof.
We adapt the proof of [79] to the Hilbert bundle setting, using the synchronous frame technique of [76]. By a partition of unity argument, it suffices to show that for every pair of smooth sections supported inside the domain of a local orthonormal frame , we have an equality:
Using the formal adjoint of , the integral on the left can be rewritten as
To analyze the right hand side, simply note that , and by compatibility, that:
Rearranging and summing over yields:
The second sum on the right-hand side may be identified as the divergence of the vector field , and hence integrates to zero by Stokes’ theorem. Therefore
as well. Therefore . Under the additional hypothesis that is synchronous at , we have for each . At such a point, the trace reduces to . ∎
G.3 Heat Flow on a Hilbert Bundle
In this section, we fix a closed finite-dimensional Riemannian manifold with canonical volume pseudo-form , a smooth Hilbert bundle with fiber , and a connection . Our goal in this section is three-fold:
1. Demonstrate that the heat equation with respect to the connection Laplacian has a unique solution;
2. Show that the heat flow admits a heat kernel;
3. Provide asymptotic estimates for the heat flow that relate to the geometry of the underlying manifold .
Our approach is to adapt the methods of Berline, Getzler, and Vergne [15] for finite-rank bundles to the Hilbert bundle setting. A key subtlety that arises in this generalization is in the definition of tensor-products of bundles. While the algebraic and topological tensor products of finite-rank Hilbert spaces agree, they need not coincide for general Hilbert spaces. This complicates, for instance, the necessary tensor-hom adjunction. In order to keep track of the appropriate tensor, we adopt the following convention. Let and be smooth Hilbert bundles over a common manifold . One may form the hom-bundle , whose fiber is the Banach space of bounded linear operators from to . This is a Banach bundle when is topologized with the operator norm.
Definition 16.
Let be the connection Laplacian for a compatible connection on a smooth Hilbert bundle over a compact orientable manifold . A heat kernel for is a continuous section of the Banach bundle over that satisfies the following conditions.
1. is with respect to , and is with respect to .
2. satisfies the heat equation , where means applying the Laplacian to .
3. satisfies the boundary condition for every smooth section , where
Lemma 2.
(Heat Kernel for Hilbert Bundles) Let be the connection Laplacian associated to a Hilbert bundle, . Let , be a cutoff function, and
The following hold:
1. The Laplacian admits a unique heat kernel .
2. There exist smooth sections such that for every , the kernel
is asymptotic to , in the sense that
3. The leading term is equal to the parallel transport with respect to the Fréchet connection associated to along the unique length-minimizing geodesic joining and .
Proof.
(Sketch) We note that the parametrix-based approach of Berline, Getzler, and Vergne [15] extends to our setting with minor but judicious modifications. As the original parametrix argument is quite lengthy, we simply make note of the necessary modifications to their argument. First, all integration must be understood as Bochner integration. Second, to avoid ambiguities surrounding the algebraic and geometric tensor bundle, the hom-bundle is used instead of tensor bundles.
Now note that the parametrix argument is fundamentally local in nature. Consider a smooth Hilbert bundle , and note that we require an associated Fréchet connection to be metric-compatible. Then, in a local trivialization , with a separable Hilbert space, the connection has the form with a smooth valued 1-form by Proposition 2. The connection Laplacian is a second-order elliptic operator with scalar principal symbol and lower-order coefficients in the Banach algebra . The parametrix argument then proceeds via solving transport equations along geodesics and then correcting the resulting approximated kernel by a Volterra series. These steps do not rely in any essential way on finite-dimensionality of the fiber, but only on the fact that the coefficient algebra admits the usual smooth calculus and operator-norm estimates. Thus, replacing matrix-valued coefficients by -valued ones, one obtains in the same way a smooth kernel , and the usual energy argument gives uniqueness.
∎
Remark 7.
The details of the necessary parametrix argument may be found in 2.1 – 2.5 of [15]. Note that while they assume a finite-rank hypothesis through the entirety of chapter two, the hypothesis is actually unused until section 2.6 of their work, when the operator is required to be Hilbert-Schmidt.
G.4 Borel Functional Calculus
Given a linear map and a suitably well-behaved function , one may “apply ” to to get a new linear map . In particular, whenever is analytic with a globally defined Maclaurin expansion , one may define , where is interpreted as the -fold composition . When is a bounded linear endo-operator on a Hilbert space, one may similarly define via series expansion. However, when is unbounded, more care must be taken to handle series convergence. This difficulty in the unbounded case is pertinent for the HilbNet architecture, where the convolution filter must be applied to the unbounded connection Laplacian . The Borel functional calculus provides an elegant solution. While traditionally formulated through the spectral theorem and projection-valued measures, for the purpose of the HilbNet architecture, the following version (Theorem VIII.5 of [82]) will be sufficient.
Theorem 3.
(Spectral Theorem - Functional Calculus Form) Let be a self-adjoint operator on a Hilbert space . Then there is a unique map from the bounded Borel measurable functions on into the space of bounded linear operators on , , so that
- is an algebraic *-homomorphism.
- is norm continuous, that is, .
- Let be a sequence of bounded Borel functions with for each and for all and . Then, for any , .
- If pointwise and if the sequence is bounded, then strongly.
In addition:
- If , then .
- If , then .
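In finite dimensions the functional calculus reduces to applying a function to the eigenvalues, which gives a convenient sanity check of the properties listed above. The sketch below is a toy illustration for a symmetric 2×2 matrix only; it says nothing about the unbounded case that the theorem is needed for, and the helper names are hypothetical.

```python
import math

def spectral_apply(A, f):
    """Return f(A) = sum_i f(lam_i) u_i u_i^T for a symmetric 2x2 matrix A."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)    # Jacobi rotation angle diagonalizing A
    c, s = math.cos(theta), math.sin(theta)
    eigenvectors = [(c, s), (-s, c)]            # orthonormal eigenvectors of A
    out = [[0.0, 0.0], [0.0, 0.0]]
    for u in eigenvectors:
        lam = sum(u[i] * A[i][j] * u[j] for i in range(2) for j in range(2))
        w = f(lam)                              # f may be any bounded Borel function
        for i in range(2):
            for j in range(2):
                out[i][j] += w * u[i] * u[j]
    return out

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]                    # eigenvalues 3 and 1

def heat(x):
    return math.exp(-x)                         # smooth low-pass (heat) filter

def step(x):
    return 1.0 if x > 2.0 else 0.0              # discontinuous Borel function

heat_A = spectral_apply(A, heat)
proj_A = spectral_apply(A, step)                # spectral projection onto the eigenvalue-3 line
both_A = spectral_apply(A, lambda x: heat(x) * step(x))
```

The indicator function yields an orthogonal spectral projection, and applying the product of two functions agrees with composing the two operators, which is the homomorphism property in its simplest form.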
G.5 Cellular Sheaves and Sheaf Laplacians
Cellular sheaves on graphs are a data structure that generalizes weighted graphs. We take our exposition of cellular sheaves and their Laplacians primarily from [52]. See [52, 29, 53] for more details.
Definition 17 (Cellular Sheaf on a Graph).
Let be an undirected multi-graph without self-loops and with finitely many vertices and edges. Let denote that node is incident to the edge . A cellular sheaf, or equivalently network sheaf, on consists of the following data.
- A vector space for each , called the stalk over .
- A linear map for each incident pair , called the restriction map of into .
Remark 8.
At the level of category theory, a cellular sheaf is a functor , where the graph is viewed as a posetal category with objects , and a unique morphism from whenever . In this light, we adopt the notation for a cellular sheaf on a graph .
Traditionally, to add geometric content to a cellular sheaf, one passes to weighted cellular sheaves: a cellular sheaf where each stalk is a finite dimensional vector space endowed with an inner product . To accommodate infinite-dimensional Hilbert space stalks, we instead follow the approach of [44].
Definition 18.
(Hilbert Cellular Sheaf on a Graph) A Hilbert cellular sheaf on a finite graph consists of the following data.
- A Hilbert space for each , referred to as the node stalk over .
- A Hilbert space for each , referred to as the edge stalk over .
- For each edge with bounding vertices , a pair of bounded linear restriction maps and .
Remark 9.
A bounded Hilbert sheaf can again be viewed as a functor , where is the graph viewed as an acyclic category, and is the category of real Hilbert spaces and bounded globally-defined linear operators.
Remark 10.
In order to better differentiate between the usual finite-rank cellular sheaf on a graph and the potentially infinite-rank Hilbert cellular sheaves considered above, we use the terminology of network sheaves when we wish to emphasize the finite-rank consideration.
Definition 19.
Let be a bounded Hilbert sheaf on a graph . The spaces of 0-cochains and 1-cochains are defined by:
where denotes the direct sum of Hilbert spaces. For a 0-cochain , we denote the component of in the stalk over the node by , with a similar notation for components of 1-cochains.
Definition 20.
Let be a graph. A signed incidence relation on is a pairing which satisfies the following conditions:
1. if and only if .
2. For each , .
Remark 11.
The data of a signed incidence structure on is equivalent to the choice of a source and target for each edge . In particular, the total set of incidences can be put into two-to-one correspondence with edges, counting the two distinct “boundings” of each .
Definition 21.
Let be a graph equipped with a signed incidence relation. Let be a bounded Hilbert sheaf on . The coboundary operator is the operator with image on each edge stalk:
Remark 12.
The coboundary map depends on the choice of the signed incidence relation. However, given two signed incidence relations , the corresponding coboundary operators differ on the stalk by at most a sign difference . In particular, does not depend on the choice of .
Definition 22.
Let be a bounded Hilbert network sheaf on a graph equipped with a signed incidence relation. Let denote the linear adjoint of the corresponding coboundary operator with respect to the inner product structures on the spaces of cochains as product spaces. The Hilbert sheaf Laplacian is the operator defined by the composition:
Proposition 5.
The Hilbert sheaf Laplacian has the following properties.
1. The Laplacian is a self-adjoint globally-defined bounded linear operator.
2. When is viewed as a Hilbert complex (in the sense of [23]) the kernel recovers the space of harmonic -cochains.
3. The negative Laplacian is the infinitesimal generator of a strongly continuous semigroup on . For a choice of initial cochain , the resulting flow is a solution to the sheaf heat equation . Moreover, the flow has limiting behavior , where denotes the orthogonal projection onto .
Proof.
See [44]. ∎
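The definitions of this subsection can be made concrete in a finite-rank toy case. The sketch below builds the coboundary and Laplacian of a sheaf on a triangle with two-dimensional stalks and rotation-valued restriction maps, then checks numerically that the Laplacian does not depend on edge orientation and that a discretized heat flow converges to the harmonic projection. The graph, the rotation angles, the explicit Euler step, and all function names are assumptions made for the illustration.

```python
import math

def rot(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def coboundary(n_nodes, edges, restriction, d, flip=()):
    """Dense matrix of the coboundary from 0-cochains to 1-cochains.

    For an edge e = (u, v) oriented u -> v: (delta x)_e = F_{v,e} x_v - F_{u,e} x_u.
    Edge indices listed in `flip` use the opposite orientation.
    """
    delta = [[0.0] * (n_nodes * d) for _ in range(len(edges) * d)]
    for e, (u, v) in enumerate(edges):
        sign = -1.0 if e in flip else 1.0
        for node, incidence in ((u, -sign), (v, sign)):
            F = restriction[(node, e)]
            for i in range(d):
                for j in range(d):
                    delta[e * d + i][node * d + j] = incidence * F[i][j]
    return delta

def sheaf_laplacian(delta):
    """L = delta^T delta."""
    rows, cols = len(delta), len(delta[0])
    return [[sum(delta[r][i] * delta[r][j] for r in range(rows)) for j in range(cols)]
            for i in range(cols)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def heat_flow(L, x, step=0.3, n_steps=200):
    """Explicit Euler discretization of dx/dt = -L x (stable for step < 2 / lambda_max)."""
    for _ in range(n_steps):
        x = [xi - step * li for xi, li in zip(x, matvec(L, x))]
    return x

# Hypothetical flat sheaf on a triangle: node v restricts into every incident edge by rot(angles[v]).
angles = [0.0, 0.8, -1.3]
edges = [(0, 1), (1, 2), (0, 2)]
restriction = {(v, e): rot(angles[v]) for e, uv in enumerate(edges) for v in uv}
L = sheaf_laplacian(coboundary(3, edges, restriction, 2))
L_flipped = sheaf_laplacian(coboundary(3, edges, restriction, 2, flip={1}))

def global_section(c):
    """Cochain with x_v = rot(-angles[v]) c, which every edge sees as the same vector c."""
    out = []
    for a in angles:
        R = rot(-a)
        out += [R[0][0] * c[0] + R[0][1] * c[1], R[1][0] * c[0] + R[1][1] * c[1]]
    return out
```

Because the restriction maps here are orthogonal and consistent around the cycle, the kernel of the Laplacian is two-dimensional and spanned by the global sections; running `heat_flow` from any initial cochain returns, up to numerical error, its orthogonal projection onto that kernel.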
We now recall our construction of a Hilbert cellular sheaf from a spatially-discretized Hilbert bundle.
Definition (Hilbert Cellular Sheaf from a Hilbert Bundle).
For a given Hilbert bundle with sampled points , fix a geodesic between and , for all . Further, let denote the midpoint of this geodesic. Consider the graph with an undirected edge between and , for each . The associated Hilbert cellular sheaf on with bandwidth parameter is given by the following assignments:
- The Hilbert space for each , referred to as the node stalk over .
- The Hilbert space for each , referred to as the edge stalk over .
- For each edge with bounding vertices , a pair of bounded linear restriction maps
| (82) |
where , with the geodesic distance on , and denotes the unitary parallel transport map on between and .
Remark 13.
We make a few clarifying remarks on this construction.
- Note that geodesics exist in this setting by the Hopf-Rinow theorem [36] by compactness of , and we should further choose length-minimizing geodesics.
- For simplicity, we use the geodesic distance to weight our restriction maps. However, we could also use the Euclidean heat kernel à la [14], and this would result in a reweighted sheaf Laplacian but would ultimately converge to the same connection Laplacian. In practical implementations, it is thus well-justified to work with the Euclidean heat kernel rather than geodesic distance based weights.
- While this particular construction is chosen to emphasize the relationship to [14] and allow for the necessary analytical arguments, there exist alternative constructions that are geodesic choice-independent that generate the same sheaf Laplacian but emphasize functoriality.
G.6 Empirical Laplacians
Analogously to [14], we introduce two intermediary notions of Laplacian that interpolate between the Hilbert sheaf Laplacian and the Laplacian on a Hilbert bundle. For this section, fix a Hilbert bundle with compatible Fréchet connection over a closed manifold .
Definition 23.
Consider the unique normalized volume pseudo-form on (or the usual volume form if is orientable), denoted . Thus, equips with a probability measure and we may refer to the resulting distribution as the uniform distribution on .
Henceforth, let denote the realization of an iid random sample drawn from the uniform distribution on . We then recall the following construction.
Definition.
(Point-Cloud Extension of Sheaf Laplacian) Let be a Hilbert bundle and consider a sample . Then the corresponding Hilbert sheaf Laplacian may be extended to the point-cloud Laplacian , an operator on via
| (83) |
Remark 14.
We make the following remarks about the point-cloud Laplacian.
-
1.
The point-cloud Laplacian is the extension of the Hilbert sheaf Laplacian to an operator acting on sections of the Hilbert bundle , normalized by a factor of . In particular, when evaluated at a sample point , the point cloud Laplacian is exactly the normalized component of the Hilbert sheaf Laplacian evaluated at the cochain .
-
2.
The point-cloud Laplacian is well defined for any section , regardless of regularity.
Definition 24 (Functional Approximation Laplacian).
For a section , we define the functional approximation to the connection Laplacian
where denotes Bochner integration with respect to the canonical normalized volume pseudo-form on .
Remark 15.
We make the following remarks about the functional approximation Laplacian.
-
1.
The functional approximation Laplacian has no dependence on a sample of points from the underlying manifold. Instead, the functional approximation may be treated as the limiting operator where all points on the manifold have been sampled, and contribute uniformly via parallel transport.
-
2.
The geometric data of the connection impacts the functional approximation Laplacian through the parallel transport maps , which link the fibers of .
-
3.
Viewing the sample as having been drawn iid from the uniform probability distribution on , the functional approximation Laplacian can be identified pointwise on a section as the expected value of the point cloud Laplacian. That is, for any and , we have:
Appendix H Proofs of Results
H.1 Auxiliary Lemmas for Theorem 1
Lemma 3.
For , , and , the following Gaussian identities hold:
| (84) | ||||
| (85) | ||||
| (86) |
where is the Kronecker delta.
Proof.
Notice that is the density function of a multivariate normal random variable . Equations (84) and (85) are simply the values of the coordinate-mean and covariance . Finally, we may write , where is a standard multivariate normal (zero-mean, uncorrelated, unit variance) in dimensions. We may write , which confirms (86). ∎
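As a numerical sanity check on the moment computations in the proof above, the following Python sketch estimates the coordinate mean, the covariance, and the expected squared norm by Monte Carlo. It assumes the density is that of an isotropic normal with covariance σ²I in d dimensions; the function name `gaussian_moments` and the values `dim=3`, `sigma=0.5` are illustrative choices, not part of the original statement.

```python
import random

def gaussian_moments(dim=3, sigma=0.5, n_samples=100_000, seed=0):
    """Monte Carlo estimates of E[x_i], E[x_i x_j] and E[|x|^2] for x ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    second = [[0.0] * dim for _ in range(dim)]
    sq_norm = 0.0
    for _ in range(n_samples):
        # Assumption: x = sigma * z with z a standard normal vector, as in the proof.
        x = [rng.gauss(0.0, sigma) for _ in range(dim)]
        for i in range(dim):
            mean[i] += x[i]
            for j in range(dim):
                second[i][j] += x[i] * x[j]
        sq_norm += sum(v * v for v in x)
    mean = [m / n_samples for m in mean]
    second = [[v / n_samples for v in row] for row in second]
    return mean, second, sq_norm / n_samples

mean, second, sq_norm = gaussian_moments()
```

With these illustrative values, the estimates should be close to 0 for the mean, 0.25 times the Kronecker delta for the second moments, and dσ² = 0.75 for the squared norm, up to Monte Carlo error.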
Remark 16.
If instead of integrating over the entire domain , we integrate over a symmetric ball centered at zero, we recover the following augmented Gaussian identities:
| (87) | ||||
| (88) | ||||
| (89) |
The restriction to the ball leaves all odd-degree symmetries unchanged, and augments the even symmetries by a factor of the form , capturing the exponential decay on probability mass far away from the origin.
Lemma 4.
(Banach Mean Value Theorem) Consider Banach spaces and some open . If is Gateaux differentiable, then the mean value theorem holds in the sense that
whenever the convex hull lies in .
Proof.
The proof is a standard functional analysis argument, but we recall it here for completeness. Let be the one-dimensional subspace spanned by some fixed nonzero . Consider as a continuous linear functional on with norm . By the strong form of the Hahn-Banach theorem, as stated in [3], for instance, we may extend this functional to the whole domain. Then note that , so the result follows by considering . ∎
Remark 17.
We recall that Fréchet differentiability implies Gateaux differentiability, which will be sufficient for our purposes.
Lemma 5.
(Banach Weak Law of Large Numbers) Let denote an independent identically distributed collection of random variables , where is a probability space and is a Banach space. Let denote the partial sum of the first random variables. As , the normalized sequence converges in probability to the mean . That is for all ,
H.2 Key Lemmas for Theorem 1
Lemma 6.
(Taylor Series for Hilbert Signals) Let be a Hilbert bundle equipped with a Fréchet connection. For a given signal , the space of -times continuously Fréchet-differentiable sections, and , consider any in a geodesic ball of and fix a length-minimizing curve from to . Let denote the parallel transport of from back to along . As , we have that
Proof.
We first establish for a section of along , that:
We compute directly from the definition:
| (90) | ||||
| (91) | ||||
| (92) |
where in the second line we used the composition law for parallel transport, which follows from uniqueness of solutions to the parallel transport ODE, together with the fact that is a bounded linear operator and hence may be factored outside the limit.
Recall that is a curve in the fixed Hilbert space . Iteratively applying the previous derivative computation to , where denotes -fold composition, yields:
Evaluating at , where :
Finally, applying Taylor’s theorem for Banach spaces [25] yields the desired asymptotic statement. ∎
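The expansion can be checked numerically on a toy example. The sketch below assumes a trivial complex line bundle over the circle with the hypothetical connection d/dθ + iA for a constant A, and the test section s(θ) = exp(iKθ); none of these choices come from the text. In this setting the m-th covariant derivative of s is (i(K + A))^m s, and the second-order covariant Taylor remainder of the pulled-back section should shrink like the cube of the displacement.

```python
import cmath
import math

A = 0.5  # hypothetical constant connection coefficient: nabla = d/dtheta + iA
K = 1    # frequency of the illustrative test section s(theta) = exp(iK theta)

def section(theta):
    return cmath.exp(1j * K * theta)

def transport(theta_from, theta_to, value):
    """Parallel transport along the arc from theta_from to theta_to.

    A parallel section u solves u' + iA u = 0, so transport multiplies by a phase.
    """
    return cmath.exp(-1j * A * (theta_to - theta_from)) * value

def covariant_taylor(x, h, order):
    """Sum over m <= order of h^m / m! * (nabla^m s)(x), using nabla^m s = (i(K + A))^m s."""
    coeff = sum((1j * (K + A) * h) ** m / math.factorial(m) for m in range(order + 1))
    return coeff * section(x)

def remainder(x, h, order=2):
    """Norm of the difference between the pulled-back section value and its Taylor polynomial."""
    pulled_back = transport(x + h, x, section(x + h))
    return abs(pulled_back - covariant_taylor(x, h, order))

# Halving the displacement should divide a third-order remainder by roughly 8.
ratio = remainder(0.3, 0.1) / remainder(0.3, 0.05)
```

The ratio of remainders under halving of the displacement is the quantity to inspect; a value near 8 is consistent with a cubic remainder term.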
Lemma 7.
Let and denote the point-cloud and functional Laplacian operators with bandwidth . We have the concentration inequality:
for some which depends only on the choice of section . Consequently, we have the following limit in probability as :
Proof.
The point cloud Laplacian may be viewed as the sample average of iid Hilbert-space valued random variables:
Moreover, the functional approximation may be viewed as the expectation with respect to the uniform probability measure on . The bundle has separable fibers, so the results of [80] apply. We may recover a Hoeffding inequality:
where is the maximum norm of the section over the compact manifold . Setting and identifying yields the desired concentration inequality. Convergence in probability follows immediately. ∎
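The sample-average viewpoint used in this proof can be illustrated on a toy example. The sketch below assumes a trivial complex line bundle over the circle with the hypothetical connection d/dθ + iA, the test section s(θ) = exp(iKθ), uniform sampling of the circle, and the normalization 1/(t(4πt)^{1/2}) of [14]; the constants in the definition of the point-cloud Laplacian used in this paper may differ. It compares a Monte Carlo sample average with a midpoint-rule quadrature of its expectation at a fixed bandwidth.

```python
import cmath
import math
import random

A, K = 0.5, 1  # hypothetical connection coefficient and section frequency

def section(theta):
    return cmath.exp(1j * K * theta)

def summand(x, h, t):
    """Kernel-weighted difference between s(x) and s(x + h) transported back to x."""
    weight = math.exp(-h * h / (4.0 * t)) / (t * math.sqrt(4.0 * math.pi * t))
    transported = cmath.exp(1j * A * h) * section(x + h)
    return weight * (section(x) - transported)

def point_cloud_laplacian(x, t, n, seed=0):
    """Sample average over n iid uniform points, parameterized by the signed arc length h."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        total += summand(x, rng.uniform(-math.pi, math.pi), t)
    return total / n

def functional_laplacian(x, t, grid=20_000):
    """Midpoint-rule quadrature of the expectation under the uniform probability measure."""
    step = 2.0 * math.pi / grid
    total = 0j
    for m in range(grid):
        total += summand(x, -math.pi + (m + 0.5) * step, t)
    return total / grid

gap = abs(point_cloud_laplacian(0.3, 0.1, 50_000) - functional_laplacian(0.3, 0.1))
```

At a fixed bandwidth the gap is governed only by the number of samples, which is the regime that the concentration inequality addresses.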
Lemma 8.
Let be a Hilbert bundle on a closed Riemannian manifold equipped with a Fréchet connection. Let be open, and . Fix a section . For each , let denote the parallel transport of along the designated geodesic connecting to . For any real , the following asymptotic bound holds as :
Proof.
We note this is a modified version of Lemma 4.1 of [14]. Let , , and , where is the canonical measure with respect to the volume pseudo-form. Since is compact and is closed, the infimum distance . Recalling that is unitary, and hence for all , we may bound:
∎
Lemma 9.
Let be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian , and functional approximation with bandwidth . Fix a section . For any as the bandwidth , we have pointwise convergence:
Proof.
Let , and consider the scaled functional approximation which acts on a section at point by:
Let denote a sufficiently small ball containing . By Lemma 8,
Parameterize via geodesic coordinates such that . Let . Let and denote the section and ball in coordinates. In these coordinates, we may write:
In geodesic coordinates, since the closed manifold has bounded Ricci curvature, the metric tensor has an asymptotic expansion given by (as in e.g. [36])
This approximation and the identification in coordinates allows us to express:
Let . This expression splits as , where:
We first analyze the limiting behavior of as . Within our geodesic coordinates centered at , we may further work with a local synchronous frame of along these coordinates such that it is parallel along all radial geodesics. Consequently, within this frame, ordinary derivatives of coincide with covariant derivatives of at the basepoint:
where is half the bundle curvature of arising from the connection . Hence by Lemma 6, the Taylor expansion of at is
with .
Using the augmented Gaussian identities (87), (88), and (89), we may compute:
where is the Kronecker delta. Since is antisymmetric, , hence
Inserting into the definition of yields
Thus we recover the connection Laplacian by Lemma 1.
To analyze the quantity , we first observe that inside of , the parallel transport map varies smoothly in , with bounded derivatives. Since the section is , the mean value theorem (Lemma 4) ensures there is a such that for all . Utilizing this Lipschitz bound and the augmented Gaussian identities, we may compute:
Hence as . Combining with the analysis of yields:
∎
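The pointwise limit can be observed numerically in a toy setting. The sketch below again assumes a trivial complex line bundle over the circle with the hypothetical connection d/dθ + iA, the test section s(θ) = exp(iKθ), and the normalization 1/(t(4πt)^{1/2}) of [14]. Under these assumptions, minus the second covariant derivative of s equals (K + A)² s, and the normalized volume contributes a factor 1/(2π); signs and constants may differ from the conventions of this paper.

```python
import cmath
import math

A, K = 0.5, 1  # hypothetical connection coefficient and section frequency

def section(theta):
    return cmath.exp(1j * K * theta)

def functional_laplacian(x, t, grid=20_000):
    """Midpoint-rule quadrature of the heat-kernel-weighted, transported difference."""
    step = 2.0 * math.pi / grid
    norm = 1.0 / (t * math.sqrt(4.0 * math.pi * t))
    total = 0j
    for m in range(grid):
        h = -math.pi + (m + 0.5) * step
        transported = cmath.exp(1j * A * h) * section(x + h)  # s(x + h) moved back to x
        total += math.exp(-h * h / (4.0 * t)) * (section(x) - transported)
    return norm * total / grid  # dividing by grid applies the uniform probability measure

def connection_laplacian(x):
    """Toy limit: (K + A)^2 s(x), scaled by 1 / (2 pi) from the normalized volume."""
    return (K + A) ** 2 * section(x) / (2.0 * math.pi)

bandwidths = (0.1, 0.01, 0.001)
errors = [abs(functional_laplacian(0.3, t) - connection_laplacian(0.3)) for t in bandwidths]
```

The entries of `errors` should decrease as the bandwidth shrinks, roughly linearly in the bandwidth for this smooth section.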
Lemma 10 (Variance asymptotics).
Let be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian . Fix a section . Consider a random sample of -points with respect to the normalized volume pseudo-form, . For each bandwidth , let , where . Define an error term:
There is a constant , which depends on the section , such that the following asymptotic estimate holds as :
Proof.
Let be the random section given by
Note that is stochastic only through the sample point , and that are independent when . It is straightforward to verify by Fubini's theorem that
Set . By Fubini, we may exchange the order of integration and find
We may compute:
By standard Gaussian identities, the remaining Gaussian integral is as , with constant independent of . Recalling the definition of in terms of , we recover that
Since the constants are all independent of , we may integrate over and find that
as well. The result immediately follows. ∎
Lemma 11 (Bias asymptotics).
Let be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian . Fix a section . Consider a random sample of -points with respect to the normalized volume pseudo-form, . For each bandwidth , let , where . Define an error term:
There is a constant , which depends on the section , such that the following asymptotic estimate holds as :
Proof.
This follows from essentially repeating the analysis in the proof of Lemma 9 using the fourth order Taylor expansion in terms of the covariant derivative. In particular, after accounting for the fourth-order Taylor remainder, we find that:
The result follows. ∎
H.3 Proof of Theorem 1
H.3.1 Proof of Theorem 1A
Theorem.
Let be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian . Fix a section . Consider a random sample of -points with respect to the normalized volume form, . Let be the associated Hilbert cellular sheaf with bandwidth and associated Point cloud Laplacian . Then, we have that in probability, for any ,
with bandwidth , .
Proof.
Let , and consider the scaled functional approximation which acts on a section at point by:
We may bound:
as we have . Hence the second quantity on the right-hand side goes to zero by Lemma 9. On the other hand, the first quantity on the right-hand side can be bounded by the concentration inequality of Lemma 7, yielding:
where is a constant depending on the section . Since as , the concentration upper bound goes to zero as as well. This completes the proof of the main theorem. ∎
H.3.2 Proof of Theorem 1B
Theorem.
Let be a Hilbert bundle equipped with a compatible Fréchet connection, with associated Laplacian . Fix a section . Consider a random sample of -points with respect to the normalized volume form, . Let be the associated Hilbert cellular sheaf with bandwidth and associated Point cloud Laplacian . Then, we have the following convergence in expectation:
with bandwidth , .
H.4 Key Lemmas for Theorem 2
Definition 25.
Let be a smooth Hilbert bundle over a manifold . A finite rank approximating sequence for is a sequence of smooth sub-bundles with the following properties:
-
1.
For each , has finite rank;
-
2.
For each , the bundle is a sub-bundle of ;
-
3.
For each , we have that .
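A minimal illustration of a finite rank approximation, assuming a single fiber modeled as a square-summable sequence space (truncated to 100,000 coordinates as a computational proxy) and sub-spaces spanned by the first k basis vectors: the sketch below tracks how the squared norm of the part of a vector lost by the rank-k projection decays with k. The coefficient sequence 1/j and the helper name `tail_norm_sq` are hypothetical choices for illustration.

```python
def tail_norm_sq(coeffs, k):
    """Squared norm of s - P_k s, where P_k keeps the first k basis coefficients."""
    return sum(c * c for c in coeffs[k:])

# Illustrative square-summable coefficient sequence c_j = 1 / j.
coeffs = [1.0 / j for j in range(1, 100_001)]

ranks = (1, 10, 100, 1000)
tails = [tail_norm_sq(coeffs, k) for k in ranks]
```

For this sequence the tail after rank k is bounded by 1/k, so the projections approximate the vector arbitrarily well as the rank grows.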
Lemma 12.
Let be a smooth Hilbert bundle with infinite-dimensional fibers over a compact manifold . A finite rank approximating sequence exists.
Proof.
By Kuiper’s Theorem [61], the unitary group of the typical fiber is contractible, implying that there exists an isomorphism of with bundle at the level of purely topological bundles. Now note that every Hilbert bundle admits a finite rank approximating sequence, by considering a Hilbert space basis for , and defining . The sequence can then be seen to be a finite rank approximating sequence. Furthermore, because the base space is a finite-dimensional manifold, this topological trivialization can be upgraded to a smooth global trivialization [75]. Let be such a smooth isomorphism. Thus, the finite rank approximating sequence pulls back to a finite rank approximating sequence on by , as desired. ∎
Lemma 13.
Let be a smooth infinite-dimensional Hilbert bundle over a compact manifold equipped with a compatible connection . Let be a finite rank approximating sequence for . The data of induces a compatible connection on , where denotes the fiber-wise orthogonal projection onto .
Proof.
This follows immediately from the fact that is compatible and orthogonal projections are self-adjoint. ∎
Remark 18.
Let be an infinite-dimensional Hilbert bundle equipped with a compatible connection. By the previous lemmas, we may always find a finite rank approximating sequence , each with compatible connection. These compatible connections induce connection Laplacians on each sub-bundle. Moreover, Theorem 1B applies to each Laplacian .
We restate and prove Proposition 1.
Proposition.
Let be a Hilbert bundle, with strictly infinite-dimensional generic Hilbert-space fiber . Fix an orthogonal basis of and let . Then there exists a smooth map of bundles
| (93) |
where is a -dimensional vector bundle with generic fiber and at each , recovers the usual orthogonal projection map.
Proof.
Definition 26.
Let be a smooth Hilbert bundle over a closed manifold of dimension equipped with a compatible connection . Fix a section . Let be a finite rank approximating sequence for with induced connections , connection Laplacians , and bandwidth point cloud Laplacians associated to an iid sampling . Let denote the fiber-wise orthogonal projection map onto . The discretization error and the continuous geometry error are the quantities:
Remark 19.
The discretization error captures the error introduced by approximating the connection Laplacian by a point-cloud Laplacian on points. The continuous geometry error is a deterministic quantity that captures the error introduced by moving to the sub-bundle .
Lemma 14.
The continuous geometry error converges to zero as .
Proof.
By pushing through the global trivialization of Kuiper's theorem, we may assume without loss of generality that as topological bundles. First, note that the orthogonal projection commutes with the Fréchet derivative on sections, in the sense that:
By Proposition 2, for any vector field , we may write:
where is a globally bounded linear operator acting on the fiber . We may now compute:
Since in the strong operator topology and , we have that as . Similarly since is bounded, we find that for each , we have . Therefore . By a similar argument, we may conclude that for a pair of vector fields , that Hence using the coordinate form of the connection Laplacian of Lemma 1, we recover that:
for all .
We now upgrade this statement to convergence by the dominated convergence theorem. If there is a global bound such that for all , we may apply the dominated convergence theorem and conclude that . To find such a , we first observe that is a continuous section, and hence is bounded on the compact manifold . Next, for a pair of vector fields , we may use the fact that and the representation to find a -independent bound on . The local coordinate form of the connection Laplacian again allows us to conclude that there is a bound for all and . The triangle inequality finally allows us to bound:
This completes the proof. ∎
H.5 Proof of Theorem 2
Theorem.
Let be an infinite-dimensional Hilbert bundle over a closed manifold of dimension equipped with a compatible connection . Fix a section . Let be a finite rank approximating sequence for with induced connections , connection Laplacians , and bandwidth point cloud Laplacians associated to an iid sampling . Let denote the fiber-wise orthogonal projection map onto . There exists a deterministic increasing sequence , depending on the section , such that
with bandwidth , .
Proof.
Let . We may easily bound the expected global error in terms of the continuous geometry error and the expected discretization error:
By applying Theorem 1 to the bundle , the expected discretization error as . On the other hand, Lemma 14 ensures that as .
To construct the diagonal sequence, first choose an increasing sequence such that for all . Next, choose an increasing sequence such that for each . For each , set . Observe that since . On the other hand, as well. Therefore setting yields a diagonal sequence with the property that
This diagonal subsequence is deterministic as both the continuous geometry error and the expected discretization error are deterministic. ∎
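The diagonal construction in this proof can be sketched in code. The example below uses hypothetical stand-in error functions, `geometry_error(k) = 1/k` and `discretization_error(n, k) = k / sqrt(n)`, which are chosen only because they are monotone and easy to check by hand; the true errors in the theorem are not available in closed form. The thresholds at level m are taken to be 1/m, which is an assumption about the tolerance used in the argument.

```python
import math

def geometry_error(k):
    """Toy stand-in for the continuous geometry error (decreasing in k)."""
    return 1.0 / k

def discretization_error(n, k):
    """Toy stand-in for the expected discretization error (decreasing in n for fixed k)."""
    return k / math.sqrt(n)

def first_index_below(err, tol, start=1):
    """Smallest index i >= start with err(i) < tol; suffices because err is monotone."""
    i = start
    while err(i) >= tol:
        i += 1
    return i

def diagonal_schedule(levels):
    """Increasing ranks k_m and sample sizes n_m with both errors below 1/m at level m."""
    ks, ns = [], []
    k, n = 1, 0
    for m in range(1, levels + 1):
        tol = 1.0 / m
        k = first_index_below(geometry_error, tol, start=k)
        n = first_index_below(lambda j: discretization_error(j, k), tol, start=n + 1)
        ks.append(k)
        ns.append(n)
    return ks, ns

def rank_for_sample_size(n, ks, ns):
    """k(n) = k_m for n_m <= n < n_{m+1}; None before the first threshold."""
    chosen = None
    for k_m, n_m in zip(ks, ns):
        if n >= n_m:
            chosen = k_m
    return chosen

ks, ns = diagonal_schedule(3)
```

For these toy errors the schedule is ks = [2, 3, 4] and ns = [5, 37, 145], so a sample of 40 points would be paired with rank 3.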
H.6 Key Lemmas for Corollary 2.1
Lemma 15.
Let be a smooth infinite-dimensional Hilbert bundle over a closed manifold of dimension equipped with a compatible connection . Fix a section . Let be a finite rank approximating sequence for with induced connections , connection Laplacians , and bandwidth point cloud Laplacians associated to an iid sampling . Let denote the fiber-wise orthogonal projection map onto . Let be the diagonal sequence induced by Theorem 2. Let with bandwidth , . Similarly, let . Finally, let be a bounded continuous function.
Under the Borel functional calculus (Appendix G.4), we have MSE convergence:
Proof.
Observe that and each are self-adjoint unbounded operators on . We begin by showing there is a common core and a subsequence for which in .
Take to be any countable dense subset of . Since has separable fibers, such a countable dense subset necessarily exists. Moreover, and are all defined on . This shall be our common core.
Treat as an -valued random variable. For each , Theorem 2 ensures that in mean square error. It immediately follows that in probability with respect to the measure from which the sampling is drawn.
Enumerate . Since convergence in probability implies almost sure convergence on a subsequence, we may inductively construct a doubly-indexed sequence of indices such that the following properties hold:
-
1.
For each , the sequence is a subsequence of ;
-
2.
Along the sequence , we have almost sure convergence as .
Take the diagonal sequence . Along , we have that as for all . Now applying Theorems VIII.25(a) and VIII.20(b) of [82], we may conclude that:
almost surely for each .
Notice that the previous argument can not only be applied to the sequence , but also along any subsequence of . Since any subsequence of therefore has an almost surely convergent sub-subsequence, we may conclude that for the original sequence , for each section (not necessarily in ), we have convergence in probability:
in probability with respect to the sampling as .
Finally, by the spectral calculus, since is bounded by some , we may bound . Similarly, . It follows that for each ,
for all . Hence the dominated convergence theorem admits an MSE upgrade to the desired conclusion:
∎
H.7 Proof of Corollary 2.1
Corollary.
Under the hypotheses of Theorem 2, let be the constructed deterministic diagonal sequence, and . Let be a fiber-wise nonlinearity that is -Lipschitz in the corresponding fiber norms. For bounded continuous filters , consider the continuous and sampled architectures:
with initializations and , where and .
Then, the output of the discrete architecture converges in mean square to the output of the continuous architecture:
Proof.
We proceed by induction on the layers. Let and denote the signal error and spectral filter error at layer respectively:
Let . By the Lipschitz continuity of the nonlinearity , the triangle inequality yields the pathwise recursive bound:
Iterating this inequality over layers expands to:
By Theorem 2, the initialized signal error in mean square. Furthermore, by Lemma 15, in mean square as well for each .
Taking the expectation of the squared recursive bound and applying the Cauchy-Schwarz inequality to the finite sum isolates the individual mean square limits. Hence, as , we have that the total error satisfies:
In particular, we have that as sampling density goes to infinity, in MSE ,
∎
H.8 Proof of Corollary 2.2
Corollary.
Let be a smooth Hilbert bundle over a closed manifold of dimension equipped with a compatible connection . Fix a section . Let be a finite rank approximating sequence for with induced connections , and connection Laplacians . Let denote the fiber-wise orthogonal projection map onto . Let and be a pair of independent iid samplings of points on the manifold. Denote the bandwidth point cloud Laplacians associated to these distinct samplings by and respectively. Let be a diagonal sequence such that the conclusion of Theorem 2 holds for both samplings. Let with bandwidth , , and similarly for .
Let be a network depth, and be a fiber-wise nonlinearity that is -Lipschitz in the corresponding fiber norms. For bounded continuous filters , consider the continuous and sampled architectures:
with initializations and . Under these hypotheses and notation, one may obtain an MSE convergence result:
Further, one may derive a quantitative bound for the disagreement in terms of sample-independent quantities.
Proof.
One may bound:
Applying Corollary 2.1 to each sampling separately yields the MSE convergence. In particular, we have that
To derive a quantitative bound, introduce the per-sampling signal error and spectral filter error at level by:
Apply the triangle inequality and the layer-wise recursive bounds of the proof of Corollary 2.1 to establish:
The level-zero signal error is sampling independent, and bounded above by . On the other hand, we may further apply the Borel functional calculus to bound each spectral filter error by:
We hence conclude a sample-independent bound:
∎