UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration
Abstract
Multivariate time series forecasting is fundamental to numerous domains such as energy, finance, and environmental monitoring, where complex temporal dependencies and cross-variable interactions pose enduring challenges. Existing Transformer-based methods capture temporal correlations through attention mechanisms but suffer from quadratic computational cost, while state-space models like Mamba achieve efficient long-context modeling yet lack explicit temporal pattern recognition. Therefore we introduce UniMamba, a unified spatial-temporal forecasting framework that integrates efficient state-space dynamics with attention-based dependency learning. UniMamba employs a Mamba Variate–Channel Encoding Layer enhanced with FFT-Laplace Transform and TCN to capture global temporal dependencies, and a Spatial Temporal Attention Layer to jointly model inter-variate correlations and temporal evolution. A Feedforward Temporal Dynamics Layer further fuses continuous and discrete contexts for accurate forecasting. Comprehensive experiments on eight public benchmark datasets demonstrate that UniMamba consistently outperforms state-of-the-art forecasting models in both forecasting accuracy and computational efficiency, establishing a scalable and robust solution for long-sequence multivariate time-series prediction. The code is available at https://github.com/XsChen524/unimamba-ts
I Introduction
Time-series forecasting plays a critical role in a wide range of real-world applications, including energy demand prediction [1], financial forecasting [2], traffic management [3, 4], and environmental monitoring [5]. The task remains inherently challenging due to complex temporal dependencies [6], multi-variate correlations [7], and non-stationary dynamics [8] commonly observed in practical data. As forecasting horizons grow longer and data modalities become more heterogeneous, building models that can efficiently capture both global temporal patterns and cross-variate relationships has become increasingly vital.
Early approaches relied on statistical and recurrent models such as ARIMA [9], LSTM [10], and GRU [11], which primarily captured short-term dependencies but struggled with scalability and stability over extended sequences. The advent of the Transformer [12] introduced a new era for sequence modeling by leveraging self-attention to capture long-range interactions. Transformer-based forecasting frameworks [13, 14, 15] have demonstrated impressive accuracy; however, their quadratic time complexity and high memory cost severely limit scalability for long and high-dimensional time series. Attempts to linearize attention [16, 17] mitigate efficiency issues, but often lead to degraded temporal precision and reduced robustness.
Recently, state-space models (SSMs) [18] have reemerged as an efficient and theoretically grounded alternative. The Mamba architecture [19] and its variants [20] introduce selective scanning mechanisms and continuous-time dynamics, offering superior scalability in handling long sequences. Despite their success, existing Mamba-based frameworks [19, 21, 22] mainly focus on one-dimensional temporal dynamics and overlook explicit spatial or cross-variate dependencies [7, 23], which are essential in multivariate forecasting [24].
To bridge these two paradigms, we propose UniMamba, a unified spatial-temporal forecasting framework that combines the frequency analysis and dependency capture of a bidirectional Mamba variant with the expressive power of attention mechanisms. UniMamba consists of a Mamba Variate–Channel (VC) Encoding Layer enhanced with Fast Fourier Transform (FFT)-Laplace reconstruction and Temporal Convolutional Networks (TCN) for dynamic feature propagation, followed by a Spatial Temporal Attention Layer that jointly models inter-variate and temporal dependencies. A Feedforward Temporal Dynamics (FFN TD) Layer adaptively fuses global and local temporal contexts before projection, enabling robust long-horizon prediction.
Our primary contributions can be summarized as follows:
- We propose UniMamba, the first unified framework that combines state-space dynamics with spatial temporal attention, effectively leveraging the strengths of both Transformers and Mamba architectures.
- We design an enhanced Mamba Variate–Channel Encoding Layer incorporating FFT-Laplace transform and TCN modules, enabling efficient modeling of complex interdependencies across time and variables.
- We demonstrate that UniMamba achieves state-of-the-art forecasting performance across multiple public datasets, outperforming Transformer, MLP and Mamba-based baselines in terms of accuracy, scalability, and robustness.
By coupling attention-driven spatial temporal modeling with efficient state-space recurrent dynamics, UniMamba provides a powerful yet scalable solution for long-horizon multivariate time-series forecasting in production settings.
II Related Work
II-A Transformer-Based Spatial Temporal Forecasting
Transformer-based architectures [12] have substantially influenced time-series forecasting by enabling long-range temporal dependency modeling through self-attention. Subsequent works have refined this paradigm for temporal tasks, such as employing causal masking [13, 25, 26] to maintain sequence order or using multi-scale hierarchical designs like Pyraformer [14]. Moreover, cross-variate modules such as Crossformer [15] enhance inter-variable interactions through multidimensional attention. Despite these improvements, Transformer variants still suffer from quadratic computational cost and substantial memory overhead, which restrict scalability for long or multivariate sequences. To alleviate this, recent studies explore linearized [16, 17] or patch-based [27] mechanisms, but these often limit global context modeling.
II-B State-Space and Mamba-Based Models
State space models (SSMs) have recently offered an efficient alternative for long-horizon sequence modeling. The Mamba SSM [19] introduces a selective scanning mechanism to capture temporal dynamics in linear time, significantly improving training efficiency and scalability over attention-based models. Variants such as S-Mamba [21] fuse bidirectional Mamba branches that process forward and flipped sequences to strengthen global pattern recognition. However, existing Mamba architectures primarily focus on selectively propagating or discarding information along the sequence and on maintaining precise hidden states, and they offer limited capacity to decompose and reconstruct temporal signals. This is a critical gap for real-world multivariate forecasting tasks.
II-C Our Contributions: The UniMamba Framework
To overcome these limitations, we propose UniMamba, a unified hybrid framework that integrates enhanced Mamba Variate–Channel encoding with Spatial Temporal Attention. UniMamba consists of the following components, which model temporal signals, capture dependencies, and generate precise forecasts. 1. Enhanced Mamba Variate–Channel Encoding: We combine Mamba-based selective scanning with FFT and a learnable inverse Laplace transform for temporal signal reconstruction, and with TCN for local smoothing, to build a more expressive and stable sequence representation. 2. Spatial Temporal Attention: Unlike conventional attention models that treat variables and time independently, UniMamba introduces a joint spatial temporal attention layer to strengthen its recognition and expressiveness across both dimensions. 3. Temporal–Dynamic Feedforward Layer: The FFN TD Layer further refines forecasting results by adaptively normalizing and feeding forward tensors before the final projection.
Through this unified design, UniMamba combines advantages of both attention and state-space architectures, offering a highly efficient, scalable, and robust framework for multivariate time-series forecasting.
III Methodology
III-A Problem Formulation
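Let $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_T\} \in \mathbb{R}^{T \times N}$ denote a multivariate time series with $T$ time steps and $N$ variables. Each vector $\mathbf{x}_t \in \mathbb{R}^{N}$ represents the observations of $N$ correlated signals at time step $t$. Given a historical observation window of length $L$, the forecasting objective is to predict the next $H$ time steps:
| $\hat{\mathbf{X}}_{t+1:t+H} = f_{\theta}(\mathbf{X}_{t-L+1:t})$ | (1) |
where $f_{\theta}$ represents a learnable forecasting model parameterized by $\theta$, and $\hat{\mathbf{X}}_{t+1:t+H}$ is the model’s predicted future sequence.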
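The windowed forecasting setup described above can be illustrated with a minimal sketch. The function name `make_windows` and the argument names `lookback` and `horizon` are hypothetical and do not come from the released code; the toy sizes are chosen only for readability.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a (T, N) series into (lookback, N) inputs and (horizon, N) targets."""
    total = series.shape[0]
    starts = range(total - lookback - horizon + 1)
    inputs = np.stack([series[s:s + lookback] for s in starts])
    targets = np.stack(
        [series[s + lookback:s + lookback + horizon] for s in starts]
    )
    return inputs, targets

# Toy series with 10 time steps and 2 variables (assumed sizes).
series = np.arange(20, dtype=float).reshape(10, 2)
inputs, targets = make_windows(series, lookback=4, horizon=2)
print(inputs.shape, targets.shape)
```

Each input window is paired with the steps that immediately follow it, so a forecasting model is trained to map one array to the other.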
In multivariate forecasting, two major challenges must be addressed: (i) capturing long-range temporal dependencies that span wide contextual horizons, and (ii) modeling spatial (cross-variate) relationships among multiple correlated variables. Conventional recurrent or attention-based models often treat these dependencies separately or require quadratic computation, leading to inefficiency and limited scalability, especially when the sequence length and the number of variables become large.
III-B Overall Architecture
UniMamba introduces a unified architecture for efficient and robust multivariate time-series forecasting. As illustrated in Figure 1, the input batch is normalized and its variate and time dimensions are transposed before embedding, because tokenizing each variate along the sequence dimension improves the quality of the learned representation [7]. The embedded sequence is then fed into two branches. In the forward branch, the embedded sequence is transformed to the frequency domain via FFT to filter out aperiodic signals and disturbances. The temporal signal is subsequently reconstructed by a learnable Laplace transform to represent exponential trends, transients, and periodicity across variate channels. TCNs then extract localized and medium-range dependencies, after which the tensors are passed through an MLP to match dimensions for fusion. Meanwhile, the reverse Mamba branch encodes long-term temporal evolution in the backward direction to capture dependencies on future time steps, which are neglected by causal convolution. The spatial temporal attention layer adaptively adjusts weight matrices to capture interactions among variates and across time. Finally, the FFN TD block refines the latent embeddings before the tensor is projected and de-normalized to the output domain.
| (2) |
Formally, the overall forecasting pipeline of UniMamba is expressed as above. This unified formulation enables UniMamba to jointly reconstruct trend and seasonal patterns across temporal signal channels in training, inspect dependencies of various scales, and model spatial and temporal correlations while maintaining computational efficiency and generality in real world scenarios.
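As a rough illustration of the normalize, transpose, and de-normalize steps that bracket the pipeline, the sketch below standardizes each variate over time and moves variates onto the token axis. It assumes per-instance mean and standard-deviation normalization, which the text does not specify; the helper names are hypothetical.

```python
import numpy as np

def normalize(batch, eps=1e-5):
    """Standardize each variate over the time axis of a (B, L, N) batch."""
    mean = batch.mean(axis=1, keepdims=True)
    std = batch.std(axis=1, keepdims=True) + eps
    return (batch - mean) / std, mean, std

def denormalize(batch, mean, std):
    """Undo `normalize`; the statistics broadcast over the time axis."""
    return batch * std + mean

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(8, 96, 7))  # assumed toy sizes
normed, mean, std = normalize(batch)
tokens = normed.transpose(0, 2, 1)  # (B, N, L): one token per variate
restored = denormalize(tokens.transpose(0, 2, 1), mean, std)
```

Because the statistics keep a singleton time axis, the same `denormalize` call would also apply to an output whose length differs from the lookback window.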
III-C Learnable Frequency Domain Encoding
To capture both transients and long-range seasonal patterns, UniMamba employs an FFT and learnable Laplace reconstruction as shown in Algorithm 1. The model performs FFT on -length sequence in each channel and transform signals to frequency domain through , where and are indices of channels and samples in temporal signals. is -th frequency-domain components in -th channel. To capture long-term seasonal patterns in temporal signals and properly handle negative impacts of transient patterns, the model adopts Laplace signal reconstruction implemented with neural network components:
| (3) |
where , , , and are learnable reconstruction parameters trained with input tensors. term contributes to modeling transient dynamics and recovers complex synthesis of periodic signals. Temporal signals with instantaneous and periodic patterns are thereby reconstructed, to a satisfactory degree of confidence, across the given contextual ranges. Such signal reconstruction enhances the representation of non-stationary dynamics in discrete temporal sequences. The neural operators are computed from projections during training, with an optional low-rank approximation to control the parameter count.
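To make the idea of combining a transient part with a periodic part concrete, the sketch below transforms a toy signal with the FFT and rebuilds it as a sum of damped cosines. The damped-cosine form, the function name, and the fixed (non-learned) parameter values are all assumptions made for illustration; the paper's exact reconstruction formula and its learned parameters are not reproduced here.

```python
import numpy as np

def damped_sinusoid_sum(amplitude, decay, omega, phase, length):
    """Sum over k of A_k * exp(-s_k * t) * cos(w_k * t + p_k) (assumed form)."""
    t = np.arange(length)[None, :]
    envelope = np.exp(-decay[:, None] * t)                      # transient part
    oscillation = np.cos(omega[:, None] * t + phase[:, None])   # periodic part
    return (amplitude[:, None] * envelope * oscillation).sum(axis=0)

length = 64
t = np.arange(length)
# Toy channel: a decaying oscillation that completes 4 cycles in the window.
signal = np.exp(-0.05 * t) * np.cos(2 * np.pi * 4 * t / length)

spectrum = np.fft.rfft(signal)                 # frequency-domain components
dominant_bin = int(np.argmax(np.abs(spectrum)))

# In a trained model these four arrays would be learned; here they are set by hand.
recon = damped_sinusoid_sum(
    amplitude=np.array([1.0]),
    decay=np.array([0.05]),
    omega=np.array([2 * np.pi * dominant_bin / length]),
    phase=np.array([0.0]),
    length=length,
)
```

The exponential envelope captures the transient, while the cosine at the dominant FFT frequency captures the periodic component.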
III-D Temporal Convolutional Networks
The TCN layer captures local continuity and mid-term dependencies through dilated causal convolutions. The output of the -th TCN layer is given below, where is the -th convolution kernel, is the dilation rate, and is the activation function:
| (4) |
TCN layers process the reconstructed temporal signals, expanding receptive fields to capture short- and mid-term dependencies and stabilizing gradient propagation during long-sequence forecasting, without significantly increasing the model's parameter count.
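A dilated causal convolution of the kind used in TCNs can be sketched for a single channel as follows. This is a plain NumPy illustration with a hypothetical function name; it omits the activation function and the multi-channel kernels of a real TCN layer.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left padding keeps causality
    out = np.zeros(len(x))
    for k, weight in enumerate(kernel):
        shift = pad - k * dilation
        out += weight * padded[shift:shift + len(x)]
    return out

x = np.arange(1.0, 9.0)  # toy signal 1, 2, ..., 8
y = causal_dilated_conv(x, kernel=np.array([0.5, 0.5]), dilation=2)
print(y)
```

With dilation 2 and a two-tap kernel, each output averages the current sample and the sample two steps earlier, so stacking layers with growing dilation widens the receptive field without adding taps.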
III-E Mamba State–Space Module (SSM) and Fusion
On the reverse branch, the Mamba block scans the flipped series and models temporal evolution backwards, taking into account the future information neglected by causal convolutions in the forward branch. As in Algorithm 2, the input sequence is flipped to obtain the backward flow, ensuring that global temporal information is considered by a selective scan mechanism of linear complexity. After being flipped back, the output of the reverse Mamba branch is linearly fused with the forward branch and the residual block, providing a complete temporal context: . This architecture combines the recurrence efficiency of state-space models with high-quality signal reconstruction and broad receptive fields.
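The flip, scan, flip-back pattern of the reverse branch can be sketched with a toy linear recurrence standing in for the Mamba block. The recurrence is not Mamba's selective scan, and the fusion weights of 0.5 are assumed for illustration; all names are hypothetical.

```python
import numpy as np

def linear_scan(x, decay=0.5):
    """Toy recurrence h[t] = decay * h[t-1] + x[t]; a stand-in for a real SSM."""
    h = np.zeros_like(x)
    state = 0.0
    for t, value in enumerate(x):
        state = decay * state + value
        h[t] = state
    return h

def reverse_branch(x, decay=0.5):
    """Flip the sequence, scan it, and flip the result back."""
    return linear_scan(x[::-1], decay)[::-1]

x = np.array([1.0, 0.0, 0.0, 2.0])
forward_features = x.copy()        # placeholder for the forward-branch output
backward = reverse_branch(x)
# Linear fusion of both branches plus a residual connection (weights assumed).
fused = 0.5 * forward_features + 0.5 * backward + x
```

After the reverse scan, the first position already carries a decayed trace of the last input, which is the "future" information a causal operator cannot see.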
III-F Spatial Temporal Attention Integration
The Spatial Temporal Attention module adaptively captures variable-wise and temporal correlations by computing attention scores across both dimensions. As shown below, is fused input, is weight matrix, is contextual vector, and is attention score matrix:
| (5) |
Thus the reweighted output is . It adjusts contextual embeddings dynamically, allowing the model to prioritize information-rich segments and mitigate noise or redundancy in multivariate inputs.
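One common way to score tokens with a weight matrix and a context vector is additive attention followed by a softmax. The sketch below assumes that form, which may differ from the paper's exact equation; the function names and toy dimensions are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def context_attention(fused, weight, context):
    """Score each token against a context vector, then reweight the tokens."""
    hidden = np.tanh(fused @ weight)             # (tokens, d_attn)
    scores = softmax(hidden @ context, axis=0)   # (tokens,), sums to 1
    return scores[:, None] * fused, scores

rng = np.random.default_rng(0)
fused = rng.normal(size=(7, 16))   # 7 variate tokens, width 16 (assumed sizes)
weight = rng.normal(size=(16, 8))
context = rng.normal(size=8)
reweighted, scores = context_attention(fused, weight, context)
```

Tokens with higher scores contribute more to the output, which is the sense in which the layer prioritizes information-rich segments.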
III-G Feedforward Temporal Dynamics and Projection
A Feedforward Temporal Dynamics (FFN–TD) layer refines the fused representations by modeling residual dependencies and enhancing temporal smoothness. After passing all encoders, the final Projection, Transpose and DeNorm layers project and reshape tensor of features into the target dimensionality, generating the forecasting output .
III-H Model Complexity Analysis
UniMamba’s time complexity originates from the projector’s matrix computation, the selective scan in the two branches, and the spatial temporal attention module. When is fed into -dimension model with low-rank for forecasting steps, the complexity is (or with approximation) for signal reconstruction, for TCNs, and for the Mamba block. Spatial temporal attention requires time, where is the attention dimension. Unlike Transformer-based models with self-attention, UniMamba projects into the model dimension and has its complexity dominated by the controllable parameter , thereby avoiding the quadratic dependency . The low-rank also helps maintain linear complexity as the output length grows in long-term forecasting tasks. Together, these properties make UniMamba particularly well-suited for long-term forecasting and real-time inference on multivariate data streams.
IV Experiments
We conduct experiments to answer the following research questions: 1) Does UniMamba match or outperform current leading baseline methods in overall performance? 2) What role does each individual component of UniMamba play in its overall effectiveness? 3) How does UniMamba compare with leading forecasting models in computational efficiency? 4) To what extent does UniMamba remain robust under noise injection? 5) How does altering the lookback window length influence UniMamba’s long-range forecasting accuracy relative to other models? 6) How effectively can UniMamba identify and represent transient behaviors and temporal structures, including challenging or edge-case scenarios and known limitations?
IV-A Experimental Setup
Datasets: We evaluate our approach using eight publicly available benchmark datasets, namely ETTh1, ETTh2, ETTm1, ETTm2, Exchange, Weather, Solar-Energy, and PEMS08, as summarized in Table I. These datasets span multiple application domains and exhibit diverse temporal and structural properties. MSE and MAE are used as evaluation metrics.
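For reference, the two evaluation metrics can be computed as in the following minimal sketch, with hypothetical helper names and a hand-made toy example.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all time steps and variables."""
    return float(np.mean((pred - target) ** 2))

def mae(pred, target):
    """Mean absolute error over all time steps and variables."""
    return float(np.mean(np.abs(pred - target)))

target = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.5, 2.0], [2.0, 4.0]])
print(mse(pred, target), mae(pred, target))
```

MSE penalizes large deviations more heavily than MAE, which is why the two metrics are usually reported together.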
Baselines: We compare UniMamba with nine leading forecasting models that collectively represent three primary architectural families: Transformer-based (six methods), MLP-based (two methods), and state-space model (SSM)-based (one method). The Transformer family includes Autoformer [28], which combines time-series decomposition with an autocorrelation mechanism to capture periodic patterns without relying on standard self-attention; FEDformer [17], which replaces standard attention operations with frequency-domain representations to achieve greater efficiency while retaining a broad receptive field; and Crossformer [15], which adopts multi-dimensional attention applied to patched subsequences, enhancing local feature learning, though its performance may degrade for very long horizons. DLinear [16] demonstrates that a pair of lightweight linear layers operating on decomposed trend and residual series can rival more intricate attention models across diverse tasks. PatchTST [27] relies on segmented, channel-wise embeddings to extract temporal cues across multiple scales, while iTransformer [7] inverts the standard attention layout to emphasize relationships among variables, though its flat MLP-based tokenization struggles to represent hierarchical time dependencies. The MLP-based group consists of TimesNet [29], which maps one-dimensional sequences into two-dimensional periodic tensors to jointly learn intra- and inter-period patterns, and TiDE [30], which structures stacked fully connected layers in an encoder–decoder arrangement, discarding both attention and recurrence while maintaining strong temporal modeling capacity. Lastly, the SSM-based model S-Mamba [31] utilizes per-variable tokenization combined with bidirectional Mamba modules to represent variable interactions, further strengthened by feed-forward layers that capture temporal transitions.
| Dataset | Variables | Total Time Steps | Sampling Interval |
| ETTh1 | 7 | 17,420 | 1 hour |
| ETTh2 | 7 | 17,420 | 1 hour |
| ETTm1 | 7 | 69,680 | 15 minutes |
| ETTm2 | 7 | 69,680 | 15 minutes |
| Exchange | 8 | 7,588 | 1 day |
| Weather | 21 | 52,696 | 10 minutes |
| Solar-Energy | 137 | 52,560 | 10 minutes |
| PEMS08 | 170 | 17,856 | 5 minutes |
IV-B Effectiveness
We conduct effectiveness experiments on UniMamba with input sequence and forecast horizons . Table II presents comparisons across four datasets. UniMamba consistently secures the best or second-best MSE and MAE scores for most forecasting horizons on given datasets, demonstrating superior effectiveness compared to state-of-the-art baselines such as S-Mamba and iTransformer. Comprehensive effectiveness experimental results and optimal hyperparameter settings are listed in appendix.
The superior performance of UniMamba can be traced to its unified spatial temporal modeling design. First, the FFT and learnable Laplace reconstruction modules enable accurate signal reconstruction and spectral representation, capturing periodic frequency-domain patterns and divergent trends without large overhead. Second, the TCN stack enhances local contextual modeling by capturing short-term temporal dependencies missed by purely spectral methods. Third, the Mamba block models sequential dynamics in the backward direction, enabling the framework to learn temporal correlations with future time steps that causal convolutions ignore. Additionally, the spatial temporal attention module adaptively integrates spatial correlations and temporal dependencies, allowing the model to focus on dynamic inter-variable relationships even in non-stationary settings. The FFN TD and final projection layers ensure stable feature refinement and precise mapping to the prediction space.
These design choices collectively balance global and local temporal dynamics and enhance generalization across diverse domains without incurring large computational costs. This explains UniMamba’s consistent prediction accuracy across different datasets and forecast horizons.
| Dataset | Horizon | UniMamba MSE | UniMamba MAE | S-Mamba MSE | S-Mamba MAE | iTransformer MSE | iTransformer MAE | PatchTST MSE | PatchTST MAE | Crossformer MSE | Crossformer MAE | TiDE MSE | TiDE MAE | TimesNet MSE | TimesNet MAE | DLinear MSE | DLinear MAE | FEDformer MSE | FEDformer MAE | Autoformer MSE | Autoformer MAE |
| ETTm2 | 96 | 0.174 | 0.257 | 0.179 | 0.263 | 0.180 | 0.264 | 0.175 | 0.259 | 0.287 | 0.366 | 0.207 | 0.305 | 0.187 | 0.267 | 0.193 | 0.292 | 0.203 | 0.287 | 0.255 | 0.339 |
| | 192 | 0.240 | 0.302 | 0.250 | 0.309 | 0.250 | 0.309 | 0.241 | 0.302 | 0.414 | 0.492 | 0.290 | 0.364 | 0.249 | 0.309 | 0.284 | 0.362 | 0.269 | 0.328 | 0.281 | 0.340 |
| | 336 | 0.304 | 0.342 | 0.312 | 0.349 | 0.311 | 0.348 | 0.305 | 0.343 | 0.597 | 0.542 | 0.377 | 0.422 | 0.321 | 0.351 | 0.369 | 0.427 | 0.325 | 0.366 | 0.339 | 0.372 |
| | 720 | 0.403 | 0.400 | 0.411 | 0.406 | 0.412 | 0.407 | 0.402 | 0.400 | 1.730 | 1.042 | 0.558 | 0.524 | 0.408 | 0.403 | 0.554 | 0.522 | 0.421 | 0.415 | 0.433 | 0.432 |
| | Avg | 0.280 | 0.325 | 0.288 | 0.332 | 0.288 | 0.332 | 0.281 | 0.326 | 0.757 | 0.610 | 0.358 | 0.404 | 0.291 | 0.333 | 0.350 | 0.401 | 0.305 | 0.349 | 0.327 | 0.371 |
| ETTh2 | 96 | 0.293 | 0.343 | 0.296 | 0.348 | 0.297 | 0.349 | 0.302 | 0.348 | 0.745 | 0.584 | 0.400 | 0.440 | 0.340 | 0.374 | 0.333 | 0.387 | 0.358 | 0.397 | 0.346 | 0.388 |
| | 192 | 0.371 | 0.394 | 0.376 | 0.396 | 0.380 | 0.400 | 0.388 | 0.400 | 0.877 | 0.656 | 0.528 | 0.509 | 0.402 | 0.414 | 0.477 | 0.476 | 0.429 | 0.439 | 0.456 | 0.452 |
| | 336 | 0.416 | 0.429 | 0.424 | 0.431 | 0.428 | 0.432 | 0.426 | 0.433 | 1.043 | 0.731 | 0.643 | 0.571 | 0.452 | 0.452 | 0.594 | 0.541 | 0.496 | 0.487 | 0.482 | 0.486 |
| | 720 | 0.411 | 0.434 | 0.426 | 0.444 | 0.427 | 0.445 | 0.431 | 0.446 | 1.104 | 0.763 | 0.874 | 0.679 | 0.462 | 0.468 | 0.831 | 0.657 | 0.463 | 0.474 | 0.515 | 0.511 |
| | Avg | 0.373 | 0.400 | 0.381 | 0.405 | 0.383 | 0.407 | 0.387 | 0.407 | 0.942 | 0.684 | 0.611 | 0.550 | 0.414 | 0.427 | 0.559 | 0.515 | 0.437 | 0.449 | 0.450 | 0.459 |
| Weather | 96 | 0.155 | 0.200 | 0.169 | 0.210 | 0.174 | 0.214 | 0.177 | 0.218 | 0.158 | 0.230 | 0.202 | 0.261 | 0.172 | 0.220 | 0.196 | 0.255 | 0.217 | 0.296 | 0.266 | 0.336 |
| | 192 | 0.210 | 0.251 | 0.214 | 0.253 | 0.221 | 0.254 | 0.225 | 0.259 | 0.206 | 0.277 | 0.242 | 0.298 | 0.219 | 0.261 | 0.237 | 0.296 | 0.276 | 0.336 | 0.307 | 0.367 |
| | 336 | 0.268 | 0.294 | 0.274 | 0.296 | 0.278 | 0.296 | 0.278 | 0.297 | 0.272 | 0.335 | 0.287 | 0.335 | 0.280 | 0.306 | 0.283 | 0.335 | 0.339 | 0.380 | 0.359 | 0.395 |
| | 720 | 0.349 | 0.346 | 0.353 | 0.348 | 0.358 | 0.347 | 0.354 | 0.348 | 0.398 | 0.418 | 0.351 | 0.386 | 0.365 | 0.359 | 0.345 | 0.381 | 0.403 | 0.428 | 0.419 | 0.428 |
| | Avg | 0.246 | 0.273 | 0.253 | 0.277 | 0.258 | 0.278 | 0.259 | 0.281 | 0.259 | 0.315 | 0.271 | 0.320 | 0.259 | 0.287 | 0.265 | 0.317 | 0.309 | 0.360 | 0.338 | 0.382 |
| PEMS08 | 12 | 0.075 | 0.176 | 0.076 | 0.178 | 0.079 | 0.182 | 0.168 | 0.232 | 0.165 | 0.214 | 0.227 | 0.343 | 0.112 | 0.212 | 0.154 | 0.276 | 0.173 | 0.273 | 0.436 | 0.485 |
| | 24 | 0.102 | 0.202 | 0.104 | 0.209 | 0.115 | 0.219 | 0.224 | 0.281 | 0.215 | 0.260 | 0.318 | 0.409 | 0.141 | 0.238 | 0.248 | 0.353 | 0.210 | 0.301 | 0.467 | 0.502 |
| | 48 | 0.145 | 0.232 | 0.167 | 0.228 | 0.186 | 0.235 | 0.321 | 0.354 | 0.315 | 0.355 | 0.497 | 0.510 | 0.198 | 0.283 | 0.440 | 0.470 | 0.320 | 0.394 | 0.966 | 0.733 |
| | 96 | 0.219 | 0.277 | 0.245 | 0.280 | 0.221 | 0.267 | 0.408 | 0.417 | 0.377 | 0.397 | 0.721 | 0.592 | 0.320 | 0.351 | 0.674 | 0.565 | 0.442 | 0.465 | 1.385 | 0.915 |
| | Avg | 0.135 | 0.222 | 0.148 | 0.224 | 0.150 | 0.226 | 0.280 | 0.321 | 0.268 | 0.307 | 0.441 | 0.464 | 0.193 | 0.271 | 0.379 | 0.416 | 0.286 | 0.358 | 0.814 | 0.659 |
IV-C Ablation Study
Table III presents the ablation study conducted on ETTm2, Weather, and PEMS08 datasets with consistent hyperparameter settings to investigate the contribution of each key component in the UniMamba framework. The baseline corresponds to the complete UniMamba model, while subsequent variants remove or modify critical modules such as FFT-Laplace reconstruction, TCN, and attention.
From the results, we observe that removing either the signal reconstruction or the TCN module degrades forecasting performance for most forecasting horizons and datasets. On ETTm2, removing FFT-Laplace increases the average MSE from 0.281 to 0.288, indicating that the frequency-domain analysis and reconstruction provided by the block is essential for capturing periodic and trend patterns. Similarly, eliminating the TCN stack degrades accuracy (average MSE rising from 0.246 to 0.252 on Weather), confirming the significance of local temporal pattern extraction.
Furthermore, replacing both signal reconstruction and TCN with an MLP leads to a marked accuracy drop, highlighting their strong complementarity. Spectral transformations enable long-term dependency modeling, while convolutional filters strengthen short-term feature extraction. The attention-related variants also provide valuable insights: disabling Spatial Temporal Attention results in higher errors, particularly on the ETTm2 dataset, underscoring its role in adaptively weighting diverse temporal and spatial interactions. Conversely, replacing it with standard self-attention yields excellent accuracy for short-term prediction on PEMS08. Self-attention also helps at some longer horizons when handling non-stationary and multivariate dependencies, albeit at quadratic complexity and with a pronounced error spike in the third PEMS08 row of Table III (MSE 1.070).
The minimal version, which retains only the reverse Mamba block, exhibits the worst overall results, validating that UniMamba’s superior performance emerges from the synergy between its signal reconstruction, TCN, and attention mechanisms. Collectively, these findings demonstrate that each component contributes uniquely to UniMamba’s capability to achieve accurate, robust, and efficient spatiotemporal prediction across diverse domains.
| Model | Length | ETTm2 MSE | ETTm2 MAE | Weather MSE | Weather MAE | PEMS08 MSE | PEMS08 MAE |
| Baseline | 96 | 0.174 | 0.257 | 0.157 | 0.202 | 0.076 | 0.176 |
| | 192 | 0.241 | 0.303 | 0.210 | 0.250 | 0.102 | 0.202 |
| | 336 | 0.304 | 0.342 | 0.269 | 0.295 | 0.145 | 0.236 |
| | 720 | 0.403 | 0.400 | 0.350 | 0.348 | 0.225 | 0.278 |
| w/o FFT-Laplace | 96 | 0.177 | 0.261 | 0.161 | 0.205 | 0.081 | 0.184 |
| | 192 | 0.250 | 0.311 | 0.212 | 0.251 | 0.115 | 0.221 |
| | 336 | 0.312 | 0.349 | 0.273 | 0.297 | 0.161 | 0.251 |
| | 720 | 0.413 | 0.406 | 0.350 | 0.347 | 0.275 | 0.315 |
| w/o TCN | 96 | 0.179 | 0.263 | 0.166 | 0.209 | 0.085 | 0.189 |
| | 192 | 0.244 | 0.305 | 0.215 | 0.254 | 0.124 | 0.228 |
| | 336 | 0.302 | 0.341 | 0.273 | 0.294 | 0.140 | 0.245 |
| | 720 | 0.405 | 0.400 | 0.353 | 0.348 | 0.221 | 0.304 |
| w/o FFT-Laplace & TCN | 96 | 0.182 | 0.267 | 0.165 | 0.208 | 0.089 | 0.295 |
| | 192 | 0.251 | 0.312 | 0.215 | 0.253 | 0.138 | 0.244 |
| | 336 | 0.312 | 0.349 | 0.275 | 0.297 | 0.169 | 0.272 |
| | 720 | 0.413 | 0.404 | 0.352 | 0.348 | 0.281 | 0.346 |
| Self Attention | 96 | 0.179 | 0.262 | 0.160 | 0.207 | 0.074 | 0.173 |
| | 192 | 0.245 | 0.305 | 0.212 | 0.253 | 0.101 | 0.199 |
| | 336 | 0.310 | 0.345 | 0.270 | 0.297 | 1.070 | 0.804 |
| | 720 | 0.430 | 0.414 | 0.350 | 0.345 | 0.234 | 0.281 |
| w/o Attention | 96 | 0.177 | 0.260 | 0.155 | 0.200 | 0.076 | 0.176 |
| | 192 | 0.246 | 0.306 | 0.209 | 0.251 | 0.104 | 0.204 |
| | 336 | 0.305 | 0.343 | 0.271 | 0.297 | 0.145 | 0.235 |
| | 720 | 0.428 | 0.411 | 0.350 | 0.346 | 0.242 | 0.287 |
| Minimal | 96 | 0.184 | 0.266 | 0.166 | 0.209 | 0.089 | 0.195 |
| | 192 | 0.250 | 0.311 | 0.216 | 0.254 | 0.141 | 0.247 |
| | 336 | 0.311 | 0.349 | 0.273 | 0.295 | 0.195 | 0.287 |
| | 720 | 0.413 | 0.405 | 0.353 | 0.348 | 0.342 | 0.383 |
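The average MSE figures quoted in the ablation discussion can be checked against the per-horizon values in Table III with a few lines of arithmetic. The dictionary layout and names below are illustrative only; the numbers are copied from the table.

```python
# Per-horizon MSE values copied from Table III.
table_iii_mse = {
    ("Baseline", "ETTm2"): [0.174, 0.241, 0.304, 0.403],
    ("w/o FFT-Laplace", "ETTm2"): [0.177, 0.250, 0.312, 0.413],
    ("Baseline", "Weather"): [0.157, 0.210, 0.269, 0.350],
    ("w/o TCN", "Weather"): [0.166, 0.215, 0.273, 0.353],
}

averages = {key: sum(vals) / len(vals) for key, vals in table_iii_mse.items()}
for (variant, dataset), avg in averages.items():
    print(f"{variant:16s} {dataset:8s} {avg:.4f}")
```

The computed means are about 0.2805 and 0.2880 on ETTm2 and about 0.2465 and 0.2518 on Weather, which agree with the quoted averages to within rounding of the third decimal.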
IV-D Efficiency
Table IV compares the efficiency of UniMamba with leading baselines on the ETTm1 and Weather datasets in terms of prediction accuracy and training time. UniMamba achieves the lowest or tied-lowest MSE and MAE on both datasets. On ETTm1, it matches PatchTST’s accuracy while reducing training time per iteration by 72% (39.63 vs. 141.12 ms/it). On Weather, UniMamba surpasses all compared models in accuracy while keeping training cost at a competitive level: it is faster than PatchTST and Autoformer, although S-Mamba and iTransformer train faster per iteration. These results highlight UniMamba’s balanced trade-off between accuracy and computational cost, achieved through its lightweight state-space formulation, parallelizable frequency-domain modeling and transformation, and spatial temporal attention integration. In summary, UniMamba demonstrates high effectiveness and scalability for real-time forecasting tasks.
| ETTm1 | |||||
| Models | UniMamba | S-Mamba | iTrans | PatchTST | AutoF |
| MSE | 0.324 | 0.341 | 0.342 | 0.324 | 0.526 |
| MAE | 0.362 | 0.371 | 0.377 | 0.362 | 0.488 |
| Training Time | 39.63(ms/it) | 25.02 | 14.54 | 141.12 | 45.79 |
| Change (%) | - | -36.9% | -63.3% | +256.1% | +15.5% |
| Weather | |||||
| Models | UniMamba | S-Mamba | iTrans | PatchTST | AutoF |
| MSE | 0.156 | 0.168 | 0.176 | 0.183 | 0.323 |
| MAE | 0.200 | 0.211 | 0.215 | 0.222 | 0.373 |
| Training Time | 41.49(ms/it) | 30.36 | 20.68 | 158.87 | 46.40 |
| Change (%) | - | -26.8% | -50.2% | +282.9% | +11.8% |
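The percentage changes in Table IV follow from the per-iteration training times, taking UniMamba as the reference. The sketch below recomputes the ETTm1 row; the function and variable names are hypothetical, and the timings are copied from the table.

```python
def relative_change(reference_ms, other_ms):
    """Percentage change of `other_ms` relative to `reference_ms`."""
    return 100.0 * (other_ms - reference_ms) / reference_ms

# Training time per iteration on ETTm1 (ms/it), copied from Table IV.
ettm1_ms_per_iter = {
    "UniMamba": 39.63,
    "S-Mamba": 25.02,
    "iTrans": 14.54,
    "PatchTST": 141.12,
    "AutoF": 45.79,
}

reference = ettm1_ms_per_iter["UniMamba"]
changes = {
    name: round(relative_change(reference, ms), 1)
    for name, ms in ettm1_ms_per_iter.items()
    if name != "UniMamba"
}
# Share of PatchTST's per-iteration time that UniMamba saves.
saving_vs_patchtst = round(100.0 * (1 - reference / ettm1_ms_per_iter["PatchTST"]))
print(changes, saving_vs_patchtst)
```

Negative values mark models that train faster per iteration than UniMamba, and positive values mark slower ones.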
IV-E Robustness
Figure 2 shows the robustness analysis of UniMamba on ETTm2 under increasing noise perturbations for several forecast lengths. The figure reports MSE and MAE values together with their percentage changes as the standard deviation of the Gaussian noise grows.
Across all horizons, UniMamba remains highly stable. When the standard deviation of the Gaussian noise increases to 0.3, MSE increases by less than 10%, indicating the model’s strong resistance to input distortion. Even at a noise standard deviation of 0.5, the error growth is moderate and does not exceed 20% in long-term predictions. This robustness stems from the combined effect of the Mamba SSM, which captures global temporal patterns; the Laplace transform, which enhances frequency-domain stability and realistic signal reconstruction; and spatial temporal attention, which adaptively adjusts feature weighting under noisy conditions.
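A noise-injection protocol of this kind can be sketched as follows. It assumes zero-mean Gaussian noise added to the (already normalized) model inputs, which is an assumption about the experimental setup; the helper names and toy shapes are hypothetical.

```python
import numpy as np

def add_gaussian_noise(inputs, std, seed=0):
    """Return a copy of `inputs` with zero-mean Gaussian noise of the given std."""
    rng = np.random.default_rng(seed)
    return inputs + rng.normal(0.0, std, size=inputs.shape)

def percent_increase(clean_error, noisy_error):
    """Relative error growth in percent, as reported in a robustness plot."""
    return 100.0 * (noisy_error - clean_error) / clean_error

clean = np.zeros((4, 96, 7))   # toy batch: 4 windows, 96 steps, 7 variables
noisy = add_gaussian_noise(clean, std=0.3)

# Hypothetical errors: an MSE rising from 0.40 to 0.44 is a 10% increase.
growth = percent_increase(0.40, 0.44)
```

In an actual study, the clean and noisy errors would come from evaluating the trained model on the original and perturbed inputs at each noise level.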
IV-F Lookback Length Study
Figure 3 presents the effect of varying lookback window lengths on forecasting accuracy across ETTm1, Weather, and PEMS08. UniMamba attains the lowest MSE values under short and medium-length lookback configurations, reflecting its ability to exploit both short and long historical contexts. In contrast to Transformer-based baselines, whose performance often fluctuates or degrades substantially as the lookback length increases, UniMamba maintains steady or even improved predictive accuracy, underscoring its capacity for efficient long-range dependency modeling. On ETTm1 and Weather, accuracy improvements tend to converge beyond a lookback length of 192, implying that trends and periodic components are effectively captured within this range. On PEMS08, UniMamba continues to benefit from extended historical contexts, highlighting its pattern capture and signal reconstruction capabilities on high-dimensional series. Note that MSE increases when the lookback length reaches 720, possibly because the model attends to every pattern in the long lookback sequence and thereby admits irrelevant information. Overall, these results indicate that UniMamba makes effective use of the given context by extracting and analyzing the temporal patterns within it.
IV-G Case Study
ETTm2 is a dataset of electricity transformer measurements. It contains periodic patterns of transformer load and oil temperature over varied time cycles, reflecting real industrial conditions. Figure 4 compares predictions from UniMamba, S-Mamba, iTransformer, and PatchTST against the ground truth. The subfigures show curves of the HUFL and LUFL variables across days, generated with varied prediction lengths. UniMamba closely follows the ground truth, maintaining smooth and accurate trends even for long prediction horizons. Competing models exhibit noticeable deviations and misjudge trends, particularly in regions with rapid fluctuations. The strong alignment of UniMamba with the ground truth shows its capability for accurate prediction and for detecting abrupt changes. This stability is attributable to the complementary functions of the components in the unified framework. Overall, UniMamba delivers faithful and reliable reconstruction of temporal patterns, demonstrating its potential and generality in real industrial production environments.
V Conclusion
This paper presents UniMamba, a unified spatial-temporal forecasting framework that combines frequency-domain analysis and reconstruction, temporal convolution, state-space modeling, and attention-based fusion in a single architecture. By integrating FFT and Laplace reconstruction, reverse Mamba, TCN, and a Spatial Temporal Attention mechanism, UniMamba effectively captures variate and temporal dependencies and forecasts time series accurately. Experimental results show that UniMamba matches or surpasses the best forecasting models in accuracy and training efficiency, and that it remains stable under noise and varying input conditions. Its modular design enables deployment in large-scale or real-time time series forecasting scenarios.
In future work, we plan to extend UniMamba to handle irregularly sampled signals and multimodal temporal data, and to explore adaptive mechanisms for continual training. These enhancements aim to further strengthen UniMamba’s practicality and generalization in complex, dynamic spatial-temporal environments.
References
- [1] K. Muralitharan, R. Sakthivel, and R. Vishnuvarthan, “Neural network based optimization approach for energy demand prediction in smart grid,” Neurocomputing, vol. 273, pp. 199–208, 2018.
- [2] F. Z. Xing, E. Cambria, and R. E. Welsch, “Natural language based financial forecasting: a survey,” Artificial Intelligence Review, vol. 50, no. 1, pp. 49–73, 2018.
- [3] Q. Zhang, X. Gao, H. Wang, S. M. Yiu, and H. Yin, “Efficient traffic prediction through spatio-temporal distillation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 1, 2025, pp. 1093–1101.
- [4] Q. Zhang, H. Wang, C. Long, L. Su, X. He, J. Chang, T. Wu, H. Yin, S.-M. Yiu, Q. Tian et al., “A survey of generative techniques for spatial-temporal data mining,” arXiv preprint arXiv:2405.09592, 2024.
- [5] A. Kumar, H. Kim, and G. P. Hancke, “Environmental monitoring systems: A review,” IEEE Sensors Journal, vol. 13, no. 4, pp. 1329–1339, 2012.
- [6] X. Chen, R. Zhang, B. Gao, X. He, X. Liu, P. Lio, K.-Y. Lam, and S.-M. Yiu, “Mode: Efficient time series prediction with mamba enhanced by low-rank neural odes,” 2026. [Online]. Available: https://arxiv.org/abs/2601.00920
- [7] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023.
- [8] Q. Zhang, H. Wen, M. Li, D. Huang, S.-M. Yiu, C. S. Jensen, and P. Liò, “Autohformer: Efficient hierarchical autoregressive transformer for time series prediction,” arXiv preprint arXiv:2506.16001, 2025.
- [9] R. H. Shumway and D. S. Stoffer, “Arima models,” in Time series analysis and its applications: with R examples. Springer, 2017, pp. 75–163.
- [10] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “Lstm: A search space odyssey,” IEEE transactions on neural networks and learning systems, vol. 28, no. 10, pp. 2222–2232, 2016.
- [11] R. Dey and F. M. Salem, “Gate-variants of gated recurrent unit (gru) neural networks,” in 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS). IEEE, 2017, pp. 1597–1600.
- [12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- [13] B. Lim and S. Zohren, “Time-series forecasting with deep learning: a survey,” Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200209, 2021.
- [14] S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, and S. Dustdar, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in International conference on learning representations, 2021.
- [15] Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The eleventh international conference on learning representations, 2022.
- [16] A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?” in Proceedings of the AAAI conference on artificial intelligence, vol. 37, no. 9, 2023, pp. 11121–11128.
- [17] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International conference on machine learning. PMLR, 2022, pp. 27268–27286.
- [18] W. Merrill, J. Petty, and A. Sabharwal, “The illusion of state in state-space models,” arXiv preprint arXiv:2404.08819, 2024.
- [19] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
- [20] Q. Zhang, C. Yu, H. Wang, Y. Yan, Y. Cao, S.-M. Yiu, T. Wu, and H. Yin, “Fldmamba: Integrating fourier and laplace transform decomposition with mamba for enhanced time series prediction,” arXiv preprint arXiv:2507.12803, 2025.
- [21] Z. Wang, F. Kong, S. Feng, M. Wang, H. Zhao, D. Wang, and Y. Zhang, “Is mamba effective for time series forecasting?” arXiv preprint arXiv:2403.11144, 2024.
- [22] A. Liang, X. Jiang, Y. Sun, and C. Lu, “Bi-mamba4ts: Bidirectional mamba for time series forecasting,” arXiv preprint arXiv:2404.15772, 2024.
- [23] Z. Li, S. Qi, Y. Li, and Z. Xu, “Revisiting long-term time series forecasting: An investigation on linear mapping,” arXiv preprint arXiv:2305.10721, 2023.
- [24] B. N. Patro and V. S. Agneeswaran, “Simba: Simplified mamba-based architecture for vision and multivariate time series,” arXiv preprint arXiv:2403.15360, 2024.
- [25] J. F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, and A. Troncoso, “Deep learning for time series forecasting: a survey,” Big Data, vol. 9, no. 1, pp. 3–21, 2021.
- [26] Y. Zheng, P. Wei, Z. Chen, Y. Cao, and L. Lin, “Graph-convolved factorization machines for personalized recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 2, pp. 1567–1580, 2021.
- [27] X. Huang, J. Tang, and Y. Shen, “Long time series of ocean wave prediction based on patchtst model,” Ocean Engineering, vol. 301, p. 117572, 2024.
- [28] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in neural information processing systems, vol. 34, pp. 22419–22430, 2021.
- [29] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: Temporal 2d-variation modeling for general time series analysis,” in The eleventh international conference on learning representations, 2022.
- [30] A. Das, W. Kong, A. Leach, S. Mathur, R. Sen, and R. Yu, “Long-term forecasting with tide: Time-series dense encoder,” arXiv preprint arXiv:2304.08424, 2023.
- [31] Z. Wang, F. Kong, S. Feng, M. Wang, X. Yang, H. Zhao, D. Wang, and Y. Zhang, “Is mamba effective for time series forecasting?” Neurocomputing, vol. 619, p. 129178, 2025.