arXiv:2604.16325 · cs.LG · uncurated · rendered via ar5iv

UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration


Xingsheng Chen, Xianpei Mu, Deyu Yi, Yilin Yuan, Xingwei He, Bo Gao, Regina Zhang, Pietro Lio, Siu-Ming Yiu
Affiliations: School of Computing and Data Science, The University of Hong Kong, Hong Kong, China; School of Information Engineering, Beijing Institute of Graphic Communication, Beijing, China; Innovation Engineering College, Macau University of Science and Technology, Macau, China; Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Abstract

Multivariate time series forecasting is fundamental to numerous domains such as energy, finance, and environmental monitoring, where complex temporal dependencies and cross-variable interactions pose enduring challenges. Existing Transformer-based methods capture temporal correlations through attention mechanisms but suffer from quadratic computational cost, while state-space models like Mamba achieve efficient long-context modeling yet lack explicit temporal pattern recognition. Therefore we introduce UniMamba, a unified spatial-temporal forecasting framework that integrates efficient state-space dynamics with attention-based dependency learning. UniMamba employs a Mamba Variate–Channel Encoding Layer enhanced with FFT-Laplace Transform and TCN to capture global temporal dependencies, and a Spatial Temporal Attention Layer to jointly model inter-variate correlations and temporal evolution. A Feedforward Temporal Dynamics Layer further fuses continuous and discrete contexts for accurate forecasting. Comprehensive experiments on eight public benchmark datasets demonstrate that UniMamba consistently outperforms state-of-the-art forecasting models in both forecasting accuracy and computational efficiency, establishing a scalable and robust solution for long-sequence multivariate time-series prediction. The code is available at https://github.com/XsChen524/unimamba-ts

I Introduction

Time-series forecasting plays a critical role in a wide range of real-world applications, including energy demand prediction [1], financial forecasting [2], traffic management [3, 4], and environmental monitoring [5]. The task remains inherently challenging due to complex temporal dependencies [6], multi-variate correlations [7], and non-stationary dynamics [8] commonly observed in practical data. As forecasting horizons grow longer and data modalities become more heterogeneous, building models that can efficiently capture both global temporal patterns and cross-variate relationships has become increasingly vital.

Early approaches relied on statistical and recurrent models such as ARIMA [9], LSTM [10], and GRU [11], which primarily captured short-term dependencies but struggled with scalability and stability over extended sequences. The advent of the Transformer [12] introduced a new era for sequence modeling by leveraging self-attention to capture long-range interactions. Transformer-based forecasting frameworks [13, 14, 15] have demonstrated impressive accuracy; however, their quadratic time complexity and high memory cost severely limit scalability for long and high-dimensional time series. Attempts to linearize attention [16, 17] mitigate efficiency issues, but often lead to degraded temporal precision and reduced robustness.

Recently, state-space models (SSMs) [18] have reemerged as an efficient and theoretically grounded alternative. The Mamba architecture [19] and its variants [20] introduce selective scanning mechanisms and continuous-time dynamics, offering superior scalability in handling long sequences. Despite their success, existing Mamba-based frameworks [19, 21, 22] mainly focus on one-dimensional temporal dynamics and overlook explicit spatial or cross-variate dependencies [7, 23], which are essential in multivariate forecasting [24].

To bridge these two paradigms, we propose UniMamba, a unified spatial-temporal forecasting framework that combines the frequency analysis and dependency capture of a bidirectional Mamba variant with the expressive power of attention mechanisms. UniMamba consists of a Mamba Variate–Channel (VC) Encoding Layer enhanced with Fast Fourier Transform (FFT)-Laplace reconstruction and Temporal Convolutional Networks (TCN) for dynamic feature propagation, followed by a Spatial Temporal Attention Layer that jointly models inter-variate and temporal dependencies. A Feedforward Temporal Dynamics (FFN TD) Layer adaptively fuses global and local temporal contexts before projection, enabling robust long-horizon prediction.

Our primary contributions can be summarized as follows:

  • We propose UniMamba, the first unified framework that combines state-space dynamics with spatial temporal attention, effectively leveraging the strengths of both Transformers and Mamba architectures.

  • We design an enhanced Mamba Variate–Channel Encoding Layer incorporating FFT-Laplace transform and TCN modules, enabling efficient modeling of complex interdependencies across time and variables.

  • We demonstrate that UniMamba achieves state-of-the-art forecasting performance across multiple public datasets, outperforming Transformer, MLP and Mamba-based baselines in terms of accuracy, scalability, and robustness.

By coupling attention-driven spatial temporal modeling with efficient state-space recurrent dynamics, UniMamba provides a powerful yet scalable solution for long-horizon multivariate time-series forecasting in production settings.

II Related Work

II-A Transformer-Based Spatial Temporal Forecasting

Transformer-based architectures [12] have substantially influenced time-series forecasting by enabling long-range temporal dependency modeling through self-attention. Subsequent works have refined this paradigm for temporal tasks, such as employing causal masking [13, 25, 26] to maintain sequence order or using multi-scale hierarchical designs like Pyraformer [14]. Moreover, cross-variate modules such as Crossformer [15] enhance inter-variable interactions through multidimensional attention. Despite these improvements, Transformer variants still suffer from quadratic computational cost $\mathcal{O}(L^{2})$ and substantial memory overhead, which restrict scalability for long or multivariate sequences. To alleviate this, recent studies explore linearized [16, 17] or patch-based [27] mechanisms, but these often limit global context modeling.

II-B State-Space and Mamba-Based Models

State-space models (SSMs) have recently offered an efficient alternative for long-horizon sequence modeling. The Mamba SSM [19] introduces a selective scanning mechanism to capture temporal dynamics in linear time, significantly improving training efficiency and scalability over attention-based models. Variants such as S-Mamba [21] fuse bidirectional Mamba branches that process forward and flipped sequences to strengthen global pattern recognition. However, existing Mamba architectures primarily focus on propagating or discarding information along the sequence and on maintaining hidden states precisely, offering limited capacity to decompose and reconstruct temporal signals. These are critical gaps for real-world multivariate forecasting tasks.

II-C Our Contributions: The UniMamba Framework

To overcome these limitations, we propose UniMamba, a unified hybrid framework that integrates enhanced Mamba Variate–Channel encoding with Spatial Temporal Attention. UniMamba consists of the following components, which model temporal signals, capture dependencies, and generate precise forecasts:

  1. Enhanced Mamba Variate–Channel Encoding: We combine Mamba-based selective scanning with FFT and a learnable inverse Laplace transform for temporal signal reconstruction, and with TCN for local smoothing, to build a more expressive and stable sequence representation.

  2. Spatial Temporal Attention: Unlike conventional attention models that treat variables and time independently, UniMamba introduces a joint spatial temporal attention layer to strengthen its recognition and expressiveness across both dimensions.

  3. Temporal–Dynamic Feedforward Layer: The FFN TD Layer further refines forecasting results by adaptively normalizing the tensors and passing them through a feedforward network before the final projection.

Through this unified design, UniMamba combines advantages of both attention and state-space architectures, offering a highly efficient, scalable, and robust framework for multivariate time-series forecasting.

Figure 1: Framework of UniMamba

III Methodology

III-A Problem Formulation

Let $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{T}]\in\mathbb{R}^{T\times D}$ denote a multivariate time series with $T$ time steps and $D$ variables. Each vector $\mathbf{x}_{t}=[x_{t}^{(1)},x_{t}^{(2)},\ldots,x_{t}^{(D)}]$ represents the observations of $D$ correlated signals at time step $t$. Given a historical observation window of length $L$, the forecasting objective is to predict the next $H$ time steps:

$$\hat{\mathbf{Y}}=[\hat{\mathbf{x}}_{T+1},\hat{\mathbf{x}}_{T+2},\ldots,\hat{\mathbf{x}}_{T+H}]=f_{\theta}(\mathbf{X}_{1:L}) \tag{1}$$

where $f_{\theta}(\cdot)$ represents a learnable forecasting model parameterized by $\theta$, and $\hat{\mathbf{Y}}$ is the model’s predicted future sequence.
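As an illustration of the formulation in Equation (1), the following sketch cuts a series into (history, target) pairs with a sliding window. It is a minimal sketch in plain Python and not the authors' data loader; the function name `make_windows` and the toy series are hypothetical.

```python
from typing import List, Tuple

Series = List[List[float]]  # T rows, each holding D variables


def make_windows(series: Series, lookback: int, horizon: int) -> List[Tuple[Series, Series]]:
    """Split a [T, D] series into (history, target) pairs.

    `history` has `lookback` rows (the observation window) and `target`
    has the `horizon` rows that immediately follow it.
    """
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        history = series[start:start + lookback]
        target = series[start + lookback:start + lookback + horizon]
        pairs.append((history, target))
    return pairs


# Toy series with T = 6 time steps and D = 2 variables (illustrative values).
toy = [[float(t), float(10 * t)] for t in range(6)]
windows = make_windows(toy, lookback=3, horizon=2)
```

With a lookback of 3 and a horizon of 2, the six-step toy series yields two pairs; a forecasting model would map each history to an estimate of its target.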

In multivariate forecasting, two major challenges must be addressed: (i) capturing long-range temporal dependencies that span wide contextual horizons, and (ii) modeling spatial (cross-variate) relationships among multiple correlated variables. Conventional recurrent or attention-based models often treat these dependencies separately or require quadratic computation, leading to inefficiency and limited scalability, especially when $L$ and $D$ become large.

III-B Overall Architecture

UniMamba introduces a unified architecture for efficient and robust multivariate time-series forecasting. As illustrated in Figure 1, each input batch is normalized and its variate and time dimensions are transposed before embedding, because tokenizing along the sequence dimension improves the model’s understanding of the series [7]. The embedded sequence is then fed into two branches. In the forward branch, the embedded sequence is transformed to the frequency domain via FFT to filter out aperiodic signals and disturbances. The temporal signal is then reconstructed by a learnable Laplace transform to represent exponential trends, transients, and periodicity across variate channels. TCNs extract localized and medium-range dependencies, after which the tensors pass through an MLP that matches dimensions for fusion. Meanwhile, the reverse Mamba branch encodes long-term temporal evolution in the backward direction to capture dependencies on future time steps, which causal convolutions neglect. The spatial temporal attention layer adaptively adjusts weight matrices to capture interactions among variates and across time. Finally, the FFN TD block refines the latent embeddings before the tensor is projected and de-normalized to the output domain.

$$\begin{gathered}H=\text{Tokenize}(\text{Transpose}(\mathbf{X}_{1:L}))\\ \hat{\mathbf{Y}}=\text{Proj}\big(\text{FFN-TD}(\text{ST-Attn}(FW(H)+BW(H)))\big)\end{gathered} \tag{2}$$

Formally, the overall forecasting pipeline of UniMamba is expressed as above. This unified formulation enables UniMamba to jointly reconstruct trend and seasonal patterns across temporal signal channels in training, inspect dependencies of various scales, and model spatial and temporal correlations while maintaining computational efficiency and generality in real world scenarios.

III-C Learnable Frequency Domain Encoding

To capture both transients and long-range seasonal patterns, UniMamba employs an FFT followed by a learnable Laplace reconstruction, as shown in Algorithm 1. The model applies the FFT to the $D$-length sequence in each channel, transforming the signals to the frequency domain through $H_{v,k}=\sum_{d}x_{v,d}\,e^{\frac{-2\pi ikd}{D}}$, $d,k\in\{1,2,\dots,D\}$, where $v$ and $d$ index the channels and the samples of each temporal signal, and $H_{v,k}$ is the $k$-th frequency-domain component of the $v$-th channel. To capture long-term seasonal patterns and limit the negative impact of transient patterns, the model adopts a Laplace signal reconstruction implemented with neural network components:

$$\hat{\mathbf{Y}}=\text{Laplace}(H)=\sum_{v=1}^{V}A\cdot e^{\alpha t}\cdot\cos(\omega t+\phi) \tag{3}$$

where $A$, $\alpha$, $\omega$, and $\phi$ are learnable reconstruction parameters trained with the input tensors. The $e^{\alpha t}$ term models transient dynamics, and $\cos(\omega t+\phi)$ recovers the composite periodic signals. Temporal signals with instantaneous and periodic patterns are thus reconstructed with satisfactory fidelity across the given contextual range. This signal reconstruction enhances the representation of non-stationary dynamics in discrete temporal sequences. The reconstruction parameters are computed by projection layers during training, with an optional low-rank approximation to control the parameter count.

Algorithm 1 Learnable FFT and Laplace Reconstruction
Input: $X\in\mathbb{R}^{B\times V\times D}$
$H\leftarrow\text{FFT}(X)$, $X_{real}\leftarrow\text{Re}(H)$, $X_{imag}\leftarrow\text{Im}(H)$
if Low Rank then
  $A=U_{A}\in[B,V,P,R]\times V_{A}^{\textsf{T}}\leftarrow\text{Projector}_{A}(X)$
else
  $A\in[B,V,P,P]\leftarrow\text{Projector}_{A}(X)$
end if
$\alpha\leftarrow\text{Projector}_{\alpha}(X_{real})$, $\omega,\phi\leftarrow\text{Projector}_{\omega,\phi}(X_{imag})$
$t:[T]\leftarrow\text{Projector}_{t}(\text{linspace}(0.0001,1,T))$
$\alpha\leftarrow-\text{ELU}(-\alpha)$ [Ensure decay]
$\hat{Y}:[B,V,P]\leftarrow\sum_{h}A\cdot\exp(\alpha\cdot t)\cdot\cos(\omega\cdot t+\phi)$
Output: $\hat{Y}\in\mathbb{R}^{B\times V\times P}$
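To make the final summation of Algorithm 1 and Equation (3) concrete, the sketch below evaluates a sum of exponentially weighted cosines on a time grid. It is only an illustration under stated assumptions: the parameters are hand-picked constants rather than outputs of learned projectors, the standard ELU with unit scale is assumed, and the helper names (`linspace`, `neg_elu_neg`, `damped_cosine_sum`) are hypothetical.

```python
import math


def linspace(lo, hi, n):
    """Evenly spaced grid of n >= 2 points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + step * i for i in range(n)]


def neg_elu_neg(alpha):
    """The map alpha -> -ELU(-alpha), assuming the standard ELU with unit scale."""
    x = -alpha
    elu = x if x > 0 else math.exp(x) - 1.0
    return -elu


def damped_cosine_sum(amps, alphas, omegas, phis, times):
    """Evaluate sum_h A_h * exp(alpha_h * t) * cos(omega_h * t + phi_h) at each t."""
    signal = []
    for t in times:
        total = 0.0
        for amp, alpha, omega, phi in zip(amps, alphas, omegas, phis):
            total += amp * math.exp(alpha * t) * math.cos(omega * t + phi)
        signal.append(total)
    return signal


# Two hand-picked components (learned in the paper, fixed here for illustration).
times = linspace(0.0001, 1.0, 8)
alphas = [neg_elu_neg(a) for a in (-2.0, -0.5)]
signal = damped_cosine_sum([1.0, 0.5], alphas, [2 * math.pi, 6 * math.pi], [0.0, 0.0], times)
```

Under the standard ELU definition assumed here, the map in `neg_elu_neg` leaves negative values of alpha unchanged and maps non-negative values into the interval [0, 1).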

III-D Temporal Convolutional Networks

The TCN layer captures local continuity and mid-term dependencies through dilated causal convolutions. The output of the $l$-th TCN layer is given below, where $W^{(l)}$ is the $l$-th convolution kernel, $d=2^{\,l-1}$ is the dilation rate, and $\sigma$ is the activation function:

$$H^{(l)}=\sigma\bigl(W^{(l)}\ast_{d=2^{\,l-1}}H^{(l-1)}\bigr),\quad H^{(0)}\in\mathbb{R}^{B\times V\times P} \tag{4}$$

TCN layers process the reconstructed temporal signals, expanding the receptive field to capture short- and mid-term dependencies and stabilizing gradient propagation during long-sequence forecasting, without significantly increasing the model parameter count.
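The dilation schedule in Equation (4) can be illustrated on a single one-dimensional channel. The sketch below is a simplified stand-in that assumes an identity activation, zero padding on the left, and one convolution per layer; the function names are hypothetical and the code is not taken from the authors' implementation.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], treating x before t = 0 as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, num_layers):
    """Receptive field of stacked layers with dilation 2**(l-1) for l = 1..num_layers."""
    return 1 + (kernel_size - 1) * sum(2 ** (l - 1) for l in range(1, num_layers + 1))


# Push a unit impulse through three layers with dilations 1, 2, 4.
h = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for layer in range(1, 4):
    h = causal_dilated_conv(h, kernel=[1.0, 1.0], dilation=2 ** (layer - 1))
```

After three layers with kernel size 2, the impulse at position 0 has spread to all eight positions, matching `receptive_field(2, 3)`. Doubling the dilation per layer therefore grows the receptive field exponentially with depth, while each layer adds only one kernel's worth of parameters.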

III-E Mamba State–Space Module (SSM) and Fusion

On the reverse branch, the Mamba block scans the flipped series and models temporal evolution backwards, taking into account the future information neglected by the causal convolutions in the forward branch. As in Algorithm 2, the input sequence is flipped to obtain the backward flow, so that global temporal information is covered by a selective scan of linear complexity. After being flipped back, the output of the reverse Mamba block is fused additively with the forward branch and the residual connection, providing a complete temporal context: $H_{l}=H_{l-1}+H_{FW}+H_{BW(\text{Mamba})}$. This architecture combines the recurrence efficiency of state-space models with high-quality signal reconstruction and broad receptive fields.
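The flip, scan, flip-back pattern and the additive fusion can be sketched on a one-dimensional sequence. In this hedged example a fixed-decay linear recurrence stands in for the Mamba block; it is not Mamba's selective scan, and the axis along which the actual model flips is not specified here. All function names are hypothetical.

```python
def causal_scan(seq, decay=0.5):
    """Stand-in recurrence h[t] = decay * h[t-1] + x[t]; NOT Mamba's selective scan."""
    h, out = 0.0, []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def backward_branch(seq, decay=0.5):
    """Flip, scan, and flip back, so position t summarizes x[t], x[t+1], ..."""
    return causal_scan(seq[::-1], decay)[::-1]


def fuse(residual, forward, backward):
    """Elementwise H_l = H_{l-1} + H_FW + H_BW."""
    return [r + f + b for r, f, b in zip(residual, forward, backward)]


# An impulse at the last step is invisible to earlier positions in a causal scan,
# but visible to them in the backward branch.
x = [0.0, 0.0, 0.0, 1.0]
h_fw = causal_scan(x)       # placeholder for the forward branch output
h_bw = backward_branch(x)
h_next = fuse(x, h_fw, h_bw)
```

Here `h_fw` stays zero until the final step, while `h_bw` carries a decayed trace of the final impulse back to every earlier position, which is the information the fusion adds to the forward branch.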

III-F Spatial Temporal Attention Integration

The Spatial Temporal Attention module adaptively captures variable-wise and temporal correlations by computing attention scores across both dimensions. As shown below, $H\in\mathbb{R}^{B\times V\times D}$ is the fused input, $W$ is a weight matrix, $V$ is a contextual vector, and $\alpha$ is the attention score matrix:

$$\begin{gathered}E_{t}=\text{ReLU}(W_{t}H+b_{t}),\quad E_{s}=\text{ReLU}(W_{s}H^{T}+b_{s})\\ \alpha=\frac{\exp(V^{T}E)}{\sum_{j=1}^{N}\exp(V^{T}E[:,j])}\end{gathered} \tag{5}$$

The reweighted output is then $H_{l}=\frac{1}{2}(H_{t}+H_{s})$, the average of the temporally and spatially reweighted representations $H_{t}$ and $H_{s}$. The module adjusts contextual embeddings dynamically, allowing the model to prioritize information-rich segments and mitigate noise or redundancy in multivariate inputs.
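The score computation in Equation (5) amounts to a ReLU-activated affine map followed by a softmax over columns. The sketch below shows only that step for a single unbatched matrix, with hand-picked weights; how the scores are applied to produce the reweighted representations is not covered, and the function names and shapes are assumptions made for illustration.

```python
import math


def relu_affine(W, H, b):
    """E = ReLU(W @ H + b), with W of shape [E, D], H of shape [D, N], b of length E."""
    rows, inner, cols = len(W), len(H), len(H[0])
    return [[max(0.0, sum(W[i][k] * H[k][j] for k in range(inner)) + b[i])
             for j in range(cols)]
            for i in range(rows)]


def attention_scores(v, E):
    """alpha_j = exp(v^T E[:, j]) / sum_j exp(v^T E[:, j]), computed stably."""
    cols = len(E[0])
    logits = [sum(v[i] * E[i][j] for i in range(len(v))) for j in range(cols)]
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy input with D = 2 features and N = 3 columns (illustrative values).
H = [[1.0, 2.0, 3.0],
     [0.0, 0.0, 0.0]]
W = [[1.0, 0.0],
     [0.0, 1.0]]
E = relu_affine(W, H, b=[0.0, 0.0])
alpha = attention_scores([1.0, 0.0], E)
```

The scores sum to one, and columns with larger projected activations receive larger weights, which is how the module can emphasize information-rich segments.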

III-G Feedforward Temporal Dynamics and Projection

A Feedforward Temporal Dynamics (FFN–TD) layer refines the fused representations by modeling residual dependencies and enhancing temporal smoothness. After passing all encoders, the final Projection, Transpose and DeNorm layers project and reshape tensor of features into the target dimensionality, generating the forecasting output 𝐘^\hat{\mathbf{Y}}.

III-H Model Complexity Analysis

UniMamba’s time complexity originates from the projectors’ matrix computations and the selective scan in the two branches, and from the spatial temporal attention module. When $X\in\mathbb{R}^{T\times D}$ is fed into a $D$-dimensional model with low rank $r$ to forecast $P$ steps, the complexity is $\mathcal{O}(N\cdot P^{2})$ (or $\mathcal{O}(N\cdot P\cdot r)$, $r\ll P$, with the low-rank approximation) for signal reconstruction, $\mathcal{O}(N\cdot P)$ for the TCNs, and $\mathcal{O}(N\cdot D)$ for the Mamba block. Spatial temporal attention requires $\mathcal{O}(N\cdot D\cdot E)$ time, where $E$ is the attention dimension. Unlike Transformer-based models with self-attention, UniMamba projects $L$ into the model dimension, so its complexity is dominated by the controllable parameter $D$, thereby avoiding the quadratic dependency $\mathcal{O}(L^{2})$. The low rank $r$ also helps maintain linear complexity as the output length $P$ increases in long-term forecasting tasks. Together, these properties make UniMamba particularly well suited for long-term forecasting and real-time inference on multivariate data streams.

Algorithm 2 Forecast with UniMamba Encoders
Input: Batch $X\in\mathbb{R}^{B\times L\times V}$
$X_{in}^{\top}:[B,V,L]\leftarrow\text{Transpose}(\text{Normalize}(X))$
$H_{0}:[B,V,D]\leftarrow\text{Embedding}(\text{Time Feature}(X_{in}^{\top}))$
for $l=1$ to $E$ do
  Forward: $H_{fft}:[B,V,L]\leftarrow\text{FFT-Laplace}(H_{l-1})$
  for $k=1$ to $K$ do
   $H_{fft}:[B,V,L]\leftarrow\text{TCN}_{k}(H_{fft})$
  end for
  $H_{FW}:[B,V,D]\leftarrow\text{Linear}(H_{fft})$
  Backward: $H_{flip}:[B,V,D]\leftarrow\text{flip}(H_{l-1})$
  $H_{Mamba}:[B,V,D]\leftarrow\text{MambaSSM}(H_{flip})$
  $H_{BW}:[B,V,D]\leftarrow\text{flip}(H_{Mamba})$
  Fusion: $H_{l}:[B,V,D]\leftarrow H_{l-1}+H_{FW}+H_{BW}$
  $H_{l}:[B,V,D]\leftarrow\text{Spatial Temporal Attn}(H_{l})$
  $H_{l}:[B,V,D]\leftarrow\text{Norm}\big(H_{l}+\text{FFN}(\text{Norm}(H_{l}))\big)$
end for
$\hat{Y}:[B,P,V]\leftarrow\text{DeNorm}(\text{Linear}(H_{E})[:,:V,:]^{\top})$
Output: Batch $\hat{Y}\in\mathbb{R}^{B\times P\times V}$

IV Experiments

We conduct experiments to answer the following research questions: 1) Does UniMamba match or outperform current leading baseline methods in overall performance? 2) What role does each individual component of UniMamba play in its overall effectiveness? 3) How does UniMamba compare with leading forecasting models in computational efficiency? 4) To what extent does UniMamba remain robust under noise injection? 5) How does altering the lookback window length influence UniMamba’s long-range forecasting accuracy relative to other models? 6) How effectively can UniMamba identify and represent transient behaviors and temporal structures, including challenging or edge-case scenarios and known limitations?

IV-A Experimental Setup

Datasets: We evaluate our approach using eight publicly available benchmark datasets, namely ETTh1, ETTh2, ETTm1, ETTm2, Exchange, Weather, Solar-Energy, and PEMS08, as summarized in Table I. These datasets span multiple application domains and exhibit diverse temporal and structural properties. MSE and MAE are used as evaluation metrics.
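For reference, the two evaluation metrics can be written out as follows. This is a minimal sketch of the standard definitions of mean squared error and mean absolute error over a [horizon, variables] forecast; the function names are hypothetical, and any normalization applied before scoring in the experiments is not reflected here.

```python
def _flatten(rows):
    """Flatten a [horizon][variables] nested list into a single list."""
    return [value for row in rows for value in row]


def mse(pred, true):
    """Mean squared error over all horizon steps and variables."""
    p, t = _flatten(pred), _flatten(true)
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def mae(pred, true):
    """Mean absolute error over all horizon steps and variables."""
    p, t = _flatten(pred), _flatten(true)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


# Toy forecast with 2 steps and 2 variables (illustrative values).
pred = [[1.0, 2.0], [3.0, 4.0]]
true = [[1.0, 1.0], [1.0, 4.0]]
scores = (mse(pred, true), mae(pred, true))
```

Because MSE squares each error, it penalizes the single error of 2 in this toy example more heavily than MAE does, which is why the two metrics are usually reported together.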

Baselines: We compare UniMamba with nine leading forecasting models that collectively represent three primary architectural families: Transformer-based (six methods), MLP-based (two methods), and state-space model (SSM)-based (one method). The Transformer family includes Autoformer [28], which combines time-series decomposition with an autocorrelation mechanism to capture periodic patterns without relying on standard self-attention; FEDformer [17], which replaces standard attention operations with frequency-domain representations to achieve greater efficiency while retaining a broad receptive field; and Crossformer [15], which applies multi-dimensional attention to patched subsequences, enhancing local feature learning, though its performance may taper off for very long horizons. DLinear [16] demonstrates that a pair of lightweight linear layers operating on decomposed trend and residual series can rival more intricate attention models across diverse tasks. PatchTST [27] relies on segmented, channel-wise embeddings to extract temporal cues across multiple scales, while iTransformer [7] inverts the standard attention layout to emphasize relationships among variables, though its flat MLP-based tokenization struggles to represent hierarchical time dependencies. The MLP-based group consists of TimesNet [29], which maps one-dimensional sequences into two-dimensional periodic tensors to jointly learn intra- and inter-period patterns, and TiDE [30], which stacks fully connected layers in an encoder–decoder arrangement, discarding both attention and recurrence while maintaining strong temporal modeling capacity. Lastly, the SSM-based model S-Mamba [31] utilizes per-variable tokenization combined with bidirectional Mamba modules to represent variable interactions, further strengthened by feed-forward layers that capture temporal transitions.

TABLE I: Overview of the 8 publicly available time-series datasets.
Dataset        Variables   Total Time Steps   Sampling Interval
ETTh1          7           17,420             1 hour
ETTh2          7           17,420             1 hour
ETTm1          7           69,680             15 minutes
ETTm2          7           69,680             15 minutes
Exchange       8           7,588              1 day
Weather        21          52,696             10 minutes
Solar-Energy   137         52,560             1 hour
PEMS08         170         17,856             5 minutes

IV-B Effectiveness

We conduct effectiveness experiments on UniMamba with input sequence length $L=96$ and forecast horizons $T\in\{96,192,336,720\}$ (PEMS08 uses horizons of 12, 24, 48, and 96, as listed in Table II). Table II presents comparisons across four datasets. UniMamba secures the best or second-best MSE and MAE scores for most forecasting horizons on the given datasets, demonstrating superior effectiveness compared to state-of-the-art baselines such as S-Mamba and iTransformer. Complete experimental results and optimal hyperparameter settings are listed in the appendix.

The superior performance of UniMamba can be traced to its unified spatial temporal modeling design. First, the FFT and learnable Laplace reconstruction modules enable accurate signal reconstruction and spectral representation, capturing periodic frequency-domain patterns and divergent trends without large overheads. Second, the TCN stack enhances local contextual modeling by capturing short-term temporal dependencies missed by purely spectral methods. Third, the Mamba block models sequential dynamics in the backward direction, enabling the framework to learn temporal correlations with future time steps that are ignored by causal convolutions. Additionally, the spatial temporal attention module adaptively integrates spatial correlations and temporal dependencies, allowing the model to focus on dynamic inter-variable relationships even in non-stationary settings. The FFN TD and final projection layers ensure stable feature refinement and precise mapping to the prediction space.

These design choices collectively balance global and local temporal dynamics and enhance generalization across diverse domains without incurring large computational costs. This explains UniMamba’s consistent prediction accuracy across different datasets and forecast horizons.

TABLE II: Comparison results between UniMamba and baselines on ETTm2, ETTh2, Weather, PEMS08 datasets in effectiveness experiments. Bold font denotes the best model and underline denotes the second best. All baseline results are obtained from [31].
Dataset  Horizon  UniMamba       S-Mamba        iTransformer   PatchTST       Crossformer    TiDE           TimesNet       DLinear        FEDformer      Autoformer
                  MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE     MSE    MAE
ETTm2    96       0.174  0.257   0.179  0.263   0.180  0.264   0.175  0.259   0.287  0.366   0.207  0.305   0.187  0.267   0.193  0.292   0.203  0.287   0.255  0.339
ETTm2    192      0.240  0.302   0.250  0.309   0.250  0.309   0.241  0.302   0.414  0.492   0.290  0.364   0.249  0.309   0.284  0.362   0.269  0.328   0.281  0.340
ETTm2    336      0.304  0.342   0.312  0.349   0.311  0.348   0.305  0.343   0.597  0.542   0.377  0.422   0.321  0.351   0.369  0.427   0.325  0.366   0.339  0.372
ETTm2    720      0.403  0.400   0.411  0.406   0.412  0.407   0.402  0.400   1.730  1.042   0.558  0.524   0.408  0.403   0.554  0.522   0.421  0.415   0.433  0.432
ETTm2    Avg      0.280  0.325   0.288  0.332   0.288  0.332   0.281  0.326   0.757  0.610   0.358  0.404   0.291  0.333   0.350  0.401   0.305  0.349   0.327  0.371
ETTh2    96       0.293  0.343   0.296  0.348   0.297  0.349   0.302  0.348   0.745  0.584   0.400  0.440   0.340  0.374   0.333  0.387   0.358  0.397   0.346  0.388
ETTh2    192      0.371  0.394   0.376  0.396   0.380  0.400   0.388  0.400   0.877  0.656   0.528  0.509   0.402  0.414   0.477  0.476   0.429  0.439   0.456  0.452
ETTh2    336      0.416  0.429   0.424  0.431   0.428  0.432   0.426  0.433   1.043  0.731   0.643  0.571   0.452  0.452   0.594  0.541   0.496  0.487   0.482  0.486
ETTh2    720      0.411  0.434   0.426  0.444   0.427  0.445   0.431  0.446   1.104  0.763   0.874  0.679   0.462  0.468   0.831  0.657   0.463  0.474   0.515  0.511
ETTh2    Avg      0.373  0.400   0.381  0.405   0.383  0.407   0.387  0.407   0.942  0.684   0.611  0.550   0.414  0.427   0.559  0.515   0.437  0.449   0.450  0.459
Weather  96       0.155  0.200   0.169  0.210   0.174  0.214   0.177  0.218   0.158  0.230   0.202  0.261   0.172  0.220   0.196  0.255   0.217  0.296   0.266  0.336
Weather  192      0.210  0.251   0.214  0.253   0.221  0.254   0.225  0.259   0.206  0.277   0.242  0.298   0.219  0.261   0.237  0.296   0.276  0.336   0.307  0.367
Weather  336      0.268  0.294   0.274  0.296   0.278  0.296   0.278  0.297   0.272  0.335   0.287  0.335   0.280  0.306   0.283  0.335   0.339  0.380   0.359  0.395
Weather  720      0.349  0.346   0.353  0.348   0.358  0.347   0.354  0.348   0.398  0.418   0.351  0.386   0.365  0.359   0.345  0.381   0.403  0.428   0.419  0.428
Weather  Avg      0.246  0.273   0.253  0.277   0.258  0.278   0.259  0.281   0.259  0.315   0.271  0.320   0.259  0.287   0.265  0.317   0.309  0.360   0.338  0.382
PEMS08   12       0.075  0.176   0.076  0.178   0.079  0.182   0.168  0.232   0.165  0.214   0.227  0.343   0.112  0.212   0.154  0.276   0.173  0.273   0.436  0.485
PEMS08   24       0.102  0.202   0.104  0.209   0.115  0.219   0.224  0.281   0.215  0.260   0.318  0.409   0.141  0.238   0.248  0.353   0.210  0.301   0.467  0.502
PEMS08   48       0.145  0.232   0.167  0.228   0.186  0.235   0.321  0.354   0.315  0.355   0.497  0.510   0.198  0.283   0.440  0.470   0.320  0.394   0.966  0.733
PEMS08   96       0.219  0.277   0.245  0.280   0.221  0.267   0.408  0.417   0.377  0.397   0.721  0.592   0.320  0.351   0.674  0.565   0.442  0.465   1.385  0.915
PEMS08   Avg      0.135  0.222   0.148  0.224   0.150  0.226   0.280  0.321   0.268  0.307   0.441  0.464   0.193  0.271   0.379  0.416   0.286  0.358   0.814  0.659

IV-C Ablation Study

Table III presents the ablation study conducted on ETTm2, Weather, and PEMS08 datasets with consistent hyperparameter settings to investigate the contribution of each key component in the UniMamba framework. The baseline corresponds to the complete UniMamba model, while subsequent variants remove or modify critical modules such as FFT-Laplace reconstruction, TCN, and attention.

From the results, we observe that removing either the signal reconstruction or the TCN module degrades forecasting performance for most forecasting horizons and datasets. On ETTm2, removing FFT-Laplace increases the average MSE from 0.281 to 0.288, indicating that the frequency-domain analysis and reconstruction provided by the block is essential for capturing periodic and trend patterns. Similarly, eliminating the TCN stack also degrades accuracy (average MSE rising from 0.246 to 0.252 on Weather), confirming the significance of local temporal pattern extraction.

Furthermore, replacing the combination of signal reconstruction and TCN with an MLP leads to a pronounced accuracy drop, most visibly on PEMS08 (MSE of 0.281 versus 0.225 at the longest horizon), highlighting their strong complementarity. Spectral transformations enable long-term dependency modeling, while convolutional filters strengthen short-term feature extraction. The attention-related variants also provide valuable insights: disabling Spatial Temporal Attention results in higher error rates, particularly on the ETTm2 dataset, underscoring its role in adaptively weighting diverse temporal and spatial interactions. Conversely, the self-attention variant provides excellent accuracy in PEMS08 short-term prediction. This mechanism also improves stability for some longer horizons, proving beneficial in handling non-stationary and multivariate dependencies, albeit at quadratic complexity.

The minimal version, which retains only the reverse Mamba block, exhibits the worst overall results, validating that UniMamba’s superior performance emerges from the synergy between its signal reconstruction, TCN, and attention mechanisms. Collectively, these findings demonstrate that each component contributes uniquely to UniMamba’s capability in achieving accurate, robust, and efficient spatiotemporal prediction across diverse domains.

TABLE III: Ablation study results for UniMamba across ETTm2, Weather, and PEMS08 datasets. The baseline represents the complete UniMamba architecture, while the others represent ones with critical components replaced or removed.
Model                   Length  ETTm2          Weather        PEMS08
                                MSE    MAE     MSE    MAE     MSE    MAE
Baseline                96      0.174  0.257   0.157  0.202   0.076  0.176
Baseline                192     0.241  0.303   0.210  0.250   0.102  0.202
Baseline                336     0.304  0.342   0.269  0.295   0.145  0.236
Baseline                720     0.403  0.400   0.350  0.348   0.225  0.278
w/o FFT-Laplace         96      0.177  0.261   0.161  0.205   0.081  0.184
w/o FFT-Laplace         192     0.250  0.311   0.212  0.251   0.115  0.221
w/o FFT-Laplace         336     0.312  0.349   0.273  0.297   0.161  0.251
w/o FFT-Laplace         720     0.413  0.406   0.350  0.347   0.275  0.315
w/o TCN                 96      0.179  0.263   0.166  0.209   0.085  0.189
w/o TCN                 192     0.244  0.305   0.215  0.254   0.124  0.228
w/o TCN                 336     0.302  0.341   0.273  0.294   0.140  0.245
w/o TCN                 720     0.405  0.400   0.353  0.348   0.221  0.304
w/o FFT-Laplace & TCN   96      0.182  0.267   0.165  0.208   0.089  0.295
w/o FFT-Laplace & TCN   192     0.251  0.312   0.215  0.253   0.138  0.244
w/o FFT-Laplace & TCN   336     0.312  0.349   0.275  0.297   0.169  0.272
w/o FFT-Laplace & TCN   720     0.413  0.404   0.352  0.348   0.281  0.346
Self Attention          96      0.179  0.262   0.160  0.207   0.074  0.173
Self Attention          192     0.245  0.305   0.212  0.253   0.101  0.199
Self Attention          336     0.310  0.345   0.270  0.297   1.070  0.804
Self Attention          720     0.430  0.414   0.350  0.345   0.234  0.281
w/o Attention           96      0.177  0.260   0.155  0.200   0.076  0.176
w/o Attention           192     0.246  0.306   0.209  0.251   0.104  0.204
w/o Attention           336     0.305  0.343   0.271  0.297   0.145  0.235
w/o Attention           720     0.428  0.411   0.350  0.346   0.242  0.287
Minimal                 96      0.184  0.266   0.166  0.209   0.089  0.195
Minimal                 192     0.250  0.311   0.216  0.254   0.141  0.247
Minimal                 336     0.311  0.349   0.273  0.295   0.195  0.287
Minimal                 720     0.413  0.405   0.353  0.348   0.342  0.383

IV-D Efficiency

Table IV compares the efficiency of UniMamba with leading baselines on the ETTm1 and Weather datasets in terms of prediction accuracy and training time per iteration. UniMamba achieves the lowest or tied-lowest MSE and MAE while maintaining competitive training efficiency. On ETTm1, it matches PatchTST’s accuracy but reduces training time by 72% (39.63 vs. 141.12 ms/it). On Weather, UniMamba is clearly more accurate than every compared model while keeping its training cost at a competitive level; S-Mamba and iTransformer train faster per iteration but with higher error. These results highlight UniMamba’s balanced trade-off between accuracy and computational cost, achieved through its lightweight state-space formulation, parallelizable frequency-domain modeling and transformation, and spatial temporal attention integration. In summary, UniMamba demonstrates high effectiveness and scalability for real-time forecasting tasks.

TABLE IV: Efficiency comparison between UniMamba and other models, in terms of prediction errors and training time.
ETTm1

| Metric | UniMamba | S-Mamba | iTrans | PatchTST | AutoF |
|---|---|---|---|---|---|
| MSE | 0.324 | 0.341 | 0.342 | 0.324 | 0.526 |
| MAE | 0.362 | 0.371 | 0.377 | 0.362 | 0.488 |
| Training Time (ms/it) | 39.63 | 25.02 | 14.54 | 141.12 | 45.79 |
| Change (%) | - | -36.9% | -63.3% | +256.1% | +15.5% |

Weather

| Metric | UniMamba | S-Mamba | iTrans | PatchTST | AutoF |
|---|---|---|---|---|---|
| MSE | 0.156 | 0.168 | 0.176 | 0.183 | 0.323 |
| MAE | 0.200 | 0.211 | 0.215 | 0.222 | 0.373 |
| Training Time (ms/it) | 41.49 | 30.36 | 20.68 | 158.87 | 46.40 |
| Change (%) | - | -26.8% | -50.2% | +282.9% | +11.8% |
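The Change (%) rows appear to be the signed difference of each model's training time from UniMamba's, divided by UniMamba's time. The short sketch below recomputes them from the ms/it values in Table IV under that assumption; the function name `change_vs_reference` and the data layout are illustrative.

```python
# Training time per iteration (ms/it), copied from Table IV.
TRAIN_TIME_MS = {
    "ETTm1": {"UniMamba": 39.63, "S-Mamba": 25.02, "iTrans": 14.54,
              "PatchTST": 141.12, "AutoF": 45.79},
    "Weather": {"UniMamba": 41.49, "S-Mamba": 30.36, "iTrans": 20.68,
                "PatchTST": 158.87, "AutoF": 46.40},
}


def change_vs_reference(times, reference="UniMamba"):
    """Signed % difference of each model's time from the reference model."""
    ref = times[reference]
    return {model: round(100.0 * (t - ref) / ref, 1)
            for model, t in times.items() if model != reference}


CHANGE = {ds: change_vs_reference(t) for ds, t in TRAIN_TIME_MS.items()}

# The "72% reduction" quoted in the text uses PatchTST as the reference instead.
ettm1 = TRAIN_TIME_MS["ETTm1"]
SAVING_VS_PATCHTST = 100.0 * (ettm1["PatchTST"] - ettm1["UniMamba"]) / ettm1["PatchTST"]

for dataset, row in CHANGE.items():
    print(dataset, row)
print(f"ETTm1 saving vs. PatchTST: {SAVING_VS_PATCHTST:.1f}%")
```

Note the two different denominators: the table normalizes by UniMamba's time, while the 72% figure in the text normalizes by PatchTST's time, which is why +256.1% and 72% describe the same pair of measurements.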

IV-E Robustness

Figure 2 shows the robustness analysis of UniMamba on ETTm2 under increasing noise perturbations with forecast lengths $L\in\{96,192,336,720\}$. The figure reports MSE and MAE values together with their percentage changes as the standard deviation of the Gaussian noise grows.

Across all horizons, UniMamba remains highly stable. When the standard deviation of the Gaussian noise increases to 0.3, MSE increases by less than 10%, indicating the model’s strong resistance to input distortion. Even at a standard deviation of 0.5, the error growth is moderate and does not exceed 20% in long-term predictions. This robustness stems from the combined effect of three components: the Mamba SSM, which captures global temporal patterns; the Laplace Transform, which enhances frequency-domain stability and realistic signal reconstruction; and spatial temporal attention, which adaptively adjusts feature weighting under noisy conditions.
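The evaluation protocol described here can be sketched as follows, assuming the noise is zero-mean Gaussian and is added to the lookback input at evaluation time (the text does not spell this out). The forecaster is a deliberately trivial placeholder standing in for a trained model, the toy sine series is synthetic, and all function names are hypothetical.

```python
import math
import random


def add_gaussian_noise(series, sigma, rng):
    """Return a copy of `series` with zero-mean Gaussian noise of std `sigma`."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def mean_forecaster(window, horizon):
    """Placeholder model (assumption): repeat the window mean `horizon` times."""
    level = sum(window) / len(window)
    return [level] * horizon


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def robustness_curve(window, target, sigmas, forecaster, trials=200, seed=0):
    """Map each sigma to (mean MSE over trials, % change vs. the clean input)."""
    rng = random.Random(seed)
    clean = mse(forecaster(window, len(target)), target)
    curve = {}
    for sigma in sigmas:
        errors = [
            mse(forecaster(add_gaussian_noise(window, sigma, rng), len(target)), target)
            for _ in range(trials)
        ]
        noisy = sum(errors) / trials
        curve[sigma] = (noisy, 100.0 * (noisy - clean) / clean)
    return curve


# Toy periodic data (assumption), lookback 96 and forecast length 96.
window = [math.sin(2 * math.pi * t / 24) for t in range(96)]
target = [math.sin(2 * math.pi * t / 24) for t in range(96, 192)]
CURVE = robustness_curve(window, target, sigmas=[0.0, 0.1, 0.3, 0.5],
                         forecaster=mean_forecaster)
for sigma, (err, pct) in CURVE.items():
    print(f"sigma={sigma:.1f}  MSE={err:.4f}  change={pct:+.3f}%")
```

Averaging over several noise draws per level keeps the reported percentage change from depending on a single random realization; the number of trials used here is an arbitrary choice.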

Figure 2: Robustness experiments on ETTm2

IV-F Lookback Length Study

Figure 3 presents the effect of varying the lookback window length on forecasting accuracy across ETTm1, Weather, and PEMS08. UniMamba attains the lowest MSE values under short and medium-length lookback configurations, showing that it adapts well to both short-term and long-horizon temporal dependencies. In contrast to Transformer-based baselines, whose performance often fluctuates substantially or degrades as the lookback length increases, UniMamba maintains steady or even improved predictive accuracy, underscoring its capacity for efficient long-range dependency modeling. On ETTm1 and Weather, the accuracy gains level off beyond a lookback length of 192, implying that trends and periodic components are effectively captured within this range. On PEMS08, UniMamba continues to benefit from extended historical context, highlighting its pattern capture and signal reconstruction capabilities on high-dimensional series. Note that MSE increases when the lookback length reaches 720, possibly because the model then considers every pattern in the lookback sequence and picks up unnecessary information. Overall, these results indicate that UniMamba makes effective use of the given context by extracting and analyzing the temporal patterns within it.
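A lookback study of this kind amounts to re-slicing the same series into input/target pairs for each lookback length and scoring a model on every slicing. The sketch below illustrates that loop with a fixed forecast length of 96. The lookback grid, the synthetic sine series, and the mean-value scorer are assumptions for illustration only; a real study would retrain and evaluate the forecasting model for each lookback.

```python
import math


def make_windows(series, lookback, horizon, stride=1):
    """Split a 1-D series into (input, target) pairs for a given lookback."""
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        x = series[start:start + lookback]
        y = series[start + lookback:start + lookback + horizon]
        pairs.append((x, y))
    return pairs


def mean_forecast_mse(pairs):
    """Placeholder scorer (assumption): predict the input mean, average the MSE."""
    if not pairs:
        raise ValueError("series too short for this lookback/horizon")
    total = 0.0
    for x, y in pairs:
        level = sum(x) / len(x)
        total += sum((level - t) ** 2 for t in y) / len(y)
    return total / len(pairs)


def lookback_sweep(series, lookbacks, horizon, evaluate):
    """Return {lookback: score}; `evaluate` scores a list of (x, y) pairs."""
    return {lb: evaluate(make_windows(series, lb, horizon)) for lb in lookbacks}


# Synthetic periodic series (assumption) and an illustrative lookback grid.
SERIES = [math.sin(2 * math.pi * t / 24) for t in range(1000)]
SCORES = lookback_sweep(SERIES, [96, 192, 336, 720], horizon=96,
                        evaluate=mean_forecast_mse)
for lookback, score in SCORES.items():
    print(f"lookback={lookback:4d}  MSE={score:.4f}")
```

Longer lookbacks leave fewer windows from the same series, so a careful comparison would also align the evaluated target positions across lookback lengths; this sketch does not do that.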

Figure 3: Prediction error values of UniMamba and baseline models with increasing lookback length.
Figure 4: Case Study on ETTm2

IV-G Case Study

ETTm2 is a dataset of electricity transformer temperatures. It contains periodic patterns of transformer and oil temperatures over varied time cycles, reflecting real industrial conditions. Figure 4 compares the predictions of UniMamba, S-Mamba, iTransformer, and PatchTST against the ground truth. The subfigures show curves of the HUFL and LUFL variates across days, generated with varied prediction lengths. UniMamba closely follows the ground truth, maintaining smooth and accurate trends even for long prediction horizons. Competing models exhibit noticeable deviations and misjudge trends, particularly in regions with rapid fluctuations. The strong alignment of UniMamba with the ground truth shows its capability for accurate prediction and for identifying abrupt changes. This stability can be attributed to the complementary functions of the components in the unified framework. Overall, UniMamba delivers faithful and reliable reconstruction of temporal patterns, demonstrating its potential and generality in real industrial production environments.

V Conclusion

This paper presents UniMamba, a unified spatial temporal forecasting framework that combines frequency-domain analysis and reconstruction, temporal convolution, state space modeling, and attention-based fusion in a single architecture. By integrating FFT and Laplace reconstruction, reverse Mamba, TCN, and the Spatial Temporal Attention mechanism, UniMamba effectively captures variate and temporal dependencies and forecasts time series accurately. Experimental results show that UniMamba matches or surpasses leading forecasting models in accuracy with competitive training efficiency, and remains stable under noise and varying input conditions. Its modular design makes it suitable for deployment in large-scale or real-time time series forecasting scenarios.

In future work, we plan to extend UniMamba to handle irregularly sampled signals and multimodal temporal data while exploring adaptive mechanisms for continual training. These enhancements aim to further strengthen UniMamba’s practicality and generalization in complex, dynamic spatial temporal environments.

References

  • [1] K. Muralitharan, R. Sakthivel, and R. Vishnuvarthan, “Neural network based optimization approach for energy demand prediction in smart grid,” Neurocomputing, vol. 273, pp. 199–208, 2018.
  • [2] F. Z. Xing, E. Cambria, and R. E. Welsch, “Natural language based financial forecasting: a survey,” Artificial Intelligence Review, vol. 50, no. 1, pp. 49–73, 2018.
  • [3] Q. Zhang, X. Gao, H. Wang, S. M. Yiu, and H. Yin, “Efficient traffic prediction through spatio-temporal distillation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 1, 2025, pp. 1093–1101.
  • [4] Q. Zhang, H. Wang, C. Long, L. Su, X. He, J. Chang, T. Wu, H. Yin, S.-M. Yiu, Q. Tian et al., “A survey of generative techniques for spatial-temporal data mining,” arXiv preprint arXiv:2405.09592, 2024.
  • [5] A. Kumar, H. Kim, and G. P. Hancke, “Environmental monitoring systems: A review,” IEEE Sensors Journal, vol. 13, no. 4, pp. 1329–1339, 2012.
  • [6] X. Chen, R. Zhang, B. Gao, X. He, X. Liu, P. Lio, K.-Y. Lam, and S.-M. Yiu, “Mode: Efficient time series prediction with mamba enhanced by low-rank neural odes,” 2026. [Online]. Available: https://arxiv.org/abs/2601.00920
  • [7] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023.
  • [8] Q. Zhang, H. Wen, M. Li, D. Huang, S.-M. Yiu, C. S. Jensen, and P. Liò, “Autohformer: Efficient hierarchical autoregressive transformer for time series prediction,” arXiv preprint arXiv:2506.16001, 2025.
  • [9] R. H. Shumway and D. S. Stoffer, “Arima models,” in Time series analysis and its applications: with R examples. Springer, 2017, pp. 75–163.
  • [10] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “Lstm: A search space odyssey,” IEEE transactions on neural networks and learning systems, vol. 28, no. 10, pp. 2222–2232, 2016.
  • [11] R. Dey and F. M. Salem, “Gate-variants of gated recurrent unit (gru) neural networks,” in 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS). IEEE, 2017, pp. 1597–1600.
  • [12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [13] B. Lim and S. Zohren, “Time-series forecasting with deep learning: a survey,” Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200209, 2021.
  • [14] S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, and S. Dustdar, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in International conference on learning representations, 2021.
  • [15] Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The eleventh international conference on learning representations, 2022.
  • [16] A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?” in Proceedings of the AAAI conference on artificial intelligence, vol. 37, no. 9, 2023, pp. 11 121–11 128.
  • [17] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International conference on machine learning. PMLR, 2022, pp. 27 268–27 286.
  • [18] W. Merrill, J. Petty, and A. Sabharwal, “The illusion of state in state-space models,” arXiv preprint arXiv:2404.08819, 2024.
  • [19] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
  • [20] Q. Zhang, C. Yu, H. Wang, Y. Yan, Y. Cao, S.-M. Yiu, T. Wu, and H. Yin, “Fldmamba: Integrating fourier and laplace transform decomposition with mamba for enhanced time series prediction,” arXiv preprint arXiv:2507.12803, 2025.
  • [21] Z. Wang, F. Kong, S. Feng, M. Wang, H. Zhao, D. Wang, and Y. Zhang, “Is mamba effective for time series forecasting?” arXiv preprint arXiv:2403.11144, 2024.
  • [22] A. Liang, X. Jiang, Y. Sun, and C. Lu, “Bi-mamba4ts: Bidirectional mamba for time series forecasting,” arXiv preprint arXiv:2404.15772, 2024.
  • [23] Z. Li, S. Qi, Y. Li, and Z. Xu, “Revisiting long-term time series forecasting: An investigation on linear mapping,” arXiv preprint arXiv:2305.10721, 2023.
  • [24] B. N. Patro and V. S. Agneeswaran, “Simba: Simplified mamba-based architecture for vision and multivariate time series,” arXiv preprint arXiv:2403.15360, 2024.
  • [25] J. F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, and A. Troncoso, “Deep learning for time series forecasting: a survey,” Big Data, vol. 9, no. 1, pp. 3–21, 2021.
  • [26] Y. Zheng, P. Wei, Z. Chen, Y. Cao, and L. Lin, “Graph-convolved factorization machines for personalized recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 2, pp. 1567–1580, 2021.
  • [27] X. Huang, J. Tang, and Y. Shen, “Long time series of ocean wave prediction based on patchtst model,” Ocean Engineering, vol. 301, p. 117572, 2024.
  • [28] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in neural information processing systems, vol. 34, pp. 22 419–22 430, 2021.
  • [29] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: Temporal 2d-variation modeling for general time series analysis,” in The eleventh international conference on learning representations, 2022.
  • [30] A. Das, W. Kong, A. Leach, S. Mathur, R. Sen, and R. Yu, “Long-term forecasting with tide: Time-series dense encoder,” arXiv preprint arXiv:2304.08424, 2023.
  • [31] Z. Wang, F. Kong, S. Feng, M. Wang, X. Yang, H. Zhao, D. Wang, and Y. Zhang, “Is mamba effective for time series forecasting?” Neurocomputing, vol. 619, p. 129178, 2025.
