arXiv:2605.04793 · cs.LG · uncurated · rendered via ar5iv

Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Title and authors will populate once this paper is indexed.
This paper is rendered from ar5iv. Reproductions and verdicts are not yet available — but you can leave a comment below.
[2605.04793] Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Matan Pagi    Zohar Sorek Affiliation: Psistar AI Affiliation: {matan,zohar}@psistar.ai
Abstract

Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution.

We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds less than 1%1\% parameters through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone- descent and KKT convergence results under standard trust-region assumptions.

Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provides a simple and effective mechanism for robust MPC under varying conditions.

1 Introduction

The Koopman operator offers an elegant reformulation of nonlinear dynamics: any nonlinear system, however complex in state space, admits a linear representation in an infinite-dimensional function space (Koopman, 1931; Mezić, 2005). Finite-dimensional approximations learned from trajectory data have enabled controllers that combine the tractability of linear MPC with operation on genuinely nonlinear physical systems (Korda and Mezić, 2018; Lusch et al., 2018; Williams et al., 2015). For real-time industrial control-chemical reactors, compressors, power converters-tractable convex optimization at each control step is a hard operational requirement.

Recent work has extended Koopman learning to time-varying systems by replacing fixed operator matrices with sequences generated from historical data. A representative instance of this class is the Mamba-based Koopman operator (MamKO) (Li et al., 2025), which processes recent state and control history through a convolutional-FCNN network to emit the matrices A¯k\bar{A}_{k}, B¯k\bar{B}_{k}, C¯k\bar{C}_{k} of a local linear system at each step, substantially outperforming static Koopman models while retaining the convex MPC formulation.

The structural constraint.

This class of methods rests on a deliberate architectural choice: the drift operator A¯k\bar{A}_{k} is generated from historical data Xhist={zkH:k1,ukH:k1}X_{\mathrm{hist}}=\{z_{k-H:k-1},u_{k-H:k-1}\} but is explicitly not a function of the current control input uku_{k}. Making A¯k\bar{A}_{k} depend on uku_{k} would introduce a bilinear term A¯k(uk)zk\bar{A}_{k}(u_{k})\,z_{k}, breaking MPC convexity. This constraint is acknowledged in the MamKO formulation (Li et al., 2025) and is forced more generally by any control architecture that preserves single-shot QP convexity. The consequence is conditional independence A¯kukXhist\bar{A}_{k}\perp\!\!\!\perp u_{k}\mid X_{\mathrm{hist}}. This constraint limits the model’s expressive power along two distinct axes.

Control–state coupling. When the governing physics contains uxu\cdot x terms-convective fluxes in flow-actuated reactors, multiplicative inputs in mechanical systems, voltage-dependent impedances in power electronics (Mohler, 1973), the mixed partial 2zk+1/zkuk(i)\partial^{2}z_{k+1}/\partial z_{k}\partial u_{k}^{(i)} is structurally nonzero. MamKO’s architecture forces this quantity to zero, requiring its backbone to absorb the coupling indirectly via step-by-step regeneration of A¯k\bar{A}_{k}. Over a multi-step rollout, the resulting approximation error accumulates fastest in regimes where the controller is deliberately exercising large control variation-precisely where MPC matters most.

Time-varying parameters. When the dynamics themselves drift over time—catalyst aging, heat-exchanger fouling, mechanical wear, friction modulation—the effective operator must track changes that the historical encoder may not yet have observed within its lookback window. A bilinear coupling provides an additional, structured channel through which the operator can adapt to the present operating point in a way that the linear lift cannot reach within a single MPC horizon.

Both regimes are pervasive in industrial control: the first is structural, the second operational. A single architectural mechanism that addresses both-without requiring a different model class for each-is the contribution of this paper.

Our contribution.

We propose a minimal, principled extension of the Mamba-Koopman dynamics that relaxes conditional independence while retaining the features that make the model useful for control:

zk+1=(diag(A)+i=1muk(i)Gi)=:Aeff(uk)zk+Bkuk,z_{k+1}=\underbrace{\Bigl(\mathrm{diag}(A)+\sum_{i=1}^{m}u_{k}^{(i)}\,G_{i}\Bigr)}_{=:\,A_{\mathrm{eff}}(u_{k})}z_{k}+B_{k}u_{k}, (1)

where AdzA\in\mathbb{R}^{d_{z}} contains the diagonal continuous-time eigenvalues of the existing Mamba–Koopman backbone (unchanged), Gidz×dzG_{i}\in\mathbb{R}^{d_{z}\times d_{z}} are learned bilinear coupling tensors and Bkdz×mB_{k}\in\mathbb{R}^{d_{z}\times m} is the step-varying actuator matrix from the existing dynamics network. Setting Gi=0G_{i}=0 restores MamKO exactly. The bilinear structure exposes exact analytical model Jacobians, making Sequential Convex Programming (SCP) natural: linearize around a nominal trajectory, solve a convex QP, update, repeat.

Summary of contributions.

  1. 1.

    Bilinear Koopman dynamics (Section 3): a strict generalization of the conditional-independence class modeling control–state coupling, with low-rank parameterization for parameter efficiency and a spectral penalty enforcing stability of the resulting dense operator.

  2. 2.

    SCP controller (Section 4): an iterative MPC algorithm exploiting exact model Jacobians, with monotone-descent and KKT convergence results under standard trust-region assumptions.

  3. 3.

    Empirical evaluation (Section 5) across CartPole and RSCP in time-invariant and time-varying regimes, establishing four consistent effects: forecasting non-inferiority on every cell, training stabilization on RSCP TV, closed-loop MPC wins on RSCP TV when SCP is iterated, and graceful degradation under stale-plan execution.

Scope and limitations.

Our method inherits the conditional-independence baseline’s assumptions: the latent dynamics are approximately linear at each step, and the encoder is expressive enough to find a lifting where this holds. The SCP controller provides local (KKT) rather than global optimality guarantees, and convergence rate depends on initialization quality. We position this work as closing a specific, well-characterized structural gap in the Koopman-for-control literature without claiming a new operator class.

2 Background

2.1 Koopman Operators for Controlled Systems

For a discrete-time controlled system xk+1=f(xk,uk)x_{k+1}=f(x_{k},u_{k}), the Koopman operator (Koopman, 1931) acts on observables gg via composition: 𝒦g=gf\mathcal{K}g=g\circ f. This operator is linear even when ff is not. One seeks a finite-dimensional lifting ψ:ndz\psi:\mathbb{R}^{n}\to\mathbb{R}^{d_{z}} such that the lifted state zk=ψ(xk)z_{k}=\psi(x_{k}) evolves approximately linearly (Korda and Mezić, 2018):

zk+1=Azk+Buk,x^k=Czk.z_{k+1}=Az_{k}+Bu_{k},\quad\hat{x}_{k}=Cz_{k}. (2)

EDMD (Williams et al., 2015) identifies A,B,CA,B,C by least squares over a fixed observable dictionary. Deep Koopman methods replace the fixed dictionary with a learned encoder (Lusch et al., 2018).

2.2 Mamba-Koopman and Conditional Independence

MamKO (Li et al., 2025) addresses the limitation of a fixed operator for time-varying systems. Inspired by Mamba’s selective state-space mechanism (Gu and Dao, 2023), a matrices-generation network takes the historical sequence Xhist=[zkH:k1,ukH:k1]X_{\mathrm{hist}}=[z_{k-H:k-1}^{\top},u_{k-H:k-1}^{\top}]^{\top} and emits time-varying operators via 1D convolution and FCNNs. After ZOH discretization:

zk+1=A¯k(Xhist)zk+B¯k(Xhist)uk,x^k=C¯kzk.z_{k+1}=\bar{A}_{k}(X_{\mathrm{hist}})\,z_{k}+\bar{B}_{k}(X_{\mathrm{hist}})\,u_{k},\quad\hat{x}_{k}=\bar{C}_{k}z_{k}. (3)

Generating all matrices from XhistX_{\mathrm{hist}} alone enforces

A¯kuk|Xhist,\bar{A}_{k}\perp\!\!\!\perp u_{k}\;\Big|\;X_{\mathrm{hist}}, (4)

preserving linearity in uku_{k} and hence MPC convexity, but preventing control-state coupling.

2.3 Bilinear Systems

A discrete-time bilinear system takes the form xk+1=Axk+iuk(i)Nixk+Bukx_{k+1}=Ax_{k}+\sum_{i}u_{k}^{(i)}N_{i}x_{k}+Bu_{k}, where Nin×nN_{i}\in\mathbb{R}^{n\times n} are interaction matrices. Bilinear systems occupy a well-studied middle ground between linear and fully nonlinear: they represent a broad class of physical phenomena including chemical reactions and mechanical systems with multiplicative inputs (Mohler, 1973). The Koopman literature has identified bilinear lifted models and related finite-dimensional approximations as a useful middle ground for control, making this a natural target for Koopman approximation. SCP (Mao et al., 2018) and iterative LQR (Li and Todorov, 2004) handle optimization over bilinear dynamics by exploiting the analytic Jacobian structure.

3 Bilinear Koopman Extension

3.1 Bilinear Latent Dynamics

We relax eq. 4 by parameterizing the latent transition as eq. 1, where AdzA\in\mathbb{R}^{d_{z}}, Gidz×dzG_{i}\in\mathbb{R}^{d_{z}\times d_{z}}, and BkB_{k} are as described in Section 1. The observation map x^k=Czk\hat{x}_{k}=Cz_{k} remains linear. The effective drift operator Aeff(uk)=diag(A)+iuk(i)GiA_{\mathrm{eff}}(u_{k})=\mathrm{diag}(A)+\sum_{i}u_{k}^{(i)}G_{i} is a linear function of uku_{k}, making the full dynamics bilinear in (zk,uk)(z_{k},u_{k}) jointly.

Proposition 1 (Strict Generalization).

The function class of eq. 1 strictly contains that of eq. 3.

Proof.

Setting Gi=0G_{i}=0 for all ii recovers eq. 3 exactly. Conversely, any system with 2zk+1/(zkuk(i))=Gi0\partial^{2}z_{k+1}/(\partial z_{k}\,\partial u_{k}^{(i)})=G_{i}\neq 0 is representable by eq. 1 but not by eq. 3, since eq. 4 forces this cross-derivative to zero. Hence the containment is strict. ∎

Proposition 2 (Population-Level Forecasting Non-Inferiority).

Let MamKOBL\mathcal{F}_{\mathrm{MamKO}}\subsetneq\mathcal{F}_{\mathrm{BL}} denote the hypothesis classes induced by eq. 3 and eq. 1. Then

minfBL𝔼[zk+1f(zk,uk)2]minfMamKO𝔼[zk+1f(zk,uk)2],\min_{f\in\mathcal{F}_{\mathrm{BL}}}\,\mathbb{E}\bigl[\|z_{k+1}-f(z_{k},u_{k})\|^{2}\bigr]\\ \leq\;\min_{f\in\mathcal{F}_{\mathrm{MamKO}}}\,\mathbb{E}\bigl[\|z_{k+1}-f(z_{k},u_{k})\|^{2}\bigr],

with strict inequality whenever the true latent dynamics admit nonzero control–state coupling.

Proof.

The inequality follows from Proposition 1: any optimum in MamKO\mathcal{F}_{\mathrm{MamKO}} is achievable in BL\mathcal{F}_{\mathrm{BL}} by setting Gi=0G_{i}=0. For strictness, suppose the true latent dynamics satisfy zk+1=A(uk)zk+Bukz_{k+1}=A^{\star}(u_{k})\,z_{k}+B^{\star}u_{k} with A/uk(i)=Gi0\partial A^{\star}/\partial u_{k}^{(i)}=G_{i}^{\star}\neq 0. Any fMamKOf\in\mathcal{F}_{\mathrm{MamKO}} incurs irreducible error proportional to Giuku¯\|G_{i}^{\star}\|\cdot\|u_{k}-\bar{u}\|, where u¯\bar{u} is the training-set mean control. This error vanishes only when uku_{k} is constant across the evaluation distribution-vacuous in any MPC setting. ∎

Remark 1.

Empirically the strictness condition partitions the systems we evaluate. On CartPole, where the equations of motion couple horizontal force with pendulum geometry, the bilinear extension produces strict forecasting gains on TI and TV. On RSCP, where heat duties enter additively and the governing ODE contains no uxu\cdot x terms, Gi0\|G_{i}^{\star}\|\approx 0 and the proposition predicts equality—which is what we observe on RSCP TI, and on RSCP TV under deployment-relevant averaging (Table 2). The 23%23\% bilinear-vs-linear gap visible in best-checkpoint MSE on RSCP TV reflects a training-noise asymmetry between the two models, not a violation of the proposition: the Linear baseline oscillates throughout training while bilinear converges smoothly, so single-epoch selection from linear’s trajectory captures one of its downward spikes. Mean-last-5050 MSE—which averages across the spikes—restores the equality the proposition predicts. The learned GF\|G\|_{F} values (Table 9) empirically confirm the partition.

3.2 Low-Rank Parameterization

The tensors {Gi}i=1m\{G_{i}\}_{i=1}^{m} add mdz2m\cdot d_{z}^{2} parameters. For typical settings (m5m\leq 5, dz64d_{z}\leq 64), this is at most 20{\sim}20K parameters, under 1% of a standard Mamba-Koopman backbone. We impose a low-rank prior:

Gi=LiRi,Li,Ridz×r,rdz,G_{i}=L_{i}R_{i}^{\top},\quad L_{i},R_{i}\in\mathbb{R}^{d_{z}\times r},\quad r\ll d_{z}, (5)

reducing the addition to 2mrdz2mrd_{z} parameters and regularizing the coupling strength when r<dzr<d_{z}. Both LiL_{i} and RiR_{i} are initialized to zero, so the model begins as an exact copy of the Linear baseline; any divergence in training is attributable solely to the bilinear terms. The rank rr is a single interpretable hyperparameter: r=1r=1 encodes a rank-11 perturbation per control channel; r=dzr=d_{z} recovers the unconstrained bilinear model. We use r=dzr=d_{z} in all reported runs, as dz{8,15}d_{z}\in\{8,15\} is small enough that further parameter reduction is unnecessary; the low-rank option remains available for deployments with larger latent dimensions.

3.3 Lie–Trotter Split ZOH Discretization

Write the effective generator as

Aeff(uk)\displaystyle A_{\mathrm{eff}}(u_{k}) =D+P(uk),\displaystyle=D+P(u_{k}), (6)
D\displaystyle D =diag(A),\displaystyle=\mathrm{diag}(A),
P(uk)\displaystyle P(u_{k}) =j=1muk(j)Gj.\displaystyle=\sum_{j=1}^{m}u_{k}^{(j)}G_{j}.

Dense ZOH would exponentiate D+P(uk)D+P(u_{k}) at a single scalar period TT, collapsing the baseline MamKO per-mode timescales δn\delta_{n}. This would break the exact zero-coupling reduction to the linear baseline. We therefore use a first-order Lie–Trotter splitting (McLachlan and Quispel, 2002), historically rooted in the Trotter product formula (Trotter, 1959):

exp((D+P(uk))T)\displaystyle\exp\!\left((D+P(u_{k}))T\right) exp(P(uk)T)=:EP(uk)\displaystyle\approx\underbrace{\exp\!\left(P(u_{k})T\right)}_{=:E_{P}(u_{k})} (7)
diag(exp(A𝜹))=:ED.\displaystyle\qquad\;\underbrace{\mathrm{diag}\!\left(\exp(A\odot\boldsymbol{\delta})\right)}_{=:E_{D}}.

With B¯diag,k\bar{B}_{\mathrm{diag},k} the corresponding diagonal ZOH input integral, the implemented one-step map is

FLT(zk,uk):=EP(uk)(EDzk+B¯diag,kuk).F_{\mathrm{LT}}(z_{k},u_{k}):=E_{P}(u_{k})\bigl(E_{D}z_{k}+\bar{B}_{\mathrm{diag},k}u_{k}\bigr). (8)

When P(uk)0P(u_{k})\equiv 0, EP(uk)=IE_{P}(u_{k})=I, so the scheme reduces exactly to the baseline per-mode diagonal ZOH. Implementation details, numerical stabilization, and cost are deferred to Section C.3.

3.4 Stability

The negative_celu activation in the Mamba-Koopman backbone constrains the diagonal entries of AA to be at most 1, guaranteeing ρ(Adisc)1\rho(A_{\mathrm{disc}})\leq 1 for the diagonal case. This guarantee does not extend to the dense Aeff(uk)A_{\mathrm{eff}}(u_{k}): the bilinear terms can shift eigenvalues outside the unit disk. We apply a two-tier approach at training and inference.

Training: spectral penalty.

We add to the loss

stab=jReLU(|λj(Adisc,k)|1+ms),\mathcal{L}_{\mathrm{stab}}=\sum_{j}\mathrm{ReLU}\bigl(|\lambda_{j}(A_{\mathrm{disc},k})|-1+m_{s}\bigr), (9)

with margin ms=0.05m_{s}=0.05, computed via torch.linalg.eigvals in float32 to avoid mixed-precision artifacts. Zero initialization of GiG_{i} ensures the model starts in the stable regime.

Inference.

At inference we evaluate ρ(Adisc,k)\rho(A_{\mathrm{disc},k}) at each MPC step via torch.linalg.eigvals, at O(dz3)O(d_{z}^{3}) cost; for dz64d_{z}\leq 64 this is negligible relative to backbone inference. A discussion of why we did not use a Gershgorin pre-check for screening is deferred to Appendix A.

3.5 Training Objective

We train end-to-end on multi-step observation-space prediction loss:

=1Tk=1Tx^kxk2+λsstab,\mathcal{L}=\frac{1}{T}\sum_{k=1}^{T}\|\hat{x}_{k}-x_{k}\|^{2}+\lambda_{s}\,\mathcal{L}_{\mathrm{stab}}, (10)

with the stability penalty disabled at inference. All baseline training hyperparameters (optimizer, schedule, context window HH) are held fixed; only {Li,Ri}\{L_{i},R_{i}\} and the jointly trained backbone parameters are updated.

4 SCP Controller for Bilinear MPC

4.1 Problem Formulation

The open-loop MPC problem over horizon HH is

minu0:H1\displaystyle\min_{u_{0:H-1}} k=0H1(zk,uk)\displaystyle\sum_{k=0}^{H-1}\ell(z_{k},u_{k}) (11)
s.t. zk+1=FLT(zk,uk),\displaystyle z_{k+1}=F_{\mathrm{LT}}(z_{k},u_{k}),
uk𝒰.\displaystyle u_{k}\in\mathcal{U}.

This problem is non-convex in (zk,uk)(z_{k},u_{k}) jointly. The conditional-independence baseline sidesteps non-convexity by design (conditional independence makes the QP trivially convex in one shot). We instead solve eq. 11 iteratively via SCP, exploiting the exact differentiable structure of the implemented bilinear model.

4.2 Exact Model Jacobians

SCP linearizes the implemented Lie–Trotter map eq. 8 around a nominal trajectory (z¯0:H,u¯0:H)(\bar{z}_{0:H},\bar{u}_{0:H}). Since FLT(zk,uk)F_{\mathrm{LT}}(z_{k},u_{k}) is affine in zkz_{k}, the state Jacobian is

A~k\displaystyle\tilde{A}_{k} =FLTzk|(z¯k,u¯k)=EP(u¯k)ED,\displaystyle=\frac{\partial F_{\mathrm{LT}}}{\partial z_{k}}\Bigg|_{(\bar{z}_{k},\bar{u}_{k})}=E_{P}(\bar{u}_{k})E_{D}, (12)
B~k\displaystyle\tilde{B}_{k} =FLTuk|(z¯k,u¯k).\displaystyle=\frac{\partial F_{\mathrm{LT}}}{\partial u_{k}}\Bigg|_{(\bar{z}_{k},\bar{u}_{k})}. (13)

The input Jacobian B~k\tilde{B}_{k} contains the derivative of the matrix exponential EP(uk)E_{P}(u_{k}) with respect to uku_{k}. In implementation, we compute B~k\tilde{B}_{k} by reverse-mode automatic differentiation of the full Lie–Trotter update using torch.func.jacrev. This yields exact model Jacobians of the implemented differentiable dynamics up to floating-point precision, with no finite-difference linearization inside SCP.

4.3 SCP Algorithm

Each SCP iteration solves the convex subproblem obtained by evaluating the finite-horizon quadratic tracking objective on the linearized trajectory. Denoting this objective by JlinJ_{\mathrm{lin}}, we solve

minδu0:H1\displaystyle\min_{\delta u_{0:H-1}} Jlin(δu;z¯,u¯)\displaystyle J_{\mathrm{lin}}(\delta u;\bar{z},\bar{u}) (14)
s.t. δz0=0,\displaystyle\delta z_{0}=0,
δzk+1=A~kδzk+B~kδuk,\displaystyle\delta z_{k+1}=\tilde{A}_{k}\delta z_{k}+\tilde{B}_{k}\delta u_{k},
u¯k+δuk𝒰,\displaystyle\bar{u}_{k}+\delta u_{k}\in\mathcal{U},
δukε.\displaystyle\|\delta u_{k}\|_{\infty}\leq\varepsilon.

Here JlinJ_{\mathrm{lin}} includes the same stage and terminal terms as the true MPC objective; when a Δu\Delta u penalty is used, the previous applied input is treated as fixed data, so the subproblem remains a convex QP in δu0:H1\delta u_{0:H-1}. Eliminating the linearized dynamics yields a dense box-constrained QP solved via OSQP (Stellato et al., 2020) with warm-starting between iterations.

Algorithm 1 SCP-Bilinear MPC
1:Initial state z0z_{0}, nominal controls u¯0:H1\bar{u}_{0:H-1}, initial trust-region radius ε0\varepsilon_{0}, model params
2:Optimized control sequence u0:H1u_{0:H-1}^{\star}
3:εε0\varepsilon\leftarrow\varepsilon_{0}
4:Roll out z¯0:H\bar{z}_{0:H} under eq. 8
5:for n=1,,NSCPn=1,\ldots,N_{\mathrm{SCP}} do
6:  Compute A~k\tilde{A}_{k}, B~k\tilde{B}_{k} via eqs. 12 to 13
7:  Eliminate δz\delta z; form dense QP in δu0:H1\delta u_{0:H-1}
8:  Solve QP via OSQP (warm-started)
9:  if J(u¯+δu)J(u¯)J(\bar{u}+\delta u)\leq J(\bar{u}) then
10:   u¯u¯+δu\bar{u}\leftarrow\bar{u}+\delta u; roll out z¯0:H\bar{z}_{0:H}
11:  else
12:   εε/2\varepsilon\leftarrow\varepsilon/2
13:  end if
14:end for
15:return u¯0:H1\bar{u}_{0:H-1}
Proposition 3 (SCP Monotone Descent and KKT Convergence).

Consider the bilinear MPC problem eq. 11 with a continuously differentiable finite-horizon objective JJ whose stage and terminal terms are convex quadratic in the variables of each SCP subproblem (including, when used, control-increment penalties with the previous input treated as fixed problem data), and with box constraints 𝒰={u:uminuumax}\mathcal{U}=\{u:u_{\min}\leq u\leq u_{\max}\}. Suppose the SCP iterates {(u¯(n),z¯(n))}\{(\bar{u}^{(n)},\bar{z}^{(n)})\} remain in a bounded set, the implemented dynamics map FLTF_{\mathrm{LT}} of eq. 8 has locally Lipschitz Jacobian on a neighborhood of that set, and each convex subproblem includes both the feasibility constraint u¯k+δuk𝒰\bar{u}_{k}+\delta u_{k}\in\mathcal{U} and the trust-region constraint δukε\|\delta u_{k}\|_{\infty}\leq\varepsilon. Then:

(i) Monotone descent. Let Δ(n)>0\Delta^{(n)}>0 denote the predicted QP cost decrease at iteration nn. Because FLTF_{\mathrm{LT}} has locally Lipschitz Jacobian, the first-order model error on a trust region of radius ε\varepsilon is O(ε2)O(\varepsilon^{2}) uniformly on bounded sets. Hence the discrepancy between the linearized and true objectives over a finite horizon satisfies

|JlinJtrue|=O(ε2).|J_{\mathrm{lin}}-J_{\mathrm{true}}|=O(\varepsilon^{2}). (15)

When ε\varepsilon is chosen so that this approximation error is less than Δ(n)\Delta^{(n)}, the accepted step is truly descent, so the sequence {J(u¯(n))}\{J(\bar{u}^{(n)})\} is non-increasing. The trust-region shrinkage εε/2\varepsilon\leftarrow\varepsilon/2 on failed steps guarantees this condition is eventually met because the right-hand side of eq. 15 vanishes as ε0\varepsilon\to 0.

(ii) KKT convergence. If the standard regularity assumptions for trust-region SCP hold at an accumulation point—in particular a constraint qualification for the active box constraints—then any accumulation point (u¯,z¯)(\bar{u}^{\star},\bar{z}^{\star}) of the accepted iterates is a KKT point of eq. 11. Equivalently, the predicted QP decrease vanishes only at first-order stationary points of the original nonlinear MPC problem, and the associated multipliers satisfy the KKT conditions by the standard trust-region SCP argument of Mao et al. (2018, Theorem 2).

Proof.

By construction, the matrices A~k,B~k\tilde{A}_{k},\tilde{B}_{k} in eqs. 12 to 13 are the exact Jacobians of the implemented Lie–Trotter map FLTF_{\mathrm{LT}}, computed without finite differences.

For part (i), local Lipschitz continuity of the Jacobian of FLTF_{\mathrm{LT}} yields a quadratic first-order remainder on each trust region. Over a finite horizon, boundedness of the iterates implies a uniform constant in the bound, so |JlinJtrue|=O(ε2)|J_{\mathrm{lin}}-J_{\mathrm{true}}|=O(\varepsilon^{2}) on the trust region. Hence sufficiently small trust regions guarantee that the true cost decrease matches the predicted decrease up to higher-order error, which yields monotone descent after finitely many shrinkage steps.

For part (ii), feasibility is preserved because each QP enforces u¯k+δuk𝒰\bar{u}_{k}+\delta u_{k}\in\mathcal{U}. Once the predicted decrease tends to zero, the accepted iterates satisfy the first-order necessary conditions of the nonlinear MPC problem under the standard trust-region SCP regularity assumptions. The cited result of Mao et al. (2018, Theorem 2) then implies that every accumulation point of the accepted iterates is a KKT point of eq. 11. ∎

Remark 2.

The boundedness assumption on iterates is retained as an explicit hypothesis. In practice it is promoted by compact control constraints together with the spectral regularization and runtime stability checks of Section 3.4, but those mechanisms do not by themselves constitute a formal proof that ρ(Aeff(u))<1\rho(A_{\mathrm{eff}}(u))<1 uniformly for all u𝒰u\in\mathcal{U}.

Corollary 3.1.

The conditional-independence baseline eq. 4 solves a single QP per MPC step but cannot represent control–state coupling. The proposed method solves NSCP{1,5}N_{\mathrm{SCP}}\in\{1,5\} QPs on the implemented bilinear model. Both are polynomial-time per iteration; the additional QPs incur negligible wall-clock cost relative to the modeling gain (Table 3).

5 Experiments

5.1 Setup

Systems.

We evaluate on two benchmarks across time-invariant and time-varying regimes, yielding four cells.

CartPole. Standard underactuated cart-pole swing-up; 44 states (x,x˙,θ,θ˙)(x,\dot{x},\theta,\dot{\theta}), 11 control (horizontal force). The horizontal force enters the equations of motion through products with trigonometric functions of θ\theta, making the system control-affine with state-dependent input gain g(x)g(x). The Koopman generator of such a system acts bilinearly on observables in the control input (Goswami and Paley, 2020), so any sufficiently expressive lifting that captures cosθ\cos\theta, sinθ\sin\theta, and θ˙2\dot{\theta}^{2} inherits a uzu\cdot z coupling that MamKO’s conditional-independence constraint excludes. Whether the learned encoder spans these coordinates well enough for the gap to be empirically detectable is one of the questions our forecasting experiments answer (Table 2). The time-invariant variant (TI) uses constant cart friction; the time-varying variant (TV) modulates the friction coefficient as μc(t)=μcbase+Aμsin(ωμt)\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+A_{\mu}\sin(\omega_{\mu}t).

RSCP. The reactor–separator process of Liu et al. (2008), adopted as the MamKO benchmark in Li et al. (2025). Two CSTRs in series followed by a flash separator with recycle; 99 states and 33 controls (heat duties Q1,Q2,Q3Q_{1},Q_{2},Q_{3}). The heat duties enter the energy balances additively as +Qi/(ρcpVi)+Q_{i}/(\rho c_{p}V_{i}): the governing ODE contains no uxu\cdot x terms, only nonlinearity in state. The TV variant modulates feed composition sinusoidally, introducing time-varying parameters without adding control–state coupling. RSCP exercises the second axis of Section 1 (time-varying parameters) without the first (control coupling), making the two benchmarks complementary tests of the bilinear extension.

Baselines.

Our primary structural ablation is the conditional-independence baseline, which is equivalent to our model with Gi=0G_{i}=0. We instantiate this baseline using the MamKO architecture (Li et al., 2025), which is representative of the class. Throughout, “Linear” refers to this Gi=0G_{i}=0 baseline. MamKO itself was evaluated against the Deep Koopman Operator (DKO) in the original work and established as the stronger time-varying Koopman baseline; we inherit this comparison by transitivity. Our method is denoted Bilinear-rr; we use r=dzr=d_{z} (full rank) in all reported runs, giving r=8r=8 on CartPole and r=15r=15 on RSCP (Table 6). Closed-loop variants are Bilinear-SCP-NN for N{1,5}N\in\{1,5\} SCP iterations.

Protocol.

We generate 5,0005{,}000 training, 1,0001{,}000 validation, and 1,0001{,}000 test trajectories per system using mixed sinusoidal and step-function control excitation. All Koopman-based methods use the same latent dimension dzd_{z}, prediction horizon, and training budget; dataset splits and random seeds are held constant. Forecasting is reported as 30-step open-loop MSE on held-out test trajectories. Closed-loop MPC is evaluated over 1010 episodes per cell under a setpoint-tracking task; we report cumulative running-average cost as a function of time, with shaded 0.3σ0.3\sigma bands111Bands are reported at 0.3σ0.3\sigma for visual readability of model separation; the conclusions are unchanged at 1σ1\sigma, though some panels then exhibit fully overlapping bands. We report the band width explicitly so readers can interpret the figures under the variance scale of their preference. for visual comparison.

Reproducibility against published values.

Our re-trained Linear baseline reproduces the forecast MSE of Li et al. (2025, Table 1) within 3%\sim 3\% on CartPole TI and within 0.78×0.78\times on RSCP TI. On CartPole TV our Linear baseline trains to MSE 7.8×1037.8\times 10^{-3} versus the published 4.3×1044.3\times 10^{-4}, a gap we have not closed under matched hyperparameters; on RSCP TV our Linear baseline trains to 7.1×1047.1\times 10^{-4} versus the published 9.7×1039.7\times 10^{-3}, a 13×13\times improvement that we attribute to the training-stability mechanism documented in Section 5.4. Bilinear-vs-Linear comparisons throughout this paper are reported against our own re-trained baseline; reviewers comparing absolute numbers should reference this calibration.

5.2 Time-Varying Capture: SCP Iteration and Lead-Time Robustness

The headline cell of this paper is RSCP TV: a chemical reactor benchmark with sinusoidally modulated feed composition, no uxu\cdot x coupling in its governing ODE, and time-varying parameters that the linear lift can only track through history-driven re-generation of the operator at each MPC step. Two experiments converge on a single finding: under TV physics, the bilinear coupling provides operator-level correction that the linear lift architecturally cannot reach within a single MPC horizon, but unlocking it requires either iterative re-linearization at solve time or commitment to a control sequence whose effect on the operator is structurally encoded.

SCP iteration unlocks TV capture under standard MPC.

Figure 1 reports cumulative running-average closed-loop cost for Linear (single QP), Bilinear-SCP-11 (one SCP iteration on the bilinear model—equivalent to LQR-style single linearization), and Bilinear-SCP-55 (five iterations) across the four cells. On RSCP TV (bottom right), Bilinear-SCP-55 separates from both other controllers at t0.4t\approx 0.4 h and the gap widens monotonically through end of horizon. Bilinear-SCP-11, by contrast, does not distinguish itself from Linear and trails it slightly. A single linearization of the bilinear model is insufficient to exploit the additional capacity under TV physics: the SCP iterations are what unlock the operator-level corrections the bilinear coupling makes available, by repeatedly re-linearizing Aeff(uk)=diag(A)+iuk(i)GiA_{\mathrm{eff}}(u_{k})=\mathrm{diag}(A)+\sum_{i}u_{k}^{(i)}G_{i} at the current operating point as the iterates converge.

Table 1 reports per-cell closed-loop tracking cost for Linear and our Bilinear-SCP-5 controller over the matched evaluation window of Li et al. (2025, Fig. 3). Bilinear-SCP-5 wins on three of four cells, with the largest gap on RSCP TV (3.443.44 vs 4.91×1034.91\times 10^{-3}, a 30%30\% reduction). The single loss is a 5%5\% gap on CartPole TI; this regime carries no uxu\cdot x coupling, no time-varying parameters, and benign dynamics, so the additional SCP overhead does not compound into a benefit.

Table 1: Closed-loop MPC cumulative cost over 10 episodes per cell, on the matched evaluation window of Li et al. (2025, Fig. 3). Bold: best per cell.
Cell Linear Bilinear-SCP-5
CartPole TI (20 s) 2.40×𝟏𝟎𝟐\mathbf{2.40{\times}10^{-2}} 2.54×1022.54{\times}10^{-2}
CartPole TV (20 s) 1.79×1021.79{\times}10^{-2} 1.73×𝟏𝟎𝟐\mathbf{1.73{\times}10^{-2}}
RSCP TI (2 h) 7.90×1047.90{\times}10^{-4} 6.43×𝟏𝟎𝟒\mathbf{6.43{\times}10^{-4}}
RSCP TV (2 h) 4.91×1034.91{\times}10^{-3} 3.44×𝟏𝟎𝟑\mathbf{3.44{\times}10^{-3}}

We emphasize that Bilinear-SCP-11 is not a stripped-down ablation but a realistic deployment baseline: it is what one obtains from running the bilinear model under a budget-constrained MPC stack (e.g., compute-limited edge controllers). The separation between SCP-11 and SCP-55 on RSCP TV is the empirical cost of single linearization in the TV regime.

Refer to caption
Figure 1: Closed-loop MPC cost over 1010 episodes per cell. Bilinear-SCP-55 separates from both other controllers on RSCP TV (bottom right) starting around t0.4t\approx 0.4 h, with the gap widening monotonically through end of horizon. Other cells discussed in Section 5.5.

Lead-time commitment unlocks the same correction with a frozen backbone.

In deployment, MPC controllers are not always free to re-plan at every control step: compute budgets, network round-trip delays, and asynchronous sensor updates force the controller to commit to a previously computed plan for some lead time before re-planning. We probe this regime via a lead-time experiment: at each MPC step the controller commits to its plan for dd subsequent steps before re-planning, with d{0,1,3,5}d\in\{0,1,3,5\}. The d=0d=0 case recovers standard receding-horizon MPC. Crucially, we hold both the original plan and the backbone dynamics fixed during the lead window (regen=never), so neither model receives intermediate corrections from new sensor data. This isolates the open-loop correction capacity of the model class itself: the linear lift’s drift A¯k\bar{A}_{k} is fixed across the entire lead window because it depends only on history; the bilinear model’s effective drift Aeff(uk)A_{\mathrm{eff}}(u_{k}) varies with the committed control uku_{k} along the lead window, providing operator-level correction even with the backbone frozen. We exclude d=10d=10 from analysis: at over three minutes of stale plan on RSCP, both models saturate from discretization-induced error rather than model-class differences.

Figure 2 shows the result on RSCP TV. The bilinear model sits below Linear at every commitment window (Table 7), with the gap widening sharply once planning becomes stale. At d=0d=0 the bilinear advantage is modest (2.54-2.54 vs 2.32-2.32, a gap of 0.220.22 in log-cost). At d=1d=1, Linear degrades sharply—its cumulative log-cost plateaus near 1.87-1.87 by end of horizon, while the bilinear model reaches 2.78-2.78, a gap of nearly 11 in log-cost. The pattern persists at d=3d=3 (Linear plateau 2.06-2.06, bilinear 2.57-2.57) and shrinks at d=5d=5 where both models saturate but bilinear still leads by 0.22\sim 0.22 in log-cost. Figure 3 reports the same experiment on CartPole TV. The bilinear advantage is small throughout and roughly constant in dd (Table 8); the underlying dynamics are benign enough that plan staleness does not catastrophically degrade either model.

Refer to caption
Figure 2: Lead-time robustness on RSCP TV under stale-plan execution (regen=never): both the plan and the backbone are frozen during the lead window. The bilinear model maintains an advantage at every dd, with the gap widening sharply at d{1,3}d\in\{1,3\} where Linear degrades while bilinear continues to converge; at d=5d=5 both models saturate but bilinear still leads. Bands are ±0.3σ\pm 0.3\sigma over 1010 episodes.
Refer to caption
Figure 3: Lead-time robustness on CartPole TV under stale-plan execution. Bilinear sits modestly below Linear at every dd, with the gap roughly constant across the sweep. The system is benign enough that staleness does not catastrophically degrade either model. Bands are ±0.3σ\pm 0.3\sigma over 1010 episodes.

One mechanism, two regimes.

Both findings identify the same mechanism. The bilinear coupling encodes how the drift varies with the control, and this variation is exploitable through any control regime that exercises the dependency: iterated SCP exercises it via re-linearization at the current operating point; lead-time commitment exercises it via rollout of the committed control sequence with the backbone frozen. The Linear baseline, lacking the coupling, has no analogous mechanism: its only access to time-varying physics is backbone re-generation, which fails under freeze and is insufficient under standard MPC. This regime is operationally relevant: control deployments commonly enforce re-planning at intervals longer than the underlying simulator step due to compute, communication, or scheduling constraints.

5.3 Forecasting Non-Inferiority

Table 2: Forecast MSE on held-out trajectories. Best-checkpoint (single minimum-validation-loss epoch, the convention of Li et al. (2025, Table 1)) and mean over the final 5050 training epochs (deployment-relevant, robust to selection from noisy training curves). Bold: best per cell per metric.
Model Metric CP TI CP TV RSCP TI RSCP TV
Linear best 6.31×1046.31{\times}10^{-4} 7.80×1037.80{\times}10^{-3} 2.28×𝟏𝟎𝟑\mathbf{2.28{\times}10^{-3}} 7.14×𝟏𝟎𝟒\mathbf{7.14{\times}10^{-4}}
mean50 6.67×1046.67{\times}10^{-4} 8.01×1038.01{\times}10^{-3} 2.36×1032.36{\times}10^{-3} 9.09×1049.09{\times}10^{-4}
Bilinear best 4.45×𝟏𝟎𝟒\mathbf{4.45{\times}10^{-4}} 7.43×𝟏𝟎𝟑\mathbf{7.43{\times}10^{-3}} 2.28×1032.28{\times}10^{-3} 8.80×1048.80{\times}10^{-4}
mean50 4.77×𝟏𝟎𝟒\mathbf{4.77{\times}10^{-4}} 7.77×𝟏𝟎𝟑\mathbf{7.77{\times}10^{-3}} 2.36×𝟏𝟎𝟑\mathbf{2.36{\times}10^{-3}} 9.05×𝟏𝟎𝟒\mathbf{9.05{\times}10^{-4}}

Table 2 reports forecast MSE under best-checkpoint (the single epoch with lowest validation loss, matching the convention of Li et al. (2025, Table 1)) and mean over the final 5050 training epochs (the value a deployed system would see, robust to selection from noisy training curves). The two metrics agree on three of four cells: bilinear wins CartPole TI by 29%\sim 29\%, edges CartPole TV by 335%5\%, and ties RSCP TI within noise—all consistent with Proposition 2.

The cells diverge on RSCP TV. Best-checkpoint favors Linear by 23%23\%; mean50 ties (bilinear marginally lower at 9.059.05 vs 9.09×1049.09\times 10^{-4}). Figure 4 (bottom right) explains the gap: the Linear baseline trains into a noisy basin that oscillates with amplitude ±0.5\pm 0.5 in log-loss for the entire 400400-epoch run, while bilinear converges smoothly within 50\sim 50 epochs and remains flat. Best-checkpoint selects one of Linear’s downward spikes; the tail mean averages across them. A model selected by best-checkpoint cannot be reliably re-deployed without the validation-loss oracle that selected it; the tail mean is the deployment-relevant comparison.

5.4 Training Stability

Validation-loss curves (Figure 4) reveal a striking asymmetry between cells. On CartPole TI/TV and RSCP TI, both models train smoothly to convergence. On RSCP TV, the Linear model exhibits sustained oscillations of ±0.5\pm 0.5 in log-loss throughout the entire 400400-epoch run, while the bilinear model converges smoothly within the first 50\sim 50 epochs and remains flat thereafter. The variance of the Linear model’s val loss across the final 200200 epochs exceeds the bilinear model’s by an order of magnitude.

We interpret this as an optimization-side benefit of the bilinear parameterization: under time-varying physics, the linear lift must continuously adjust its operator generation to track the moving target, producing the noisy training signal we observe. The bilinear coupling absorbs a portion of the time-varying capacity directly into a stationary operator structure, decoupling the optimization landscape from the data’s non-stationarity. The headline best-checkpoint number for RSCP TV (bilinear +23%+23\% worse than Linear) and the tied mean50 comparison (Table 2) together tell a single story: Linear’s apparent advantage is the selection artifact, while bilinear achieves comparable forecasting performance with dramatically more reliable training—a property that matters for deployment even when raw MSE does not separate the models.

The structural origin of this stabilization—the bilinear coupling absorbing TV capacity into a stationary operator—is the same mechanism that produces the closed-loop gains documented in Section 5.2. The training-time and inference-time effects of the coupling are two faces of one phenomenon.

Refer to caption
Figure 4: Validation loss over training. RSCP TV (bottom right) exhibits sustained oscillation of the Linear model’s val loss; the bilinear model converges smoothly. The variance gap across the final 200200 epochs is an order of magnitude.

5.5 Other Cells: CartPole and RSCP TI

The remaining cells (CartPole TI/TV, RSCP TI) exercise the bilinear extension under conditions in which it is either underutilized (CartPole, where forecasting wins do not translate to closed-loop separation) or operating outside its primary regime (RSCP TI, no time-varying parameters, no uxu\cdot x coupling). They are reported here for completeness; none is the headline cell.

CartPole TI/TV. All three closed-loop controllers (Linear, Bilinear-SCP-11, Bilinear-SCP-55) track within overlapping bands on both variants (Figure 1, top row); the bilinear extension does not hurt closed-loop performance even when its strict forecasting advantage (Table 2) does not translate into MPC gains. Lead-time results (Figure 3) show bilinear sitting modestly below Linear at every dd, with a roughly constant gap across the sweep. The system is benign enough that staleness does not catastrophically degrade either model.

RSCP TI. Both bilinear variants sit below Linear from t0.5t\approx 0.5 h through the end of horizon, with 0.3σ0.3\sigma bands clearly separated by t1.5t\approx 1.5 h (Figure 1, lower left). Bilinear-SCP-11 marginally edges Bilinear-SCP-55 here—unlike the RSCP TV pattern, single linearization is sufficient on time-invariant dynamics. We attribute the bilinear gain to compounding accuracy in the multi-step rollout under the convex, uxu\cdot x-free dynamics: even without TV physics, the additional lifting capacity offered by GG is occasionally useful when the controller exercises the input range broadly during transients.

5.6 Coupling Strength Across Cells

The learned GF\|G\|_{F} values vary substantially across cells in ways that illuminate the structural mechanisms above: high under TV physics where GG absorbs operating-point drift, low under TI physics where Proposition 2 predicts G0G^{\star}\approx 0, and elevated on RSCP TI where the optimizer absorbs training slack into GG without physical target. Per-cell values and qualitative training-trajectory shapes are reported in Appendix E.

5.7 Computational Cost

For the TV stale-plan experiments, Table 3 reports mean wall-clock per control step under lead-time execution for the Linear controller and Bilinear-SCP-55 across commitment windows d{0,1,3,5}d\in\{0,1,3,5\}. Bilinear-SCP-55 is substantially slower than the single-QP Linear baseline but remains practical relative to the 1818 s RSCP sampling period; on CartPole, the timings should be read as an offline stress-test rather than a real-time deployment target. The decrease on CartPole TV with larger commitment window appears to be problem-dependent: the fixed prefix provides a better SCP warm start and often triggers earlier termination, whereas on RSCP TV the solver typically still reaches the maximum SCP iteration count.

Table 3: Mean wall-clock per control step under lead-time execution on the time-varying benchmarks
Method d=0d=0 d=1d=1 d=3d=3 d=5d=5
CP — Linear 0.0770.077 s 0.0770.077 s 0.0780.078 s 0.0780.078 s
CP — Bilinear 0.8440.844 s 0.8180.818 s 0.7740.774 s 0.7700.770 s
RSCP — Linear 0.0990.099 s 0.0950.095 s 0.0980.098 s 0.0980.098 s
RSCP — Bilinear 0.8760.876 s 0.8750.875 s 0.8890.889 s 0.9050.905 s

6 Related Work

Koopman operators for control.

Korda and Mezić (2018) first demonstrated that EDMD-based Koopman models enable efficient convex MPC. Subsequent work has extended this to probabilistic models (Han et al., 2021), graph neural network liftings (Li et al., 2020), and time-varying-system modeling (Hao et al., 2024). MamKO (Li et al., 2025) introduced time-varying operators via Mamba-inspired matrix generation. Our work is orthogonal: we introduce a structural extension (bilinear coupling) that any of these could in principle adopt.

Bilinear dynamical systems.

The theory of bilinear systems is classical (Mohler, 1973) and their controllability properties well characterized (Elliott, 2009). The contribution here is bringing this structure into the learned Koopman-for-control pipeline, with a low-rank parameterization that regularizes sample complexity and a tractable SCP controller that exploits the resulting Jacobian structure.

Sequential convex programming.

SCP methods have a long history in aerospace trajectory optimization (Mao et al., 2018; Malyuta et al., 2022). Our application is standard; the novelty is that the bilinear Koopman model provides exact model Jacobians eqs. 12 to 13, avoiding the finite-difference approximations that degrade SCP convergence in black-box settings.

Data-driven nonlinear control.

SINDy-C (Brunton et al., 2016; Kaiser et al., 2018) can learn bilinear terms but requires a fixed polynomial library and does not scale to high-dimensional systems. Neural ODE methods (Chen et al., 2018) are highly expressive but relinquish the convex MPC structure. Our method occupies a deliberate middle ground: more expressive than linear Koopman, more tractable than black-box dynamics.

7 Conclusion

We have shown that the conditional-independence constraint shared across the Mamba-Koopman family, while necessary to preserve MPC convexity in single-shot formulations, leaves two regimes structurally underrepresented: control-state coupling and time-varying parameters. Our bilinear extension addresses both with a minimal architectural change, fewer than 1%1\% added parameters, exact model Jacobians, and a provably convergent SCP controller.

Across CartPole (which exercises uxu\cdot x coupling) and RSCP (which exercises time-varying parameters in the TV variant without uxu\cdot x coupling), four consistent effects emerged: forecasting non-inferiority on all cells under deployment-relevant averaging, with strict gains where Proposition 2 predicts them; substantial training stabilization on RSCP TV; closed-loop MPC gains on RSCP TV when SCP is iterated to convergence; and graceful degradation under stale-plan execution, where the bilinear model maintains a clear robustness advantage. The learned GF\|G\|_{F} patterns provide an additional interpretive diagnostic for which mechanism each cell primarily exercises.

Open directions include extending the bilinear parameterization to the step-varying BkB_{k} matrices, imposing structured priors on GiG_{i} that encode known physical coupling topology, and designing benchmarks that isolate the uxu\cdot x structural axis from time-varying parameters more cleanly.

AI Assistance Disclosure

We used AI assistants (large language models) for language editing, phrasing, and LaTeX formatting throughout this manuscript. All technical content, mathematical results, experimental design, code, and analyses are the authors’ own work. The AI tools did not contribute novel ideas, proofs, or empirical results.

References

  • S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937. Cited by: §6.
  • R. T.Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 31. Cited by: §6.
  • S. Chen (2022) A machine learning-based approach to cybersecurity and safety of model predictive control systems. Ph.D. Dissertation, University of California, Los Angeles. Note: Section 2.4 documents the corrected RSCP formulation used in this work. Cited by: §B.2, §B.2, §B.2.
  • D. L. Elliott (2009) Bilinear control systems: matrices in action. Springer. Cited by: §6.
  • D. Goswami and D. A. Paley (2020) Global bilinearization and reachability analysis of control-affine nonlinear systems. In The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications, A. Mauroy, I. Mezić, and Y. Susuki (Eds.), pp. 81–98. External Links: ISBN 978-3-030-35713-9, Document Cited by: §5.1.
  • A. Gu and T. Dao (2023) Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. Cited by: §2.2.
  • M. Han, J. Euler-Rolle, and R. K. Katzschmann (2021) DeSKO: stability-assured robust control with a deep stochastic Koopman operator. In International Conference on Learning Representations (ICLR), Cited by: §6.
  • W. Hao, B. Huang, W. Pan, D. Wu, and S. Mou (2024) Deep Koopman learning of nonlinear time-varying systems. Automatica 159, pp. 111372. Cited by: §6.
  • E. Kaiser, J. N. Kutz, and S. L. Brunton (2018) Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A 474, pp. 20180335. Cited by: §6.
  • D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: §C.4.
  • B. O. Koopman (1931) Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences 17 (5), pp. 315–318. Cited by: §1, §2.1.
  • M. Korda and I. Mezić (2018) Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 93, pp. 149–160. Cited by: §1, §2.1, §6.
  • W. Li and E. Todorov (2004) Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO, pp. 222–229. Cited by: §2.3.
  • Y. Li, H. He, J. Wu, D. Katabi, and A. Torralba (2020) Learning compositional Koopman operators for model-based control. In International Conference on Learning Representations (ICLR), Cited by: §6.
  • Z. Li, M. Han, and X. Yin (2025) MamKO: Mamba-based Koopman operator for modeling and predictive control. In International Conference on Learning Representations (ICLR), Cited by: §B.1, §B.1, §B.1, §B.2, §B.2, §B.2, §C.2, §D.1, §D.2, §1, §1, §2.2, §5.1, §5.1, §5.1, §5.2, §5.3, Table 1, Table 1, Table 2, Table 2, §6.
  • J. Liu, D. Muñoz de la Peña, and P. D. Christofides (2009) Distributed model predictive control of nonlinear process systems. AIChE Journal 55 (5), pp. 1171–1184. Cited by: §B.2.
  • J. Liu, D. Muñoz de la Peña, B. J. Ohran, P. D. Christofides, and J. F. Davis (2008) A two-tier architecture for networked process control. Chemical Engineering Science 63 (22), pp. 5394–5409. Cited by: §B.2, §B.2, §5.1.
  • B. Lusch, J. N. Kutz, and S. L. Brunton (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9, pp. 4950. Cited by: §1, §2.1.
  • D. Malyuta, T. P. Reynolds, M. Szmuk, T. Lew, R. Bonalli, M. Pavone, and B. Açikmeşe (2022) Convex optimization for trajectory generation: a tutorial on generating dynamically feasible trajectories reliably and efficiently. IEEE Control Systems Magazine 42 (5), pp. 40–113. Cited by: §6.
  • Y. Mao, M. Szmuk, X. Xu, and B. Açikmeşe (2018) Successive convexification: a superlinearly convergent algorithm for non-convex optimal control problems. arXiv preprint arXiv:1804.06539. Cited by: §2.3, §4.3, §6, Proposition 3.
  • R. I. McLachlan and G. R. W. Quispel (2002) Splitting methods. Acta Numerica 11, pp. 341–434. External Links: Document Cited by: §3.3.
  • I. Mezić (2005) Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics 41, pp. 309–325. Cited by: §1.
  • R. R. Mohler (1973) Bilinear control processes. Academic Press. Cited by: §1, §2.3, §6.
  • B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd (2020) OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation 12, pp. 637–672. Cited by: §D.1, §D.4, §4.3.
  • H. F. Trotter (1959) On the product of semi-groups of operators. Proceedings of the American Mathematical Society 10 (4), pp. 545–551. External Links: Document Cited by: §3.3.
  • M. O. Williams, I. G. Kevrekidis, and C. W. Rowley (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 25, pp. 1307–1346. Cited by: §1, §2.1.

Appendix A Stability: Training Penalty and Inference Screening

Training.

The spectral penalty in the loss is

stab=jReLU(|λj(Adisc,k)|1+ms),\mathcal{L}_{\mathrm{stab}}=\sum_{j}\mathrm{ReLU}\bigl(|\lambda_{j}(A_{\mathrm{disc},k})|-1+m_{s}\bigr),

with margin ms=0.05m_{s}=0.05. Eigenvalues are computed via torch.linalg.eigvals cast to float32 to avoid mixed-precision artifacts. Zero initialization of GiG_{i} ensures the model starts as an exact copy of the Linear baseline and the penalty is inactive at the start of training.

Inference.

We do not perform a per-step spectral check at inference. Closed-loop stability is enforced indirectly through the SCP trust-region constraint and the QP control bounds, both of which keep the linearized dynamics within the regime where our trained operators are well-conditioned. We did, however, evaluate the cheap Gershgorin disk pre-check (|Adisc,k,ii|+ji|Adisc,k,ij|<1|A_{\mathrm{disc},k,ii}|+\sum_{j\neq i}|A_{\mathrm{disc},k,ij}|<1) as an O(dz2)O(d_{z}^{2}) stability certificate at each MPC step. Empirically the disks straddle the unit circle in 100%100\% of MPC steps across both bilinear TV cells (CartPole and RSCP, d{0,1,3,5}d\in\{0,1,3,5\}, 1010 episodes ×1000\times 1000 steps each), because the off-diagonal mass introduced by the bilinear factor EP(uk)=exp(iuk(i)Gi)E_{P}(u_{k})=\exp(\sum_{i}u_{k}^{(i)}G_{i}) inflates the radii past unity even when ρ(Adisc,k)<1\rho(A_{\mathrm{disc},k})<1. A full eigendecomposition would therefore be required for any runtime spectral gate; for dz64d_{z}\leq 64 this is sub-millisecond per step and does not bottleneck the MPC loop, but we leave runtime stability gating to future work.

Appendix B System Dynamics

This appendix specifies the governing ODEs, parameters, and action spaces of the four system variants (CartPole TI/TV, RSCP TI/TV) used in the paper.

B.1 CartPole (TI, TV)

The state is (x,x˙,θ,θ˙)(x,\dot{x},\theta,\dot{\theta}) with θ\theta measured from the upright. The control is the horizontal force FF applied to the cart. Following the MamKO benchmark protocol [Li et al., 2025], we use gravity g=10g=10 m s-2, cart mass mc=1m_{c}=1 kg, pole mass mp=0.1m_{p}=0.1 kg, half-length =0.5\ell=0.5 m, force bound |F|20|F|\leq 20 N, and sampling period Δt=0.02\Delta t=0.02 s. The equations of motion include cart and pole friction terms μc,μp\mu_{c},\mu_{p} as in the MamKO reference implementation.

Time-invariant variant (TI).

The friction coefficients are fixed at μc=0\mu_{c}=0, μp=0\mu_{p}=0 (frictionless limit). This recovers the standard frictionless CartPole.

Time-varying variant (TV).

The cart friction is modulated as

μc(t)=μcbase+sin(ωt),\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\sin(\omega\,t),

with μcbase=5×104\mu_{c}^{\mathrm{base}}=5\times 10^{-4}, μp=2×106\mu_{p}=2\times 10^{-6}, and ω=1\omega=1 rad s-1, matching the MamKO TV configuration. This is the specific cell used in the closed-loop MPC and lead-time experiments; results for ω{0.1,10}\omega\in\{0.1,10\} are not reported here.

Note on the sin/cos discrepancy.

The MamKO paper writes μc(t)=μcbase+cos(ωt)\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\cos(\omega t) (Eq. 16 of Li et al. [2025]); the official MamKO repository uses sin in cartpole_V.py. We follow the repository, which is the executable ground truth.

Parameters.

Table 4 lists the numerical constants used for the CartPole benchmark. The dynamical constants and time-varying friction settings follow the MamKO benchmark protocol [Li et al., 2025]; the shared MamKO architecture hyperparameters for CartPole are summarized separately in Table 6.

Table 4: CartPole parameters for the time-invariant and time-varying benchmarks.
Symbol Value Description
gg 1010 m s-2 gravitational acceleration
mcm_{c} 11 kg cart mass
mpm_{p} 0.10.1 kg pole mass
\ell 0.50.5 m pole half-length
Δt\Delta t 0.020.02 s sampling period
|F||F| 20\leq 20 N control bound on horizontal force
μcbase\mu_{c}^{\mathrm{base}} 5×1045{\times}10^{-4} base cart friction coefficient setting
μp\mu_{p} 2×1062{\times}10^{-6} pole friction coefficient
ω\omega (TV) 11 rad s-1 frequency in μc(t)=μcbase+sin(ωt)\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\sin(\omega t)

B.2 RSCP: Reactor–Separator Process

RSCP is a reactor–separator process consisting of two continuous stirred-tank reactors (CSTR-1 and CSTR-2) operating in series, feeding a flash separator whose liquid bottoms are partially recycled to CSTR-1 [Liu et al., 2008]. The system was adopted as the MamKO benchmark in Li et al. [2025]. Two parallel first-order reactions ABCA\to B\to C proceed in each CSTR with Arrhenius kinetics; the separator performs ideal vapor–liquid equilibrium at fixed pressure with no reaction. The state vector is

x=(xA,1,xB,1,T1,xA,2,xB,2,T2,xA,3,xB,3,T3),x=(x_{A,1},\,x_{B,1},\,T_{1},\,x_{A,2},\,x_{B,2},\,T_{2},\,x_{A,3},\,x_{B,3},\,T_{3}),

where xA,i,xB,ix_{A,i},x_{B,i} are the mass fractions of species A,BA,B in vessel ii and TiT_{i} is the corresponding temperature in Kelvin; vessel indices 1,21,2 refer to the two CSTRs and 33 to the flash separator. The control is u=(Q1,Q2,Q3)u=(Q_{1},Q_{2},Q_{3}), the heat duties applied to each vessel in kJ h-1.

Governing ODEs.

The mass and energy balances follow Chen [2022, Section 2.4] rather than the formulation in the original Liu papers; see “Reproducibility notes” below for the discrepancies that motivated this choice. Letting F1=F10+FrF_{1}=F_{10}+F_{r} and F2=F1+F20F_{2}=F_{1}+F_{20} denote the total throughput of each CSTR, r1,i=k1exp(E1/(RTi))r_{1,i}=k_{1}\exp(-E_{1}/(RT_{i})) and r2,i=k2exp(E2/(RTi))r_{2,i}=k_{2}\exp(-E_{2}/(RT_{i})) the Arrhenius rates, and βj=(ΔHj)CM/(ρcp)\beta_{j}=(-\Delta H_{j})C_{M}/(\rho c_{p}) the heat-of-reaction coefficients (see paragraph below), the governing equations for CSTR-1 and CSTR-2 are

x˙A,1\displaystyle\dot{x}_{A,1} =F10V1(xA,10xA,1)+FrV1(xA,rxA,1)\displaystyle=\tfrac{F_{10}}{V_{1}}(x_{A,10}-x_{A,1})+\tfrac{F_{r}}{V_{1}}(x_{A,r}-x_{A,1})
r1,1xA,1,\displaystyle\quad-r_{1,1}x_{A,1},
x˙B,1\displaystyle\dot{x}_{B,1} =F10V1(xB,10xB,1)+FrV1(xB,rxB,1)\displaystyle=\tfrac{F_{10}}{V_{1}}(x_{B,10}-x_{B,1})+\tfrac{F_{r}}{V_{1}}(x_{B,r}-x_{B,1})
+r1,1xA,1r2,1xB,1,\displaystyle\quad+r_{1,1}x_{A,1}-r_{2,1}x_{B,1},
T˙1\displaystyle\dot{T}_{1} =F10V1(T10T1)+FrV1(T3T1)\displaystyle=\tfrac{F_{10}}{V_{1}}(T_{10}-T_{1})+\tfrac{F_{r}}{V_{1}}(T_{3}-T_{1})
+β1r1,1xA,1+β2r2,1xB,1\displaystyle\quad+\beta_{1}r_{1,1}x_{A,1}+\beta_{2}r_{2,1}x_{B,1}
+Q1ρcpV1,\displaystyle\quad+\tfrac{Q_{1}}{\rho c_{p}V_{1}},

and analogously for CSTR-2 with (F1,F20,V2,T20,Q2)(F_{1},F_{20},V_{2},T_{20},Q_{2}) replacing (F10,Fr,V1,T10,Q1)(F_{10},F_{r},V_{1},T_{10},Q_{1}) and the previous-vessel state replacing the recycle stream. The flash separator carries no reaction:

x˙A,3\displaystyle\dot{x}_{A,3} =F2V3(xA,2xA,3)Fr+FpV3(xA,rxA,3),\displaystyle=\tfrac{F_{2}}{V_{3}}(x_{A,2}-x_{A,3})-\tfrac{F_{r}+F_{p}}{V_{3}}(x_{A,r}-x_{A,3}),
x˙B,3\displaystyle\dot{x}_{B,3} =F2V3(xB,2xB,3)Fr+FpV3(xB,rxB,3),\displaystyle=\tfrac{F_{2}}{V_{3}}(x_{B,2}-x_{B,3})-\tfrac{F_{r}+F_{p}}{V_{3}}(x_{B,r}-x_{B,3}),
T˙3\displaystyle\dot{T}_{3} =F2V3(T2T3)+Q3ρcpV3\displaystyle=\tfrac{F_{2}}{V_{3}}(T_{2}-T_{3})+\tfrac{Q_{3}}{\rho c_{p}V_{3}}
+(Fr+Fp)CMρcpV3(xA,rΔHvap,A\displaystyle\quad+\tfrac{(F_{r}+F_{p})C_{M}}{\rho c_{p}V_{3}}\bigl(x_{A,r}\Delta H_{\mathrm{vap},A}
+xB,rΔHvap,B+xC,rΔHvap,C).\displaystyle\quad\quad+x_{B,r}\Delta H_{\mathrm{vap},B}+x_{C,r}\Delta H_{\mathrm{vap},C}\bigr).

The recycle compositions xA,r,xB,r,xC,rx_{A,r},x_{B,r},x_{C,r} are determined by ideal vapor-liquid equilibrium with relative volatilities (αA,αB,αC)(\alpha_{A},\alpha_{B},\alpha_{C}),

xA,r\displaystyle x_{A,r} =αAxA,3D,\displaystyle=\tfrac{\alpha_{A}x_{A,3}}{D}, (16)
xB,r\displaystyle x_{B,r} =αBxB,3D,\displaystyle=\tfrac{\alpha_{B}x_{B,3}}{D},
xC,r\displaystyle x_{C,r} =αC(1xA,3xB,3)D.\displaystyle=\tfrac{\alpha_{C}(1-x_{A,3}-x_{B,3})}{D}.
D=αAxA,3+αBxB,3+αC(1xA,3xB,3).D=\alpha_{A}x_{A,3}+\alpha_{B}x_{B,3}+\alpha_{C}(1-x_{A,3}-x_{B,3}).

The structurally important property for this paper is that the heat duties QiQ_{i} enter the energy balances additively as +Qi/(ρcpVi)+Q_{i}/(\rho c_{p}V_{i}) with constant coefficients, contributing no uxu\cdot x terms.

Parameters.

Table 5 lists the numerical values used. Volumes, flows, feed states, kinetic constants, activation energies, heats of reaction, and relative volatilities are taken from Chen [2022, Table 2.1]; the molar concentration factor CM=2C_{M}=2 kmol m-3 is the constant that converts mass fractions to molar concentrations in the energy balance (see “Reproducibility notes”).

Table 5: RSCP parameters (after the corrections in Section B.2).
Symbol Value Description
V1,V2,V3V_{1},V_{2},V_{3} 1.0, 0.5, 1.01.0,\ 0.5,\ 1.0 m3 vessel volumes
F10,F20F_{10},F_{20} 5.04, 5.045.04,\ 5.04 m3 h-1 feed flows
Fr,FpF_{r},F_{p} 50.4, 5.0450.4,\ 5.04 m3 h-1 recycle, purge
T10,T20T_{10},T_{20} 300, 300300,\ 300 K feed temperatures
xA,10,xA,20x_{A,10},x_{A,20} 1.0, 1.01.0,\ 1.0 feed compositions (AA)
k1,k2k_{1},k_{2} 9.972×106, 9.36×1069.972{\times}10^{6},\ 9.36{\times}10^{6} h-1 kinetic constants
E1,E2E_{1},E_{2} 5×104, 6×1045{\times}10^{4},\ 6{\times}10^{4} kJ kmol-1 activation energies
ΔH1,ΔH2\Delta H_{1},\Delta H_{2} 1.2×105,1.4×105-1.2{\times}10^{5},\ -1.4{\times}10^{5} kJ kmol-1 reaction enthalpies
ρ\rho 10001000 kg m-3 density
cpc_{p} 4.24.2 kJ kg-1 K-1 specific heat
CMC_{M} 22 kmol m-3 molar concentration factor
ΔHvap,A,B,C\Delta H_{\mathrm{vap},A,B,C} 3.53,1.57,4.07-3.53,\ -1.57,\ -4.07×104\times 10^{4} kJ kmol-1 vap. enthalpies
αA,αB,αC\alpha_{A},\alpha_{B},\alpha_{C} 3.5, 1.0, 0.53.5,\ 1.0,\ 0.5 relative volatilities
RR 8.3148.314 kJ kmol-1 K-1 gas constant

Reproducibility notes.

We verified that the RSCP ODE system as written in Liu et al. [2008, 2009] does not reproduce the steady state reported by those papers under their stated parameter values. The discrepancy resolves under three corrections drawn from Chen [2022, Section 2.4]: (i) the heat-of-reaction term carries a molar concentration factor βj=(ΔHj)CM/(ρcp)\beta_{j}=(-\Delta H_{j})C_{M}/(\rho c_{p}) rather than ΔHj/cp\Delta H_{j}/c_{p}, where CM=2C_{M}=2 kmol m-3; (ii) the separator energy balance includes the convective transport of vaporization enthalpies ΔHvap,{A,B,C}\Delta H_{\mathrm{vap},\{A,B,C\}} for the three species; (iii) the steady-state heat duties QiQ_{i} are of order 10610^{6} kJ h-1 rather than the 10910^{9} value listed in Chen [2022, Table 2.1], which we identified as a typographical exponent error. Under the corrected formulation, forward integration from the published steady state has x˙<106\|\dot{x}\|_{\infty}<10^{-6} on temperatures and within 4×1034\times 10^{-3} on compositions. We use this corrected system throughout. The published MamKO implementation of RSCP is not publicly available, so we cannot directly confirm that our reconstructed system matches theirs; however, the MamKO paper’s reported steady state and our forward-integration check are mutually consistent under the corrected ODEs.

Time-varying variant.

The TV variant follows the MamKO synthetic time-varying modifier φc(t)=e0.01t\varphi_{c}(t)=e^{-0.01t} [Li et al., 2025, Appendix C.3], applied multiplicatively to both Arrhenius rate terms in all three vessels. This simulates catalyst deactivation: reaction rates decay exponentially with time, and the model must track operating-point drift induced by the slowly shrinking effective kinetics. Crucially, φc(t)\varphi_{c}(t) multiplies only the kinetic terms, not the convective or heat-duty terms, so the structural property that uku_{k} does not couple multiplicatively with xkx_{k} is preserved in the TV variant: RSCP TV exercises the time-varying-parameter axis of Section 1 without introducing uxu\cdot x structure.

Nominal operating point and controls.

The MPC tracking task drives the system to the nominal steady state

xs=(0.18, 0.67, 480.32, 0.20, 0.65,472.79, 0.07, 0.67, 474.89)x_{s}=(0.18,\;0.67,\;480.32,\;0.20,\;0.65,\\ 472.79,\;0.07,\;0.67,\;474.89)

following Li et al. [2025, Appendix C.3]. The control bounds are |QiQi,s|106|Q_{i}-Q_{i,s}|\leq 10^{6} kJ h-1 around the nominal heat duties Qi,sQ_{i,s}; these are enforced by the simulator’s action-space clipping rather than by the QP, since the QP operates in normalized control space (see Section D.4). The sampling period is Δt=18\Delta t=18 s (=0.005=0.005 h).

Appendix C Data Generation and Training Protocol

C.1 Trajectory Generation

For each system variant we generate training and test data with the official MamKO ReplayMemory pipeline. Continuous rollouts under per-step i.i.d. uniform random control excitation are segmented into overlapping input–output windows of length Hlookback+Hpred=60H_{\mathrm{lookback}}+H_{\mathrm{pred}}=60. Episodes are integrated by forward Euler at the system sampling period, with no sub-stepping: Δt=0.02\Delta t=0.02 s for CartPole and Δt=18\Delta t=18 s for RSCP. Episodes terminate on physical bound violations or at a fixed horizon, whichever comes first: for CartPole, when |θ|>20|\theta|>20^{\circ} or |x|>10|x|>10; for RSCP, when concentration or temperature bounds are violated; for both systems, at 20,04020{,}040 steps for training episodes and 1,0001{,}000 steps for test episodes. Windows are pooled across episodes until the configured target is reached: 39,90039{,}900 training windows, split 80/2080/20 into train/validation subsets via a random split with seed 11, and 4,0004{,}000 test windows.

Control excitation is drawn independently at each step. For CartPole, the force is sampled as F𝒰(20,20)F\sim\mathcal{U}(-20,20) N. For RSCP, each heat duty is sampled as Qi𝒰(Qis106,Qis+106)Q_{i}\sim\mathcal{U}(Q_{i}^{s}-10^{6},\ Q_{i}^{s}+10^{6}) kJ h-1, where QsQ^{s} denotes the steady-state heating rates. Initial states are sampled at the start of each episode: for CartPole, x0𝒰(4,4)x_{0}\sim\mathcal{U}(-4,4) m and θ0𝒰(0.1,0.1)\theta_{0}\sim\mathcal{U}(-0.1,0.1) rad, with zero initial velocities; for RSCP, components are drawn independently and uniformly within per-component half-widths around a verified fixed point x¯\bar{x} of the ODEs, x0𝒰(x¯𝝆,x¯+𝝆)x_{0}\sim\mathcal{U}(\bar{x}-\boldsymbol{\rho},\ \bar{x}+\boldsymbol{\rho}) with 𝝆=(0.05, 0.05, 10K, 0.05, 0.05, 10K, 0.02, 0.05, 10K)\boldsymbol{\rho}=(0.05,\ 0.05,\ 10\,\mathrm{K},\ 0.05,\ 0.05,\ 10\,\mathrm{K},\ 0.02,\ 0.05,\ 10\,\mathrm{K}) in the state ordering (xA,i,xB,i,Ti)(x_{A,i},x_{B,i},T_{i}) for i=1,2,3i=1,2,3. Here x¯\bar{x} denotes the numerically verified fixed point of the corrected RSCP ODEs rather than the approximate steady state reported in the MamKO appendix; we use x¯\bar{x} as the perturbation center to avoid systematic drift back toward the true equilibrium during data generation. The data-generation seed for the train/validation split is 11 and is reported alongside the released code.

C.2 Model Architecture

The encoder, backbone, and decoder follow the MamKO configuration [Li et al., 2025] for each system class. We re-use the MamKO reference implementation hyperparameters; only the bilinear extension introduces new parameters (GiG_{i} tensors at full rank r=dzr=d_{z}, plus the spectral penalty weight λs\lambda_{s}). Table 6 summarizes the per-system settings.

Table 6: Architecture hyperparameters per system. CartPole settings follow the MamKO reference implementation; RSCP settings follow its RSCP benchmark configuration. Bilinear-specific entries (rr, λs\lambda_{s}, discretization scheme) are reported under Bilinear-only.
CartPole RSCP
Lookback window 3030 3030
Forecast horizon 3030 3030
Latent dimension dzd_{z} 88 1515
Mamba conv kernel dconvd_{\mathrm{conv}} 1515 55
Hidden dimension 6464 6464
Bilinear-only
Bilinear rank rr 88 1515
Spectral penalty λs\lambda_{s} 0.010.01 0.010.01
Discretization lie_trotter lie_trotter

The GiG_{i} tensors are factorized as Gi=LiRiG_{i}=L_{i}R_{i}^{\top} with Li,Ridz×rL_{i},R_{i}\in\mathbb{R}^{d_{z}\times r}, both initialized to zero. Zero initialization makes the bilinear model an exact copy of the Linear baseline at training step zero, so any training-time divergence between the two is attributable solely to the bilinear terms.

C.3 Lie–Trotter Discretization Details

The baseline MamKO diagonal step is preserved exactly by evaluating the diagonal flow element-wise:

[ED]nn=exp(anδn).[E_{D}]_{nn}=\exp(a_{n}\delta_{n}).

The associated diagonal input integral is computed element-wise as

[B¯diag,k]n,j=exp(anδn)1an[Bk(c)]n,j.[\bar{B}_{\mathrm{diag},k}]_{n,j}=\frac{\exp(a_{n}\delta_{n})-1}{a_{n}}\,[B_{k}^{(c)}]_{n,j}. (17)

In implementation, the numerator is evaluated with torch.expm1 to avoid catastrophic cancellation as an0a_{n}\to 0, with a small sign-preserving denominator offset used only for numerical stability.

The coupling factor EP(uk)E_{P}(u_{k}) is evaluated with a single dz×dzd_{z}\times d_{z} torch.linalg.matrix_exp call, yielding the discrete-time matrices

Adisc,k\displaystyle A_{\mathrm{disc},k} =EP(uk)ED,\displaystyle=E_{P}(u_{k})E_{D}, (18)
Bdisc,k\displaystyle B_{\mathrm{disc},k} =EP(uk)B¯diag,k.\displaystyle=E_{P}(u_{k})\,\bar{B}_{\mathrm{diag},k}.

Thus the diagonal ZOH step costs O(dz)O(d_{z}) and the coupling step costs O(dz3)O(d_{z}^{3}), rather than the augmented (dz+m)×(dz+m)(d_{z}+m)\times(d_{z}+m) matrix exponential required by a dense ZOH implementation that jointly recovers Adisc,kA_{\mathrm{disc},k} and Bdisc,kB_{\mathrm{disc},k}. For dz64d_{z}\leq 64, this overhead remains negligible relative to backbone inference.

C.4 Optimization

All models are trained with Adam [Kingma and Ba, 2015] for 401401 epochs at initial learning rate 1e31\mathrm{e}{-}3, weight decay 1e31\mathrm{e}{-}3, batch size 256256, and a step learning-rate schedule (step size 5050 epochs, γ=0.9\gamma=0.9). Gradient clipping is applied at 1.01.0. Best-validation-loss checkpoints are retained for evaluation. Due to compute constraints, results are reported from a single seed per configuration; per-cell dispersion is reported across the 10 closed-loop episodes and across the final 50 training epochs (mean50 in Table 2). Multi-seed validation is deferred to a follow-up. Training ran in FP32 throughout; mixed precision was not used.

The training loss is the multi-step observation-space prediction loss of Equation 10, with the spectral penalty λsstab\lambda_{s}\mathcal{L}_{\mathrm{stab}} disabled at inference.

C.5 Hardware and Wall-Clock

Training runs were executed on NVIDIA A100 GPUs on GCE a2-ultragpu-1g instances. Per-run training wall-clock was approximately 4.84.8 hours for CartPole and 9.759.75 hours for RSCP (bilinear Lie–Trotter models; corresponding linear baselines train roughly 6×6\times faster, at 0.7\sim\!0.7 hours per run on either system). Total compute budget for the experiments reported in this paper, including ablations and the lead-time sweep, was approximately 5858 GPU-hours (training: 3232 h across 88 runs; closed-loop MPC regen=never d-sweeps: 2626 h across 44 sweeps).

Appendix D Closed-Loop MPC Tasks

This appendix specifies the closed-loop MPC problems evaluated in Sections 5.2 and 5.5: cost functions, control bounds, warmup protocol, lead-time scheduling, and SCP hyperparameters.

D.1 Common Setup

For all systems we use prediction horizon HMPC=30H_{\mathrm{MPC}}=30, apply horizon 11 (standard receding-horizon MPC at d=0d=0), and 1010 episodes per configuration. Each episode is 1,0001{,}000 MPC steps. Following MamKO Eq. 10a, the receding-horizon objective at time kk is

x^j|k:=Cj|kz^j|k,\hat{x}_{j|k}:=C_{j|k}\hat{z}_{j|k},
Jk\displaystyle J_{k} =j=k+1k+H1(x^j|kxrefQ2+uj|kuj1|kR2)\displaystyle=\sum_{j=k+1}^{k+H-1}\Bigl(\|\hat{x}_{j|k}-x_{\mathrm{ref}}\|_{Q}^{2}+\|u_{j|k}-u_{j-1|k}\|_{R}^{2}\Bigr) (19)
+x^k+H|kxrefP2.\displaystyle\quad+\|\hat{x}_{k+H|k}-x_{\mathrm{ref}}\|_{P}^{2}.

that is, a quadratic tracking term on the predicted decoded state together with a Δu\Delta u smoothness penalty, plus a terminal tracking cost. The QP is built by dynamics elimination over the HMPCH_{\mathrm{MPC}}-step horizon and solved by OSQP [Stellato et al., 2020].

Warmup.

Following the MamKO evaluation protocol, the lookback buffer is initialized synthetically by repeating the simulator’s reset state HH times with zero control. MPC begins at simulator step t=0t=0; no real warmup rollout is performed. This convention matches MamKO’s published evaluation loop [Li et al., 2025] and is the operating point under which the forecasting and closed-loop numbers in the main text are reported.

D.2 CartPole TV

Reference: upright stabilization at xref=(0,0,0,0)x_{\mathrm{ref}}=(0,0,0,0) under sinusoidal cart friction (Section B.1). Stage and terminal cost weights are

Q\displaystyle Q =diag(1.0, 0.01, 100.0, 0.01),\displaystyle=\mathrm{diag}(1.0,\ 0.01,\ 100.0,\ 0.01),
R\displaystyle R =0.5,\displaystyle=0.5,
P\displaystyle P =diag(5000, 0, 0, 0),\displaystyle=\mathrm{diag}(5000,\ 0,\ 0,\ 0),

matching the MamKO reference configuration [Li et al., 2025]. The heavy weighting on θ\theta (state index 3) reflects the priority of pole-angle stabilization over cart-position regulation, and the terminal cost penalizes only cart position. The control bound is |F|20|F|\leq 20 N. There is no safety constraint; the trajectory is unconstrained over the action range.

D.3 RSCP (TI, TV)

Reference: setpoint tracking to the nominal steady state xsx_{s} of Section B.2. The cost weights heavily emphasize composition tracking over temperature tracking:

Q\displaystyle Q =diag(104, 104, 1, 104, 104, 1, 104, 104, 1),\displaystyle=\mathrm{diag}(10^{4},\ 10^{4},\ 1,\ 10^{4},\ 10^{4},\ 1,\ 10^{4},\ 10^{4},\ 1),
R\displaystyle R =5×1012I3,\displaystyle=5\times 10^{-12}\cdot I_{3},
P\displaystyle P =Q.\displaystyle=Q.

The asymmetric weighting follows the MamKO RSCP configuration: the relevant operational quantities are the species mass fractions, while the temperature states float to whatever the heat duties drive them to. The small RR value reflects the magnitude of the heat duties (u106u\sim 10^{6} in raw units): R=5×1012R=5\times 10^{-12} balances the Δu21012\|\Delta u\|^{2}\sim 10^{12} contribution against the xxs2O(1)\|x-x_{s}\|^{2}\sim O(1) tracking term. Control bounds are |QiQi,s|106|Q_{i}-Q_{i,s}|\leq 10^{6} kJ h-1 around the nominal duties; these are enforced as action-space clipping in the simulator after the QP returns its result in normalized control space (see Section D.4).

D.4 SCP Hyperparameters

The SCP controller of Algorithm 1 runs with initial trust-region radius ε0=1.0\varepsilon_{0}=1.0 (in normalized control space), shrinkage factor 1/21/2 on cost-increase steps, and maximum SCP iterations NSCP{1,5}N_{\mathrm{SCP}}\in\{1,5\} as reported in Section 5.2. The inner QP is solved by OSQP [Stellato et al., 2020] with default tolerances, warm-started from the previous receding-horizon call’s solution to amortize solver iterations across the closed-loop trajectory. The QP is built in normalized control space using the MamKO instance-normalization statistics output by the dynamics-generation network at each MPC step; control bounds and the reference state are normalized correspondingly. After the QP returns, the optimal control is denormalized and clipped to the simulator’s raw-space action bounds before application.

D.5 Lead-Time Sweep

The lead-time experiment of Section 5.2 sweeps the commitment window d{0,1,3,5}d\in\{0,1,3,5\} on both RSCP TV and CartPole TV with NSCP=5N_{\mathrm{SCP}}=5 and 1010 episodes per dd. The protocol is implemented by holding both the optimizer’s plan and the backbone’s dynamics output fixed during the lead window: at each MPC call the controller commits to the next dd controls from a queue, applies the next queued control to the simulator, and refills the queue from the freshly solved plan only after the queue empties; during this window the backbone is not re-evaluated, so the A¯k\bar{A}_{k}, B¯k\bar{B}_{k} matrices used for any auxiliary roll-forward are the ones generated at the start of the window. We refer to this as regen=never in our implementation; it isolates the open-loop correction capacity of the model class itself, with neither model receiving intermediate sensor or backbone updates within the lead window. The d=10d=10 point is generated but excluded from the analysis in the main text on grounds of discretization-saturation: at the RSCP sampling period of 1818 s, d=10d=10 corresponds to three minutes of stale plan, beyond the regime in which the underlying linearization is informative (cumulative costs converge for both models, see Table 7). The exact end-of-horizon values reported in Figures 2 and 3 are tabulated in Tables 7 and 8 for RSCP TV and CartPole TV, respectively.

Table 7: Cumulative log-cost at end of horizon, RSCP TV, lead-time sweep under regen=never with NSCP=5N_{\mathrm{SCP}}=5 and 1010 episodes per cell. Lower is better.
d=0d=0 d=1d=1 d=3d=3 d=5d=5
Linear 2.3233-2.3233 1.8717-1.8717 2.0559-2.0559 2.2037-2.2037
Bilinear-SCP-5 2.5442-2.5442 2.7764-2.7764 2.5662-2.5662 2.4252-2.4252
Table 8: Cumulative log-cost at end of horizon, CartPole TV, lead-time sweep under regen=never with NSCP=5N_{\mathrm{SCP}}=5 and 1010 episodes per cell. Lower is better.
d=0d=0 d=1d=1 d=3d=3 d=5d=5
Linear 1.9119-1.9119 1.8033-1.8033 1.6527-1.6527 1.4707-1.4707
Bilinear-SCP-5 1.9450-1.9450 1.9086-1.9086 1.7219-1.7219 1.4972-1.4972
Table 9: Learned bilinear coupling norm GF\|G\|_{F} at training convergence, per cell, with the qualitative shape of the training trajectory.
Cell GF\|G\|_{F} Trajectory shape
CartPole TI 0.360.36 Peak 0.480.48 at ep. 6060, decays
CartPole TV 0.570.57 Monotonic growth, recruitment
RSCP TI 0.760.76 Plateau, slack absorption
RSCP TV 0.440.44 Monotonic decay, multi-context

Appendix E Coupling Strength Diagnostic

Table 9 reports the learned GF\|G\|_{F} at training convergence across the four cells, alongside the qualitative shape of the training trajectory. The four cells exhibit distinct training trajectories that together provide an interpretive scaffold for the empirical results in the main text.

The CartPole TI trajectory rises early to GF0.48\|G\|_{F}\approx 0.48, then decays under spectral regularization, settling at 0.360.36. The bilinear capacity is genuinely used—consistent with the strict forecasting gain on this cell—but constrained from over-recruitment.

CartPole TV grows monotonically to 0.570.57. The progressive recruitment matches the time-varying nature of the perturbed dynamics: as training proceeds, the model leans on GG to absorb friction-coefficient drift that the linear lift cannot track within a single MPC horizon.

RSCP TI plateaus high at 0.760.76, but the closed-loop and forecast results show no clear advantage of the bilinear model on this cell. We interpret this as the optimizer absorbing residual training slack into GG, which has no physical target on this cell (uxu\cdot x is structurally absent in the ODE). GF\|G\|_{F} alone should not be read as evidence of bilinear strength: its size on RSCP TI is set by training-time slack, not by physics.

RSCP TV decays monotonically from an early peak to 0.440.44. Under time-varying parameters, GG must serve simultaneously across the family of operating regimes induced by the modulation; the optimizer responds by shrinking GG toward a setting that works on average rather than overfitting to any single regime. This is consistent with Section 5.2: Bilinear-SCP-55, which iteratively re-linearizes the operator at the current operating point, outperforms Bilinear-SCP-11 on this cell because the average GG is what the model carries while iterative SCP is what extracts operating-point-specific use of it.

The four trajectories provide an interpretive scaffold rather than a predictive theory: GF\|G\|_{F} is the empirical witness for the structural condition in Proposition 2 (CartPole TI), the recruitment indicator under TV physics (CartPole TV, RSCP TV), and the slack-absorption diagnostic where no physical target exists (RSCP TI). The proposition does not predict trajectory shape; the trajectories make sense in light of it.

Comments

· 0
Be the first to comment on this paper.