[2605.04793] Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Matan Pagi Zohar Sorek Affiliation: Psistar AI Affiliation: {matan,zohar}@psistar.ai

Abstract

Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution.

We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds less than $1\%$ parameters through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone- descent and KKT convergence results under standard trust-region assumptions.

Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provides a simple and effective mechanism for robust MPC under varying conditions.

1 Introduction

The Koopman operator offers an elegant reformulation of nonlinear dynamics: any nonlinear system, however complex in state space, admits a linear representation in an infinite-dimensional function space (Koopman, 1931; Mezić, 2005). Finite-dimensional approximations learned from trajectory data have enabled controllers that combine the tractability of linear MPC with operation on genuinely nonlinear physical systems (Korda and Mezić, 2018; Lusch et al., 2018; Williams et al., 2015). For real-time industrial control-chemical reactors, compressors, power converters-tractable convex optimization at each control step is a hard operational requirement.

Recent work has extended Koopman learning to time-varying systems by replacing fixed operator matrices with sequences generated from historical data. A representative instance of this class is the Mamba-based Koopman operator (MamKO) (Li et al., 2025), which processes recent state and control history through a convolutional-FCNN network to emit the matrices $\bar{A}_{k}$ , $\bar{B}_{k}$ , $\bar{C}_{k}$ of a local linear system at each step, substantially outperforming static Koopman models while retaining the convex MPC formulation.

The structural constraint.

This class of methods rests on a deliberate architectural choice: the drift operator $\bar{A}_{k}$ is generated from historical data $X_{\mathrm{hist}}=\{z_{k-H:k-1},u_{k-H:k-1}\}$ but is explicitly not a function of the current control input $u_{k}$ . Making $\bar{A}_{k}$ depend on $u_{k}$ would introduce a bilinear term $\bar{A}_{k}(u_{k})\,z_{k}$ , breaking MPC convexity. This constraint is acknowledged in the MamKO formulation (Li et al., 2025) and is forced more generally by any control architecture that preserves single-shot QP convexity. The consequence is conditional independence $\bar{A}_{k}\perp\!\!\!\perp u_{k}\mid X_{\mathrm{hist}}$ . This constraint limits the model’s expressive power along two distinct axes.

Control–state coupling. When the governing physics contains $u\cdot x$ terms-convective fluxes in flow-actuated reactors, multiplicative inputs in mechanical systems, voltage-dependent impedances in power electronics (Mohler, 1973), the mixed partial $\partial^{2}z_{k+1}/\partial z_{k}\partial u_{k}^{(i)}$ is structurally nonzero. MamKO’s architecture forces this quantity to zero, requiring its backbone to absorb the coupling indirectly via step-by-step regeneration of $\bar{A}_{k}$ . Over a multi-step rollout, the resulting approximation error accumulates fastest in regimes where the controller is deliberately exercising large control variation-precisely where MPC matters most.

Time-varying parameters. When the dynamics themselves drift over time—catalyst aging, heat-exchanger fouling, mechanical wear, friction modulation—the effective operator must track changes that the historical encoder may not yet have observed within its lookback window. A bilinear coupling provides an additional, structured channel through which the operator can adapt to the present operating point in a way that the linear lift cannot reach within a single MPC horizon.

Both regimes are pervasive in industrial control: the first is structural, the second operational. A single architectural mechanism that addresses both-without requiring a different model class for each-is the contribution of this paper.

Our contribution.

We propose a minimal, principled extension of the Mamba-Koopman dynamics that relaxes conditional independence while retaining the features that make the model useful for control:

z_{k+1}=\underbrace{\Bigl(\mathrm{diag}(A)+\sum_{i=1}^{m}u_{k}^{(i)}\,G_{i}\Bigr)}_{=:\,A_{\mathrm{eff}}(u_{k})}z_{k}+B_{k}u_{k},

(1)

where $A\in\mathbb{R}^{d_{z}}$ contains the diagonal continuous-time eigenvalues of the existing Mamba–Koopman backbone (unchanged), $G_{i}\in\mathbb{R}^{d_{z}\times d_{z}}$ are learned bilinear coupling tensors and $B_{k}\in\mathbb{R}^{d_{z}\times m}$ is the step-varying actuator matrix from the existing dynamics network. Setting $G_{i}=0$ restores MamKO exactly. The bilinear structure exposes exact analytical model Jacobians, making Sequential Convex Programming (SCP) natural: linearize around a nominal trajectory, solve a convex QP, update, repeat.

Summary of contributions.

1.

Bilinear Koopman dynamics (Section 3): a strict generalization of the conditional-independence class modeling control–state coupling, with low-rank parameterization for parameter efficiency and a spectral penalty enforcing stability of the resulting dense operator.
2.

SCP controller (Section 4): an iterative MPC algorithm exploiting exact model Jacobians, with monotone-descent and KKT convergence results under standard trust-region assumptions.
3.

Empirical evaluation (Section 5) across CartPole and RSCP in time-invariant and time-varying regimes, establishing four consistent effects: forecasting non-inferiority on every cell, training stabilization on RSCP TV, closed-loop MPC wins on RSCP TV when SCP is iterated, and graceful degradation under stale-plan execution.

Scope and limitations.

Our method inherits the conditional-independence baseline’s assumptions: the latent dynamics are approximately linear at each step, and the encoder is expressive enough to find a lifting where this holds. The SCP controller provides local (KKT) rather than global optimality guarantees, and convergence rate depends on initialization quality. We position this work as closing a specific, well-characterized structural gap in the Koopman-for-control literature without claiming a new operator class.

2 Background

2.1 Koopman Operators for Controlled Systems

For a discrete-time controlled system $x_{k+1}=f(x_{k},u_{k})$ , the Koopman operator (Koopman, 1931) acts on observables $g$ via composition: $\mathcal{K}g=g\circ f$ . This operator is linear even when $f$ is not. One seeks a finite-dimensional lifting $\psi:\mathbb{R}^{n}\to\mathbb{R}^{d_{z}}$ such that the lifted state $z_{k}=\psi(x_{k})$ evolves approximately linearly (Korda and Mezić, 2018):

z_{k+1}=Az_{k}+Bu_{k},\quad\hat{x}_{k}=Cz_{k}.

(2)

EDMD (Williams et al., 2015) identifies $A,B,C$ by least squares over a fixed observable dictionary. Deep Koopman methods replace the fixed dictionary with a learned encoder (Lusch et al., 2018).

2.2 Mamba-Koopman and Conditional Independence

MamKO (Li et al., 2025) addresses the limitation of a fixed operator for time-varying systems. Inspired by Mamba’s selective state-space mechanism (Gu and Dao, 2023), a matrices-generation network takes the historical sequence $X_{\mathrm{hist}}=[z_{k-H:k-1}^{\top},u_{k-H:k-1}^{\top}]^{\top}$ and emits time-varying operators via 1D convolution and FCNNs. After ZOH discretization:

z_{k+1}=\bar{A}_{k}(X_{\mathrm{hist}})\,z_{k}+\bar{B}_{k}(X_{\mathrm{hist}})\,u_{k},\quad\hat{x}_{k}=\bar{C}_{k}z_{k}.

(3)

Generating all matrices from $X_{\mathrm{hist}}$ alone enforces

\bar{A}_{k}\perp\!\!\!\perp u_{k}\;\Big|\;X_{\mathrm{hist}},

(4)

preserving linearity in $u_{k}$ and hence MPC convexity, but preventing control-state coupling.

2.3 Bilinear Systems

A discrete-time bilinear system takes the form $x_{k+1}=Ax_{k}+\sum_{i}u_{k}^{(i)}N_{i}x_{k}+Bu_{k}$ , where $N_{i}\in\mathbb{R}^{n\times n}$ are interaction matrices. Bilinear systems occupy a well-studied middle ground between linear and fully nonlinear: they represent a broad class of physical phenomena including chemical reactions and mechanical systems with multiplicative inputs (Mohler, 1973). The Koopman literature has identified bilinear lifted models and related finite-dimensional approximations as a useful middle ground for control, making this a natural target for Koopman approximation. SCP (Mao et al., 2018) and iterative LQR (Li and Todorov, 2004) handle optimization over bilinear dynamics by exploiting the analytic Jacobian structure.

3 Bilinear Koopman Extension

3.1 Bilinear Latent Dynamics

We relax eq. 4 by parameterizing the latent transition as eq. 1, where $A\in\mathbb{R}^{d_{z}}$ , $G_{i}\in\mathbb{R}^{d_{z}\times d_{z}}$ , and $B_{k}$ are as described in Section 1. The observation map $\hat{x}_{k}=Cz_{k}$ remains linear. The effective drift operator $A_{\mathrm{eff}}(u_{k})=\mathrm{diag}(A)+\sum_{i}u_{k}^{(i)}G_{i}$ is a linear function of $u_{k}$ , making the full dynamics bilinear in $(z_{k},u_{k})$ jointly.

Proposition 1 (Strict Generalization).

The function class of eq. 1 strictly contains that of eq. 3.

Proof.

Setting $G_{i}=0$ for all $i$ recovers eq. 3 exactly. Conversely, any system with $\partial^{2}z_{k+1}/(\partial z_{k}\,\partial u_{k}^{(i)})=G_{i}\neq 0$ is representable by eq. 1 but not by eq. 3, since eq. 4 forces this cross-derivative to zero. Hence the containment is strict. ∎

Proposition 2 (Population-Level Forecasting Non-Inferiority).

Let $\mathcal{F}_{\mathrm{MamKO}}\subsetneq\mathcal{F}_{\mathrm{BL}}$ denote the hypothesis classes induced by eq. 3 and eq. 1. Then

\min_{f\in\mathcal{F}_{\mathrm{BL}}}\,\mathbb{E}\bigl[\|z_{k+1}-f(z_{k},u_{k})\|^{2}\bigr]\\ \leq\;\min_{f\in\mathcal{F}_{\mathrm{MamKO}}}\,\mathbb{E}\bigl[\|z_{k+1}-f(z_{k},u_{k})\|^{2}\bigr],

with strict inequality whenever the true latent dynamics admit nonzero control–state coupling.

Proof.

The inequality follows from Proposition 1: any optimum in $\mathcal{F}_{\mathrm{MamKO}}$ is achievable in $\mathcal{F}_{\mathrm{BL}}$ by setting $G_{i}=0$ . For strictness, suppose the true latent dynamics satisfy $z_{k+1}=A^{\star}(u_{k})\,z_{k}+B^{\star}u_{k}$ with $\partial A^{\star}/\partial u_{k}^{(i)}=G_{i}^{\star}\neq 0$ . Any $f\in\mathcal{F}_{\mathrm{MamKO}}$ incurs irreducible error proportional to $\|G_{i}^{\star}\|\cdot\|u_{k}-\bar{u}\|$ , where $\bar{u}$ is the training-set mean control. This error vanishes only when $u_{k}$ is constant across the evaluation distribution-vacuous in any MPC setting. ∎

Remark 1.

Empirically the strictness condition partitions the systems we evaluate. On CartPole, where the equations of motion couple horizontal force with pendulum geometry, the bilinear extension produces strict forecasting gains on TI and TV. On RSCP, where heat duties enter additively and the governing ODE contains no $u\cdot x$ terms, $\|G_{i}^{\star}\|\approx 0$ and the proposition predicts equality—which is what we observe on RSCP TI, and on RSCP TV under deployment-relevant averaging (Table 2). The $23\%$ bilinear-vs-linear gap visible in best-checkpoint MSE on RSCP TV reflects a training-noise asymmetry between the two models, not a violation of the proposition: the Linear baseline oscillates throughout training while bilinear converges smoothly, so single-epoch selection from linear’s trajectory captures one of its downward spikes. Mean-last- $50$ MSE—which averages across the spikes—restores the equality the proposition predicts. The learned $\|G\|_{F}$ values (Table 9) empirically confirm the partition.

3.2 Low-Rank Parameterization

The tensors $\{G_{i}\}_{i=1}^{m}$ add $m\cdot d_{z}^{2}$ parameters. For typical settings ( $m\leq 5$ , $d_{z}\leq 64$ ), this is at most ${\sim}20$ K parameters, under 1% of a standard Mamba-Koopman backbone. We impose a low-rank prior:

G_{i}=L_{i}R_{i}^{\top},\quad L_{i},R_{i}\in\mathbb{R}^{d_{z}\times r},\quad r\ll d_{z},

(5)

reducing the addition to $2mrd_{z}$ parameters and regularizing the coupling strength when $r<d_{z}$ . Both $L_{i}$ and $R_{i}$ are initialized to zero, so the model begins as an exact copy of the Linear baseline; any divergence in training is attributable solely to the bilinear terms. The rank $r$ is a single interpretable hyperparameter: $r=1$ encodes a rank- $1$ perturbation per control channel; $r=d_{z}$ recovers the unconstrained bilinear model. We use $r=d_{z}$ in all reported runs, as $d_{z}\in\{8,15\}$ is small enough that further parameter reduction is unnecessary; the low-rank option remains available for deployments with larger latent dimensions.

3.3 Lie–Trotter Split ZOH Discretization

Write the effective generator as

$\displaystyle A_{\mathrm{eff}}(u_{k})$	$\displaystyle=D+P(u_{k}),$	(6)
$\displaystyle D$	$\displaystyle=\mathrm{diag}(A),$
$\displaystyle P(u_{k})$	$\displaystyle=\sum_{j=1}^{m}u_{k}^{(j)}G_{j}.$

Dense ZOH would exponentiate $D+P(u_{k})$ at a single scalar period $T$ , collapsing the baseline MamKO per-mode timescales $\delta_{n}$ . This would break the exact zero-coupling reduction to the linear baseline. We therefore use a first-order Lie–Trotter splitting (McLachlan and Quispel, 2002), historically rooted in the Trotter product formula (Trotter, 1959):

	$\displaystyle\exp\!\left((D+P(u_{k}))T\right)$	$\displaystyle\approx\underbrace{\exp\!\left(P(u_{k})T\right)}_{=:E_{P}(u_{k})}$		(7)
		$\displaystyle\qquad\;\underbrace{\mathrm{diag}\!\left(\exp(A\odot\boldsymbol{\delta})\right)}_{=:E_{D}}.$		(7)

With $\bar{B}_{\mathrm{diag},k}$ the corresponding diagonal ZOH input integral, the implemented one-step map is

F_{\mathrm{LT}}(z_{k},u_{k}):=E_{P}(u_{k})\bigl(E_{D}z_{k}+\bar{B}_{\mathrm{diag},k}u_{k}\bigr).

(8)

When $P(u_{k})\equiv 0$ , $E_{P}(u_{k})=I$ , so the scheme reduces exactly to the baseline per-mode diagonal ZOH. Implementation details, numerical stabilization, and cost are deferred to Section C.3.

3.4 Stability

The negative_celu activation in the Mamba-Koopman backbone constrains the diagonal entries of $A$ to be at most 1, guaranteeing $\rho(A_{\mathrm{disc}})\leq 1$ for the diagonal case. This guarantee does not extend to the dense $A_{\mathrm{eff}}(u_{k})$ : the bilinear terms can shift eigenvalues outside the unit disk. We apply a two-tier approach at training and inference.

Training: spectral penalty.

We add to the loss

\mathcal{L}_{\mathrm{stab}}=\sum_{j}\mathrm{ReLU}\bigl(|\lambda_{j}(A_{\mathrm{disc},k})|-1+m_{s}\bigr),

(9)

with margin $m_{s}=0.05$ , computed via torch.linalg.eigvals in float32 to avoid mixed-precision artifacts. Zero initialization of $G_{i}$ ensures the model starts in the stable regime.

Inference.

At inference we evaluate $\rho(A_{\mathrm{disc},k})$ at each MPC step via torch.linalg.eigvals, at $O(d_{z}^{3})$ cost; for $d_{z}\leq 64$ this is negligible relative to backbone inference. A discussion of why we did not use a Gershgorin pre-check for screening is deferred to Appendix A.

3.5 Training Objective

We train end-to-end on multi-step observation-space prediction loss:

\mathcal{L}=\frac{1}{T}\sum_{k=1}^{T}\|\hat{x}_{k}-x_{k}\|^{2}+\lambda_{s}\,\mathcal{L}_{\mathrm{stab}},

(10)

with the stability penalty disabled at inference. All baseline training hyperparameters (optimizer, schedule, context window $H$ ) are held fixed; only $\{L_{i},R_{i}\}$ and the jointly trained backbone parameters are updated.

4 SCP Controller for Bilinear MPC

4.1 Problem Formulation

The open-loop MPC problem over horizon $H$ is

$\displaystyle\min_{u_{0:H-1}}$	$\displaystyle\sum_{k=0}^{H-1}\ell(z_{k},u_{k})$	(11)
s.t.	$\displaystyle z_{k+1}=F_{\mathrm{LT}}(z_{k},u_{k}),$
	$\displaystyle u_{k}\in\mathcal{U}.$

This problem is non-convex in $(z_{k},u_{k})$ jointly. The conditional-independence baseline sidesteps non-convexity by design (conditional independence makes the QP trivially convex in one shot). We instead solve eq. 11 iteratively via SCP, exploiting the exact differentiable structure of the implemented bilinear model.

4.2 Exact Model Jacobians

SCP linearizes the implemented Lie–Trotter map eq. 8 around a nominal trajectory $(\bar{z}_{0:H},\bar{u}_{0:H})$ . Since $F_{\mathrm{LT}}(z_{k},u_{k})$ is affine in $z_{k}$ , the state Jacobian is

	$\displaystyle\tilde{A}_{k}$	$\displaystyle=\frac{\partial F_{\mathrm{LT}}}{\partial z_{k}}\Bigg\|_{(\bar{z}_{k},\bar{u}_{k})}=E_{P}(\bar{u}_{k})E_{D},$		(12)
	$\displaystyle\tilde{B}_{k}$	$\displaystyle=\frac{\partial F_{\mathrm{LT}}}{\partial u_{k}}\Bigg\|_{(\bar{z}_{k},\bar{u}_{k})}.$		(13)

The input Jacobian $\tilde{B}_{k}$ contains the derivative of the matrix exponential $E_{P}(u_{k})$ with respect to $u_{k}$ . In implementation, we compute $\tilde{B}_{k}$ by reverse-mode automatic differentiation of the full Lie–Trotter update using torch.func.jacrev. This yields exact model Jacobians of the implemented differentiable dynamics up to floating-point precision, with no finite-difference linearization inside SCP.

4.3 SCP Algorithm

Each SCP iteration solves the convex subproblem obtained by evaluating the finite-horizon quadratic tracking objective on the linearized trajectory. Denoting this objective by $J_{\mathrm{lin}}$ , we solve

$\displaystyle\min_{\delta u_{0:H-1}}$	$\displaystyle J_{\mathrm{lin}}(\delta u;\bar{z},\bar{u})$	(14)
s.t.	$\displaystyle\delta z_{0}=0,$
	$\displaystyle\delta z_{k+1}=\tilde{A}_{k}\delta z_{k}+\tilde{B}_{k}\delta u_{k},$
	$\displaystyle\bar{u}_{k}+\delta u_{k}\in\mathcal{U},$
	$\displaystyle\\|\delta u_{k}\\|_{\infty}\leq\varepsilon.$

Here $J_{\mathrm{lin}}$ includes the same stage and terminal terms as the true MPC objective; when a $\Delta u$ penalty is used, the previous applied input is treated as fixed data, so the subproblem remains a convex QP in $\delta u_{0:H-1}$ . Eliminating the linearized dynamics yields a dense box-constrained QP solved via OSQP (Stellato et al., 2020) with warm-starting between iterations.

Algorithm 1 SCP-Bilinear MPC

1:Initial state

z_{0}

, nominal controls

\bar{u}_{0:H-1}

, initial trust-region radius

\varepsilon_{0}

, model params

2:Optimized control sequence

u_{0:H-1}^{\star}

\varepsilon\leftarrow\varepsilon_{0}

4:Roll out

\bar{z}_{0:H}

under eq. 8

5:for

n=1,\ldots,N_{\mathrm{SCP}}

6: Compute

\tilde{A}_{k}

\tilde{B}_{k}

via eqs. 12 to 13

7: Eliminate

\delta z

; form dense QP in

\delta u_{0:H-1}

8: Solve QP via OSQP (warm-started)

9: if

J(\bar{u}+\delta u)\leq J(\bar{u})

then

10:

\bar{u}\leftarrow\bar{u}+\delta u

; roll out

\bar{z}_{0:H}

11: else

12:

\varepsilon\leftarrow\varepsilon/2

13: end if

14:end for

15:return

\bar{u}_{0:H-1}

Proposition 3 (SCP Monotone Descent and KKT Convergence).

Consider the bilinear MPC problem eq. 11 with a continuously differentiable finite-horizon objective $J$ whose stage and terminal terms are convex quadratic in the variables of each SCP subproblem (including, when used, control-increment penalties with the previous input treated as fixed problem data), and with box constraints $\mathcal{U}=\{u:u_{\min}\leq u\leq u_{\max}\}$ . Suppose the SCP iterates $\{(\bar{u}^{(n)},\bar{z}^{(n)})\}$ remain in a bounded set, the implemented dynamics map $F_{\mathrm{LT}}$ of eq. 8 has locally Lipschitz Jacobian on a neighborhood of that set, and each convex subproblem includes both the feasibility constraint $\bar{u}_{k}+\delta u_{k}\in\mathcal{U}$ and the trust-region constraint $\|\delta u_{k}\|_{\infty}\leq\varepsilon$ . Then:

(i) Monotone descent. Let $\Delta^{(n)}>0$ denote the predicted QP cost decrease at iteration $n$ . Because $F_{\mathrm{LT}}$ has locally Lipschitz Jacobian, the first-order model error on a trust region of radius $\varepsilon$ is $O(\varepsilon^{2})$ uniformly on bounded sets. Hence the discrepancy between the linearized and true objectives over a finite horizon satisfies

|J_{\mathrm{lin}}-J_{\mathrm{true}}|=O(\varepsilon^{2}).

(15)

When $\varepsilon$ is chosen so that this approximation error is less than $\Delta^{(n)}$ , the accepted step is truly descent, so the sequence $\{J(\bar{u}^{(n)})\}$ is non-increasing. The trust-region shrinkage $\varepsilon\leftarrow\varepsilon/2$ on failed steps guarantees this condition is eventually met because the right-hand side of eq. 15 vanishes as $\varepsilon\to 0$ .

(ii) KKT convergence. If the standard regularity assumptions for trust-region SCP hold at an accumulation point—in particular a constraint qualification for the active box constraints—then any accumulation point $(\bar{u}^{\star},\bar{z}^{\star})$ of the accepted iterates is a KKT point of eq. 11. Equivalently, the predicted QP decrease vanishes only at first-order stationary points of the original nonlinear MPC problem, and the associated multipliers satisfy the KKT conditions by the standard trust-region SCP argument of Mao et al. (2018, Theorem 2).

Proof.

By construction, the matrices $\tilde{A}_{k},\tilde{B}_{k}$ in eqs. 12 to 13 are the exact Jacobians of the implemented Lie–Trotter map $F_{\mathrm{LT}}$ , computed without finite differences.

For part (i), local Lipschitz continuity of the Jacobian of $F_{\mathrm{LT}}$ yields a quadratic first-order remainder on each trust region. Over a finite horizon, boundedness of the iterates implies a uniform constant in the bound, so $|J_{\mathrm{lin}}-J_{\mathrm{true}}|=O(\varepsilon^{2})$ on the trust region. Hence sufficiently small trust regions guarantee that the true cost decrease matches the predicted decrease up to higher-order error, which yields monotone descent after finitely many shrinkage steps.

For part (ii), feasibility is preserved because each QP enforces $\bar{u}_{k}+\delta u_{k}\in\mathcal{U}$ . Once the predicted decrease tends to zero, the accepted iterates satisfy the first-order necessary conditions of the nonlinear MPC problem under the standard trust-region SCP regularity assumptions. The cited result of Mao et al. (2018, Theorem 2) then implies that every accumulation point of the accepted iterates is a KKT point of eq. 11. ∎

Remark 2.

The boundedness assumption on iterates is retained as an explicit hypothesis. In practice it is promoted by compact control constraints together with the spectral regularization and runtime stability checks of Section 3.4, but those mechanisms do not by themselves constitute a formal proof that $\rho(A_{\mathrm{eff}}(u))<1$ uniformly for all $u\in\mathcal{U}$ .

Corollary 3.1.

The conditional-independence baseline eq. 4 solves a single QP per MPC step but cannot represent control–state coupling. The proposed method solves $N_{\mathrm{SCP}}\in\{1,5\}$ QPs on the implemented bilinear model. Both are polynomial-time per iteration; the additional QPs incur negligible wall-clock cost relative to the modeling gain (Table 3).

5 Experiments

5.1 Setup

Systems.

We evaluate on two benchmarks across time-invariant and time-varying regimes, yielding four cells.

CartPole. Standard underactuated cart-pole swing-up; $4$ states $(x,\dot{x},\theta,\dot{\theta})$ , $1$ control (horizontal force). The horizontal force enters the equations of motion through products with trigonometric functions of $\theta$ , making the system control-affine with state-dependent input gain $g(x)$ . The Koopman generator of such a system acts bilinearly on observables in the control input (Goswami and Paley, 2020), so any sufficiently expressive lifting that captures $\cos\theta$ , $\sin\theta$ , and $\dot{\theta}^{2}$ inherits a $u\cdot z$ coupling that MamKO’s conditional-independence constraint excludes. Whether the learned encoder spans these coordinates well enough for the gap to be empirically detectable is one of the questions our forecasting experiments answer (Table 2). The time-invariant variant (TI) uses constant cart friction; the time-varying variant (TV) modulates the friction coefficient as $\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+A_{\mu}\sin(\omega_{\mu}t)$ .

RSCP. The reactor–separator process of Liu et al. (2008), adopted as the MamKO benchmark in Li et al. (2025). Two CSTRs in series followed by a flash separator with recycle; $9$ states and $3$ controls (heat duties $Q_{1},Q_{2},Q_{3}$ ). The heat duties enter the energy balances additively as $+Q_{i}/(\rho c_{p}V_{i})$ : the governing ODE contains no $u\cdot x$ terms, only nonlinearity in state. The TV variant modulates feed composition sinusoidally, introducing time-varying parameters without adding control–state coupling. RSCP exercises the second axis of Section 1 (time-varying parameters) without the first (control coupling), making the two benchmarks complementary tests of the bilinear extension.

Baselines.

Our primary structural ablation is the conditional-independence baseline, which is equivalent to our model with $G_{i}=0$ . We instantiate this baseline using the MamKO architecture (Li et al., 2025), which is representative of the class. Throughout, “Linear” refers to this $G_{i}=0$ baseline. MamKO itself was evaluated against the Deep Koopman Operator (DKO) in the original work and established as the stronger time-varying Koopman baseline; we inherit this comparison by transitivity. Our method is denoted Bilinear- $r$ ; we use $r=d_{z}$ (full rank) in all reported runs, giving $r=8$ on CartPole and $r=15$ on RSCP (Table 6). Closed-loop variants are Bilinear-SCP- $N$ for $N\in\{1,5\}$ SCP iterations.

Protocol.

We generate $5{,}000$ training, $1{,}000$ validation, and $1{,}000$ test trajectories per system using mixed sinusoidal and step-function control excitation. All Koopman-based methods use the same latent dimension $d_{z}$ , prediction horizon, and training budget; dataset splits and random seeds are held constant. Forecasting is reported as 30-step open-loop MSE on held-out test trajectories. Closed-loop MPC is evaluated over $10$ episodes per cell under a setpoint-tracking task; we report cumulative running-average cost as a function of time, with shaded $0.3\sigma$ bands¹¹1Bands are reported at $0.3\sigma$ for visual readability of model separation; the conclusions are unchanged at $1\sigma$ , though some panels then exhibit fully overlapping bands. We report the band width explicitly so readers can interpret the figures under the variance scale of their preference. for visual comparison.

Reproducibility against published values.

Our re-trained Linear baseline reproduces the forecast MSE of Li et al. (2025, Table 1) within $\sim 3\%$ on CartPole TI and within $0.78\times$ on RSCP TI. On CartPole TV our Linear baseline trains to MSE $7.8\times 10^{-3}$ versus the published $4.3\times 10^{-4}$ , a gap we have not closed under matched hyperparameters; on RSCP TV our Linear baseline trains to $7.1\times 10^{-4}$ versus the published $9.7\times 10^{-3}$ , a $13\times$ improvement that we attribute to the training-stability mechanism documented in Section 5.4. Bilinear-vs-Linear comparisons throughout this paper are reported against our own re-trained baseline; reviewers comparing absolute numbers should reference this calibration.

5.2 Time-Varying Capture: SCP Iteration and Lead-Time Robustness

The headline cell of this paper is RSCP TV: a chemical reactor benchmark with sinusoidally modulated feed composition, no $u\cdot x$ coupling in its governing ODE, and time-varying parameters that the linear lift can only track through history-driven re-generation of the operator at each MPC step. Two experiments converge on a single finding: under TV physics, the bilinear coupling provides operator-level correction that the linear lift architecturally cannot reach within a single MPC horizon, but unlocking it requires either iterative re-linearization at solve time or commitment to a control sequence whose effect on the operator is structurally encoded.

SCP iteration unlocks TV capture under standard MPC.

Figure 1 reports cumulative running-average closed-loop cost for Linear (single QP), Bilinear-SCP- $1$ (one SCP iteration on the bilinear model—equivalent to LQR-style single linearization), and Bilinear-SCP- $5$ (five iterations) across the four cells. On RSCP TV (bottom right), Bilinear-SCP- $5$ separates from both other controllers at $t\approx 0.4$ h and the gap widens monotonically through end of horizon. Bilinear-SCP- $1$ , by contrast, does not distinguish itself from Linear and trails it slightly. A single linearization of the bilinear model is insufficient to exploit the additional capacity under TV physics: the SCP iterations are what unlock the operator-level corrections the bilinear coupling makes available, by repeatedly re-linearizing $A_{\mathrm{eff}}(u_{k})=\mathrm{diag}(A)+\sum_{i}u_{k}^{(i)}G_{i}$ at the current operating point as the iterates converge.

Table 1 reports per-cell closed-loop tracking cost for Linear and our Bilinear-SCP-5 controller over the matched evaluation window of Li et al. (2025, Fig. 3). Bilinear-SCP-5 wins on three of four cells, with the largest gap on RSCP TV ( $3.44$ vs $4.91\times 10^{-3}$ , a $30\%$ reduction). The single loss is a $5\%$ gap on CartPole TI; this regime carries no $u\cdot x$ coupling, no time-varying parameters, and benign dynamics, so the additional SCP overhead does not compound into a benefit.

Table 1: Closed-loop MPC cumulative cost over 10 episodes per cell, on the matched evaluation window of Li et al. (2025, Fig. 3). Bold: best per cell.

Cell	Linear	Bilinear-SCP-5
CartPole TI (20 s)	$\mathbf{2.40{\times}10^{-2}}$	$2.54{\times}10^{-2}$
CartPole TV (20 s)	$1.79{\times}10^{-2}$	$\mathbf{1.73{\times}10^{-2}}$
RSCP TI (2 h)	$7.90{\times}10^{-4}$	$\mathbf{6.43{\times}10^{-4}}$
RSCP TV (2 h)	$4.91{\times}10^{-3}$	$\mathbf{3.44{\times}10^{-3}}$

We emphasize that Bilinear-SCP- $1$ is not a stripped-down ablation but a realistic deployment baseline: it is what one obtains from running the bilinear model under a budget-constrained MPC stack (e.g., compute-limited edge controllers). The separation between SCP- $1$ and SCP- $5$ on RSCP TV is the empirical cost of single linearization in the TV regime.

Refer to caption — Figure 1: Closed-loop MPC cost over $10$ episodes per cell. Bilinear-SCP- $5$ separates from both other controllers on RSCP TV (bottom right) starting around $t\approx 0.4$ h, with the gap widening monotonically through end of horizon. Other cells discussed in Section 5.5.

Lead-time commitment unlocks the same correction with a frozen backbone.

In deployment, MPC controllers are not always free to re-plan at every control step: compute budgets, network round-trip delays, and asynchronous sensor updates force the controller to commit to a previously computed plan for some lead time before re-planning. We probe this regime via a lead-time experiment: at each MPC step the controller commits to its plan for $d$ subsequent steps before re-planning, with $d\in\{0,1,3,5\}$ . The $d=0$ case recovers standard receding-horizon MPC. Crucially, we hold both the original plan and the backbone dynamics fixed during the lead window (regen=never), so neither model receives intermediate corrections from new sensor data. This isolates the open-loop correction capacity of the model class itself: the linear lift’s drift $\bar{A}_{k}$ is fixed across the entire lead window because it depends only on history; the bilinear model’s effective drift $A_{\mathrm{eff}}(u_{k})$ varies with the committed control $u_{k}$ along the lead window, providing operator-level correction even with the backbone frozen. We exclude $d=10$ from analysis: at over three minutes of stale plan on RSCP, both models saturate from discretization-induced error rather than model-class differences.

Figure 2 shows the result on RSCP TV. The bilinear model sits below Linear at every commitment window (Table 7), with the gap widening sharply once planning becomes stale. At $d=0$ the bilinear advantage is modest ( $-2.54$ vs $-2.32$ , a gap of $0.22$ in log-cost). At $d=1$ , Linear degrades sharply—its cumulative log-cost plateaus near $-1.87$ by end of horizon, while the bilinear model reaches $-2.78$ , a gap of nearly $1$ in log-cost. The pattern persists at $d=3$ (Linear plateau $-2.06$ , bilinear $-2.57$ ) and shrinks at $d=5$ where both models saturate but bilinear still leads by $\sim 0.22$ in log-cost. Figure 3 reports the same experiment on CartPole TV. The bilinear advantage is small throughout and roughly constant in $d$ (Table 8); the underlying dynamics are benign enough that plan staleness does not catastrophically degrade either model.

One mechanism, two regimes.

Both findings identify the same mechanism. The bilinear coupling encodes how the drift varies with the control, and this variation is exploitable through any control regime that exercises the dependency: iterated SCP exercises it via re-linearization at the current operating point; lead-time commitment exercises it via rollout of the committed control sequence with the backbone frozen. The Linear baseline, lacking the coupling, has no analogous mechanism: its only access to time-varying physics is backbone re-generation, which fails under freeze and is insufficient under standard MPC. This regime is operationally relevant: control deployments commonly enforce re-planning at intervals longer than the underlying simulator step due to compute, communication, or scheduling constraints.

5.3 Forecasting Non-Inferiority

Table 2: Forecast MSE on held-out trajectories. Best-checkpoint (single minimum-validation-loss epoch, the convention of Li et al. (2025, Table 1)) and mean over the final

50

training epochs (deployment-relevant, robust to selection from noisy training curves). Bold: best per cell per metric.

Model	Metric	CP TI	CP TV	RSCP TI	RSCP TV
Linear	best	$6.31{\times}10^{-4}$	$7.80{\times}10^{-3}$	$\mathbf{2.28{\times}10^{-3}}$	$\mathbf{7.14{\times}10^{-4}}$
Linear	mean₅₀	$6.67{\times}10^{-4}$	$8.01{\times}10^{-3}$	$2.36{\times}10^{-3}$	$9.09{\times}10^{-4}$
Bilinear	best	$\mathbf{4.45{\times}10^{-4}}$	$\mathbf{7.43{\times}10^{-3}}$	$2.28{\times}10^{-3}$	$8.80{\times}10^{-4}$
Bilinear	mean₅₀	$\mathbf{4.77{\times}10^{-4}}$	$\mathbf{7.77{\times}10^{-3}}$	$\mathbf{2.36{\times}10^{-3}}$	$\mathbf{9.05{\times}10^{-4}}$

Table 2 reports forecast MSE under best-checkpoint (the single epoch with lowest validation loss, matching the convention of Li et al. (2025, Table 1)) and mean over the final $50$ training epochs (the value a deployed system would see, robust to selection from noisy training curves). The two metrics agree on three of four cells: bilinear wins CartPole TI by $\sim 29\%$ , edges CartPole TV by $3$ – $5\%$ , and ties RSCP TI within noise—all consistent with Proposition 2.

The cells diverge on RSCP TV. Best-checkpoint favors Linear by $23\%$ ; mean₅₀ ties (bilinear marginally lower at $9.05$ vs $9.09\times 10^{-4}$ ). Figure 4 (bottom right) explains the gap: the Linear baseline trains into a noisy basin that oscillates with amplitude $\pm 0.5$ in log-loss for the entire $400$ -epoch run, while bilinear converges smoothly within $\sim 50$ epochs and remains flat. Best-checkpoint selects one of Linear’s downward spikes; the tail mean averages across them. A model selected by best-checkpoint cannot be reliably re-deployed without the validation-loss oracle that selected it; the tail mean is the deployment-relevant comparison.

5.4 Training Stability

Validation-loss curves (Figure 4) reveal a striking asymmetry between cells. On CartPole TI/TV and RSCP TI, both models train smoothly to convergence. On RSCP TV, the Linear model exhibits sustained oscillations of $\pm 0.5$ in log-loss throughout the entire $400$ -epoch run, while the bilinear model converges smoothly within the first $\sim 50$ epochs and remains flat thereafter. The variance of the Linear model’s val loss across the final $200$ epochs exceeds the bilinear model’s by an order of magnitude.

We interpret this as an optimization-side benefit of the bilinear parameterization: under time-varying physics, the linear lift must continuously adjust its operator generation to track the moving target, producing the noisy training signal we observe. The bilinear coupling absorbs a portion of the time-varying capacity directly into a stationary operator structure, decoupling the optimization landscape from the data’s non-stationarity. The headline best-checkpoint number for RSCP TV (bilinear $+23\%$ worse than Linear) and the tied mean₅₀ comparison (Table 2) together tell a single story: Linear’s apparent advantage is the selection artifact, while bilinear achieves comparable forecasting performance with dramatically more reliable training—a property that matters for deployment even when raw MSE does not separate the models.

The structural origin of this stabilization—the bilinear coupling absorbing TV capacity into a stationary operator—is the same mechanism that produces the closed-loop gains documented in Section 5.2. The training-time and inference-time effects of the coupling are two faces of one phenomenon.

5.5 Other Cells: CartPole and RSCP TI

The remaining cells (CartPole TI/TV, RSCP TI) exercise the bilinear extension under conditions in which it is either underutilized (CartPole, where forecasting wins do not translate to closed-loop separation) or operating outside its primary regime (RSCP TI, no time-varying parameters, no $u\cdot x$ coupling). They are reported here for completeness; none is the headline cell.

CartPole TI/TV. All three closed-loop controllers (Linear, Bilinear-SCP- $1$ , Bilinear-SCP- $5$ ) track within overlapping bands on both variants (Figure 1, top row); the bilinear extension does not hurt closed-loop performance even when its strict forecasting advantage (Table 2) does not translate into MPC gains. Lead-time results (Figure 3) show bilinear sitting modestly below Linear at every $d$ , with a roughly constant gap across the sweep. The system is benign enough that staleness does not catastrophically degrade either model.

RSCP TI. Both bilinear variants sit below Linear from $t\approx 0.5$ h through the end of horizon, with $0.3\sigma$ bands clearly separated by $t\approx 1.5$ h (Figure 1, lower left). Bilinear-SCP- $1$ marginally edges Bilinear-SCP- $5$ here—unlike the RSCP TV pattern, single linearization is sufficient on time-invariant dynamics. We attribute the bilinear gain to compounding accuracy in the multi-step rollout under the convex, $u\cdot x$ -free dynamics: even without TV physics, the additional lifting capacity offered by $G$ is occasionally useful when the controller exercises the input range broadly during transients.

5.6 Coupling Strength Across Cells

The learned $\|G\|_{F}$ values vary substantially across cells in ways that illuminate the structural mechanisms above: high under TV physics where $G$ absorbs operating-point drift, low under TI physics where Proposition 2 predicts $G^{\star}\approx 0$ , and elevated on RSCP TI where the optimizer absorbs training slack into $G$ without physical target. Per-cell values and qualitative training-trajectory shapes are reported in Appendix E.

5.7 Computational Cost

For the TV stale-plan experiments, Table 3 reports mean wall-clock per control step under lead-time execution for the Linear controller and Bilinear-SCP- $5$ across commitment windows $d\in\{0,1,3,5\}$ . Bilinear-SCP- $5$ is substantially slower than the single-QP Linear baseline but remains practical relative to the $18$ s RSCP sampling period; on CartPole, the timings should be read as an offline stress-test rather than a real-time deployment target. The decrease on CartPole TV with larger commitment window appears to be problem-dependent: the fixed prefix provides a better SCP warm start and often triggers earlier termination, whereas on RSCP TV the solver typically still reaches the maximum SCP iteration count.

Table 3: Mean wall-clock per control step under lead-time execution on the time-varying benchmarks

Method	$d=0$	$d=1$	$d=3$	$d=5$
CP — Linear	$0.077$ s	$0.077$ s	$0.078$ s	$0.078$ s
CP — Bilinear	$0.844$ s	$0.818$ s	$0.774$ s	$0.770$ s
RSCP — Linear	$0.099$ s	$0.095$ s	$0.098$ s	$0.098$ s
RSCP — Bilinear	$0.876$ s	$0.875$ s	$0.889$ s	$0.905$ s

6 Related Work

Koopman operators for control.

Korda and Mezić (2018) first demonstrated that EDMD-based Koopman models enable efficient convex MPC. Subsequent work has extended this to probabilistic models (Han et al., 2021), graph neural network liftings (Li et al., 2020), and time-varying-system modeling (Hao et al., 2024). MamKO (Li et al., 2025) introduced time-varying operators via Mamba-inspired matrix generation. Our work is orthogonal: we introduce a structural extension (bilinear coupling) that any of these could in principle adopt.

Bilinear dynamical systems.

The theory of bilinear systems is classical (Mohler, 1973) and their controllability properties well characterized (Elliott, 2009). The contribution here is bringing this structure into the learned Koopman-for-control pipeline, with a low-rank parameterization that regularizes sample complexity and a tractable SCP controller that exploits the resulting Jacobian structure.

Sequential convex programming.

SCP methods have a long history in aerospace trajectory optimization (Mao et al., 2018; Malyuta et al., 2022). Our application is standard; the novelty is that the bilinear Koopman model provides exact model Jacobians eqs. 12 to 13, avoiding the finite-difference approximations that degrade SCP convergence in black-box settings.

Data-driven nonlinear control.

SINDy-C (Brunton et al., 2016; Kaiser et al., 2018) can learn bilinear terms but requires a fixed polynomial library and does not scale to high-dimensional systems. Neural ODE methods (Chen et al., 2018) are highly expressive but relinquish the convex MPC structure. Our method occupies a deliberate middle ground: more expressive than linear Koopman, more tractable than black-box dynamics.

7 Conclusion

We have shown that the conditional-independence constraint shared across the Mamba-Koopman family, while necessary to preserve MPC convexity in single-shot formulations, leaves two regimes structurally underrepresented: control-state coupling and time-varying parameters. Our bilinear extension addresses both with a minimal architectural change, fewer than $1\%$ added parameters, exact model Jacobians, and a provably convergent SCP controller.

Across CartPole (which exercises $u\cdot x$ coupling) and RSCP (which exercises time-varying parameters in the TV variant without $u\cdot x$ coupling), four consistent effects emerged: forecasting non-inferiority on all cells under deployment-relevant averaging, with strict gains where Proposition 2 predicts them; substantial training stabilization on RSCP TV; closed-loop MPC gains on RSCP TV when SCP is iterated to convergence; and graceful degradation under stale-plan execution, where the bilinear model maintains a clear robustness advantage. The learned $\|G\|_{F}$ patterns provide an additional interpretive diagnostic for which mechanism each cell primarily exercises.

Open directions include extending the bilinear parameterization to the step-varying $B_{k}$ matrices, imposing structured priors on $G_{i}$ that encode known physical coupling topology, and designing benchmarks that isolate the $u\cdot x$ structural axis from time-varying parameters more cleanly.

AI Assistance Disclosure

We used AI assistants (large language models) for language editing, phrasing, and LaTeX formatting throughout this manuscript. All technical content, mathematical results, experimental design, code, and analyses are the authors’ own work. The AI tools did not contribute novel ideas, proofs, or empirical results.

References

S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937. Cited by: §6.
R. T.Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 31. Cited by: §6.
S. Chen (2022) A machine learning-based approach to cybersecurity and safety of model predictive control systems. Ph.D. Dissertation, University of California, Los Angeles. Note: Section 2.4 documents the corrected RSCP formulation used in this work. Cited by: §B.2, §B.2, §B.2.
D. L. Elliott (2009) Bilinear control systems: matrices in action. Springer. Cited by: §6.
D. Goswami and D. A. Paley (2020) Global bilinearization and reachability analysis of control-affine nonlinear systems. In The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications, A. Mauroy, I. Mezić, and Y. Susuki (Eds.), pp. 81–98. External Links: ISBN 978-3-030-35713-9, Document Cited by: §5.1.
A. Gu and T. Dao (2023) Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. Cited by: §2.2.
M. Han, J. Euler-Rolle, and R. K. Katzschmann (2021) DeSKO: stability-assured robust control with a deep stochastic Koopman operator. In International Conference on Learning Representations (ICLR), Cited by: §6.
W. Hao, B. Huang, W. Pan, D. Wu, and S. Mou (2024) Deep Koopman learning of nonlinear time-varying systems. Automatica 159, pp. 111372. Cited by: §6.
E. Kaiser, J. N. Kutz, and S. L. Brunton (2018) Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A 474, pp. 20180335. Cited by: §6.
D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: §C.4.
B. O. Koopman (1931) Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences 17 (5), pp. 315–318. Cited by: §1, §2.1.
M. Korda and I. Mezić (2018) Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 93, pp. 149–160. Cited by: §1, §2.1, §6.
W. Li and E. Todorov (2004) Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO, pp. 222–229. Cited by: §2.3.
Y. Li, H. He, J. Wu, D. Katabi, and A. Torralba (2020) Learning compositional Koopman operators for model-based control. In International Conference on Learning Representations (ICLR), Cited by: §6.
Z. Li, M. Han, and X. Yin (2025) MamKO: Mamba-based Koopman operator for modeling and predictive control. In International Conference on Learning Representations (ICLR), Cited by: §B.1, §B.1, §B.1, §B.2, §B.2, §B.2, §C.2, §D.1, §D.2, §1, §1, §2.2, §5.1, §5.1, §5.1, §5.2, §5.3, Table 1, Table 1, Table 2, Table 2, §6.
J. Liu, D. Muñoz de la Peña, and P. D. Christofides (2009) Distributed model predictive control of nonlinear process systems. AIChE Journal 55 (5), pp. 1171–1184. Cited by: §B.2.
J. Liu, D. Muñoz de la Peña, B. J. Ohran, P. D. Christofides, and J. F. Davis (2008) A two-tier architecture for networked process control. Chemical Engineering Science 63 (22), pp. 5394–5409. Cited by: §B.2, §B.2, §5.1.
B. Lusch, J. N. Kutz, and S. L. Brunton (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9, pp. 4950. Cited by: §1, §2.1.
D. Malyuta, T. P. Reynolds, M. Szmuk, T. Lew, R. Bonalli, M. Pavone, and B. Açikmeşe (2022) Convex optimization for trajectory generation: a tutorial on generating dynamically feasible trajectories reliably and efficiently. IEEE Control Systems Magazine 42 (5), pp. 40–113. Cited by: §6.
Y. Mao, M. Szmuk, X. Xu, and B. Açikmeşe (2018) Successive convexification: a superlinearly convergent algorithm for non-convex optimal control problems. arXiv preprint arXiv:1804.06539. Cited by: §2.3, §4.3, §6, Proposition 3.
R. I. McLachlan and G. R. W. Quispel (2002) Splitting methods. Acta Numerica 11, pp. 341–434. External Links: Document Cited by: §3.3.
I. Mezić (2005) Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics 41, pp. 309–325. Cited by: §1.
R. R. Mohler (1973) Bilinear control processes. Academic Press. Cited by: §1, §2.3, §6.
B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd (2020) OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation 12, pp. 637–672. Cited by: §D.1, §D.4, §4.3.
H. F. Trotter (1959) On the product of semi-groups of operators. Proceedings of the American Mathematical Society 10 (4), pp. 545–551. External Links: Document Cited by: §3.3.
M. O. Williams, I. G. Kevrekidis, and C. W. Rowley (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 25, pp. 1307–1346. Cited by: §1, §2.1.

Appendix A Stability: Training Penalty and Inference Screening

Training.

The spectral penalty in the loss is

\mathcal{L}_{\mathrm{stab}}=\sum_{j}\mathrm{ReLU}\bigl(|\lambda_{j}(A_{\mathrm{disc},k})|-1+m_{s}\bigr),

with margin $m_{s}=0.05$ . Eigenvalues are computed via torch.linalg.eigvals cast to float32 to avoid mixed-precision artifacts. Zero initialization of $G_{i}$ ensures the model starts as an exact copy of the Linear baseline and the penalty is inactive at the start of training.

Inference.

We do not perform a per-step spectral check at inference. Closed-loop stability is enforced indirectly through the SCP trust-region constraint and the QP control bounds, both of which keep the linearized dynamics within the regime where our trained operators are well-conditioned. We did, however, evaluate the cheap Gershgorin disk pre-check ( $|A_{\mathrm{disc},k,ii}|+\sum_{j\neq i}|A_{\mathrm{disc},k,ij}|<1$ ) as an $O(d_{z}^{2})$ stability certificate at each MPC step. Empirically the disks straddle the unit circle in $100\%$ of MPC steps across both bilinear TV cells (CartPole and RSCP, $d\in\{0,1,3,5\}$ , $10$ episodes $\times 1000$ steps each), because the off-diagonal mass introduced by the bilinear factor $E_{P}(u_{k})=\exp(\sum_{i}u_{k}^{(i)}G_{i})$ inflates the radii past unity even when $\rho(A_{\mathrm{disc},k})<1$ . A full eigendecomposition would therefore be required for any runtime spectral gate; for $d_{z}\leq 64$ this is sub-millisecond per step and does not bottleneck the MPC loop, but we leave runtime stability gating to future work.

Appendix B System Dynamics

This appendix specifies the governing ODEs, parameters, and action spaces of the four system variants (CartPole TI/TV, RSCP TI/TV) used in the paper.

B.1 CartPole (TI, TV)

The state is $(x,\dot{x},\theta,\dot{\theta})$ with $\theta$ measured from the upright. The control is the horizontal force $F$ applied to the cart. Following the MamKO benchmark protocol [Li et al., 2025], we use gravity $g=10$ m s^-2, cart mass $m_{c}=1$ kg, pole mass $m_{p}=0.1$ kg, half-length $\ell=0.5$ m, force bound $|F|\leq 20$ N, and sampling period $\Delta t=0.02$ s. The equations of motion include cart and pole friction terms $\mu_{c},\mu_{p}$ as in the MamKO reference implementation.

Time-invariant variant (TI).

The friction coefficients are fixed at $\mu_{c}=0$ , $\mu_{p}=0$ (frictionless limit). This recovers the standard frictionless CartPole.

Time-varying variant (TV).

The cart friction is modulated as

\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\sin(\omega\,t),

with $\mu_{c}^{\mathrm{base}}=5\times 10^{-4}$ , $\mu_{p}=2\times 10^{-6}$ , and $\omega=1$ rad s^-1, matching the MamKO TV configuration. This is the specific cell used in the closed-loop MPC and lead-time experiments; results for $\omega\in\{0.1,10\}$ are not reported here.

Note on the sin/cos discrepancy.

The MamKO paper writes $\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\cos(\omega t)$ (Eq. 16 of Li et al. [2025]); the official MamKO repository uses sin in cartpole_V.py. We follow the repository, which is the executable ground truth.

Parameters.

Table 4 lists the numerical constants used for the CartPole benchmark. The dynamical constants and time-varying friction settings follow the MamKO benchmark protocol [Li et al., 2025]; the shared MamKO architecture hyperparameters for CartPole are summarized separately in Table 6.

Table 4: CartPole parameters for the time-invariant and time-varying benchmarks.

Symbol	Value	Description
$g$	$10$ m s^-2	gravitational acceleration
$m_{c}$	$1$ kg	cart mass
$m_{p}$	$0.1$ kg	pole mass
$\ell$	$0.5$ m	pole half-length
$\Delta t$	$0.02$ s	sampling period
$\|F\|$	$\leq 20$ N	control bound on horizontal force
$\mu_{c}^{\mathrm{base}}$	$5{\times}10^{-4}$	base cart friction coefficient setting
$\mu_{p}$	$2{\times}10^{-6}$	pole friction coefficient
$\omega$ (TV)	$1$ rad s^-1	frequency in $\mu_{c}(t)=\mu_{c}^{\mathrm{base}}+\sin(\omega t)$

B.2 RSCP: Reactor–Separator Process

RSCP is a reactor–separator process consisting of two continuous stirred-tank reactors (CSTR-1 and CSTR-2) operating in series, feeding a flash separator whose liquid bottoms are partially recycled to CSTR-1 [Liu et al., 2008]. The system was adopted as the MamKO benchmark in Li et al. [2025]. Two parallel first-order reactions $A\to B\to C$ proceed in each CSTR with Arrhenius kinetics; the separator performs ideal vapor–liquid equilibrium at fixed pressure with no reaction. The state vector is

x=(x_{A,1},\,x_{B,1},\,T_{1},\,x_{A,2},\,x_{B,2},\,T_{2},\,x_{A,3},\,x_{B,3},\,T_{3}),

where $x_{A,i},x_{B,i}$ are the mass fractions of species $A,B$ in vessel $i$ and $T_{i}$ is the corresponding temperature in Kelvin; vessel indices $1,2$ refer to the two CSTRs and $3$ to the flash separator. The control is $u=(Q_{1},Q_{2},Q_{3})$ , the heat duties applied to each vessel in kJ h^-1.

Governing ODEs.

The mass and energy balances follow Chen [2022, Section 2.4] rather than the formulation in the original Liu papers; see “Reproducibility notes” below for the discrepancies that motivated this choice. Letting $F_{1}=F_{10}+F_{r}$ and $F_{2}=F_{1}+F_{20}$ denote the total throughput of each CSTR, $r_{1,i}=k_{1}\exp(-E_{1}/(RT_{i}))$ and $r_{2,i}=k_{2}\exp(-E_{2}/(RT_{i}))$ the Arrhenius rates, and $\beta_{j}=(-\Delta H_{j})C_{M}/(\rho c_{p})$ the heat-of-reaction coefficients (see paragraph below), the governing equations for CSTR-1 and CSTR-2 are

	$\displaystyle\dot{x}_{A,1}$	$\displaystyle=\tfrac{F_{10}}{V_{1}}(x_{A,10}-x_{A,1})+\tfrac{F_{r}}{V_{1}}(x_{A,r}-x_{A,1})$
		$\displaystyle\quad-r_{1,1}x_{A,1},$
	$\displaystyle\dot{x}_{B,1}$	$\displaystyle=\tfrac{F_{10}}{V_{1}}(x_{B,10}-x_{B,1})+\tfrac{F_{r}}{V_{1}}(x_{B,r}-x_{B,1})$
		$\displaystyle\quad+r_{1,1}x_{A,1}-r_{2,1}x_{B,1},$
	$\displaystyle\dot{T}_{1}$	$\displaystyle=\tfrac{F_{10}}{V_{1}}(T_{10}-T_{1})+\tfrac{F_{r}}{V_{1}}(T_{3}-T_{1})$
		$\displaystyle\quad+\beta_{1}r_{1,1}x_{A,1}+\beta_{2}r_{2,1}x_{B,1}$
		$\displaystyle\quad+\tfrac{Q_{1}}{\rho c_{p}V_{1}},$

and analogously for CSTR-2 with $(F_{1},F_{20},V_{2},T_{20},Q_{2})$ replacing $(F_{10},F_{r},V_{1},T_{10},Q_{1})$ and the previous-vessel state replacing the recycle stream. The flash separator carries no reaction:

	$\displaystyle\dot{x}_{A,3}$	$\displaystyle=\tfrac{F_{2}}{V_{3}}(x_{A,2}-x_{A,3})-\tfrac{F_{r}+F_{p}}{V_{3}}(x_{A,r}-x_{A,3}),$
	$\displaystyle\dot{x}_{B,3}$	$\displaystyle=\tfrac{F_{2}}{V_{3}}(x_{B,2}-x_{B,3})-\tfrac{F_{r}+F_{p}}{V_{3}}(x_{B,r}-x_{B,3}),$
	$\displaystyle\dot{T}_{3}$	$\displaystyle=\tfrac{F_{2}}{V_{3}}(T_{2}-T_{3})+\tfrac{Q_{3}}{\rho c_{p}V_{3}}$
		$\displaystyle\quad+\tfrac{(F_{r}+F_{p})C_{M}}{\rho c_{p}V_{3}}\bigl(x_{A,r}\Delta H_{\mathrm{vap},A}$
		$\displaystyle\quad\quad+x_{B,r}\Delta H_{\mathrm{vap},B}+x_{C,r}\Delta H_{\mathrm{vap},C}\bigr).$

The recycle compositions $x_{A,r},x_{B,r},x_{C,r}$ are determined by ideal vapor-liquid equilibrium with relative volatilities $(\alpha_{A},\alpha_{B},\alpha_{C})$ ,

$\displaystyle x_{A,r}$	$\displaystyle=\tfrac{\alpha_{A}x_{A,3}}{D},$	(16)
$\displaystyle x_{B,r}$	$\displaystyle=\tfrac{\alpha_{B}x_{B,3}}{D},$
$\displaystyle x_{C,r}$	$\displaystyle=\tfrac{\alpha_{C}(1-x_{A,3}-x_{B,3})}{D}.$

D=\alpha_{A}x_{A,3}+\alpha_{B}x_{B,3}+\alpha_{C}(1-x_{A,3}-x_{B,3}).

The structurally important property for this paper is that the heat duties $Q_{i}$ enter the energy balances additively as $+Q_{i}/(\rho c_{p}V_{i})$ with constant coefficients, contributing no $u\cdot x$ terms.

Parameters.

Table 5 lists the numerical values used. Volumes, flows, feed states, kinetic constants, activation energies, heats of reaction, and relative volatilities are taken from Chen [2022, Table 2.1]; the molar concentration factor $C_{M}=2$ kmol m^-3 is the constant that converts mass fractions to molar concentrations in the energy balance (see “Reproducibility notes”).

Table 5: RSCP parameters (after the corrections in Section B.2).

Symbol	Value	Description
$V_{1},V_{2},V_{3}$	$1.0,\ 0.5,\ 1.0$ m³	vessel volumes
$F_{10},F_{20}$	$5.04,\ 5.04$ m³ h^-1	feed flows
$F_{r},F_{p}$	$50.4,\ 5.04$ m³ h^-1	recycle, purge
$T_{10},T_{20}$	$300,\ 300$ K	feed temperatures
$x_{A,10},x_{A,20}$	$1.0,\ 1.0$	feed compositions ( $A$ )
$k_{1},k_{2}$	$9.972{\times}10^{6},\ 9.36{\times}10^{6}$ h^-1	kinetic constants
$E_{1},E_{2}$	$5{\times}10^{4},\ 6{\times}10^{4}$ kJ kmol^-1	activation energies
$\Delta H_{1},\Delta H_{2}$	$-1.2{\times}10^{5},\ -1.4{\times}10^{5}$ kJ kmol^-1	reaction enthalpies
$\rho$	$1000$ kg m^-3	density
$c_{p}$	$4.2$ kJ kg^-1 K^-1	specific heat
$C_{M}$	$2$ kmol m^-3	molar concentration factor
$\Delta H_{\mathrm{vap},A,B,C}$	$-3.53,\ -1.57,\ -4.07$ $\times 10^{4}$ kJ kmol^-1	vap. enthalpies
$\alpha_{A},\alpha_{B},\alpha_{C}$	$3.5,\ 1.0,\ 0.5$	relative volatilities
$R$	$8.314$ kJ kmol^-1 K^-1	gas constant

Reproducibility notes.

We verified that the RSCP ODE system as written in Liu et al. [2008, 2009] does not reproduce the steady state reported by those papers under their stated parameter values. The discrepancy resolves under three corrections drawn from Chen [2022, Section 2.4]: (i) the heat-of-reaction term carries a molar concentration factor $\beta_{j}=(-\Delta H_{j})C_{M}/(\rho c_{p})$ rather than $\Delta H_{j}/c_{p}$ , where $C_{M}=2$ kmol m^-3; (ii) the separator energy balance includes the convective transport of vaporization enthalpies $\Delta H_{\mathrm{vap},\{A,B,C\}}$ for the three species; (iii) the steady-state heat duties $Q_{i}$ are of order $10^{6}$ kJ h^-1 rather than the $10^{9}$ value listed in Chen [2022, Table 2.1], which we identified as a typographical exponent error. Under the corrected formulation, forward integration from the published steady state has $\|\dot{x}\|_{\infty}<10^{-6}$ on temperatures and within $4\times 10^{-3}$ on compositions. We use this corrected system throughout. The published MamKO implementation of RSCP is not publicly available, so we cannot directly confirm that our reconstructed system matches theirs; however, the MamKO paper’s reported steady state and our forward-integration check are mutually consistent under the corrected ODEs.

Time-varying variant.

The TV variant follows the MamKO synthetic time-varying modifier $\varphi_{c}(t)=e^{-0.01t}$ [Li et al., 2025, Appendix C.3], applied multiplicatively to both Arrhenius rate terms in all three vessels. This simulates catalyst deactivation: reaction rates decay exponentially with time, and the model must track operating-point drift induced by the slowly shrinking effective kinetics. Crucially, $\varphi_{c}(t)$ multiplies only the kinetic terms, not the convective or heat-duty terms, so the structural property that $u_{k}$ does not couple multiplicatively with $x_{k}$ is preserved in the TV variant: RSCP TV exercises the time-varying-parameter axis of Section 1 without introducing $u\cdot x$ structure.

Nominal operating point and controls.

The MPC tracking task drives the system to the nominal steady state

x_{s}=(0.18,\;0.67,\;480.32,\;0.20,\;0.65,\\ 472.79,\;0.07,\;0.67,\;474.89)

following Li et al. [2025, Appendix C.3]. The control bounds are $|Q_{i}-Q_{i,s}|\leq 10^{6}$ kJ h^-1 around the nominal heat duties $Q_{i,s}$ ; these are enforced by the simulator’s action-space clipping rather than by the QP, since the QP operates in normalized control space (see Section D.4). The sampling period is $\Delta t=18$ s ( $=0.005$ h).

Appendix C Data Generation and Training Protocol

C.1 Trajectory Generation

For each system variant we generate training and test data with the official MamKO ReplayMemory pipeline. Continuous rollouts under per-step i.i.d. uniform random control excitation are segmented into overlapping input–output windows of length $H_{\mathrm{lookback}}+H_{\mathrm{pred}}=60$ . Episodes are integrated by forward Euler at the system sampling period, with no sub-stepping: $\Delta t=0.02$ s for CartPole and $\Delta t=18$ s for RSCP. Episodes terminate on physical bound violations or at a fixed horizon, whichever comes first: for CartPole, when $|\theta|>20^{\circ}$ or $|x|>10$ ; for RSCP, when concentration or temperature bounds are violated; for both systems, at $20{,}040$ steps for training episodes and $1{,}000$ steps for test episodes. Windows are pooled across episodes until the configured target is reached: $39{,}900$ training windows, split $80/20$ into train/validation subsets via a random split with seed $1$ , and $4{,}000$ test windows.

Control excitation is drawn independently at each step. For CartPole, the force is sampled as $F\sim\mathcal{U}(-20,20)$ N. For RSCP, each heat duty is sampled as $Q_{i}\sim\mathcal{U}(Q_{i}^{s}-10^{6},\ Q_{i}^{s}+10^{6})$ kJ h^-1, where $Q^{s}$ denotes the steady-state heating rates. Initial states are sampled at the start of each episode: for CartPole, $x_{0}\sim\mathcal{U}(-4,4)$ m and $\theta_{0}\sim\mathcal{U}(-0.1,0.1)$ rad, with zero initial velocities; for RSCP, components are drawn independently and uniformly within per-component half-widths around a verified fixed point $\bar{x}$ of the ODEs, $x_{0}\sim\mathcal{U}(\bar{x}-\boldsymbol{\rho},\ \bar{x}+\boldsymbol{\rho})$ with $\boldsymbol{\rho}=(0.05,\ 0.05,\ 10\,\mathrm{K},\ 0.05,\ 0.05,\ 10\,\mathrm{K},\ 0.02,\ 0.05,\ 10\,\mathrm{K})$ in the state ordering $(x_{A,i},x_{B,i},T_{i})$ for $i=1,2,3$ . Here $\bar{x}$ denotes the numerically verified fixed point of the corrected RSCP ODEs rather than the approximate steady state reported in the MamKO appendix; we use $\bar{x}$ as the perturbation center to avoid systematic drift back toward the true equilibrium during data generation. The data-generation seed for the train/validation split is $1$ and is reported alongside the released code.

C.2 Model Architecture

The encoder, backbone, and decoder follow the MamKO configuration [Li et al., 2025] for each system class. We re-use the MamKO reference implementation hyperparameters; only the bilinear extension introduces new parameters ( $G_{i}$ tensors at full rank $r=d_{z}$ , plus the spectral penalty weight $\lambda_{s}$ ). Table 6 summarizes the per-system settings.

Table 6: Architecture hyperparameters per system. CartPole settings follow the MamKO reference implementation; RSCP settings follow its RSCP benchmark configuration. Bilinear-specific entries (

r

\lambda_{s}

, discretization scheme) are reported under Bilinear-only.

Bilinear-only
	CartPole	RSCP
Lookback window	$30$	$30$
Forecast horizon	$30$	$30$
Latent dimension $d_{z}$	$8$	$15$
Mamba conv kernel $d_{\mathrm{conv}}$	$15$	$5$
Hidden dimension	$64$	$64$
Bilinear rank $r$	$8$	$15$
Spectral penalty $\lambda_{s}$	$0.01$	$0.01$
Discretization	lie_trotter	lie_trotter

The $G_{i}$ tensors are factorized as $G_{i}=L_{i}R_{i}^{\top}$ with $L_{i},R_{i}\in\mathbb{R}^{d_{z}\times r}$ , both initialized to zero. Zero initialization makes the bilinear model an exact copy of the Linear baseline at training step zero, so any training-time divergence between the two is attributable solely to the bilinear terms.

C.3 Lie–Trotter Discretization Details

The baseline MamKO diagonal step is preserved exactly by evaluating the diagonal flow element-wise:

[E_{D}]_{nn}=\exp(a_{n}\delta_{n}).

The associated diagonal input integral is computed element-wise as

[\bar{B}_{\mathrm{diag},k}]_{n,j}=\frac{\exp(a_{n}\delta_{n})-1}{a_{n}}\,[B_{k}^{(c)}]_{n,j}.

(17)

In implementation, the numerator is evaluated with torch.expm1 to avoid catastrophic cancellation as $a_{n}\to 0$ , with a small sign-preserving denominator offset used only for numerical stability.

The coupling factor $E_{P}(u_{k})$ is evaluated with a single $d_{z}\times d_{z}$ torch.linalg.matrix_exp call, yielding the discrete-time matrices

	$\displaystyle A_{\mathrm{disc},k}$	$\displaystyle=E_{P}(u_{k})E_{D},$		(18)
	$\displaystyle B_{\mathrm{disc},k}$	$\displaystyle=E_{P}(u_{k})\,\bar{B}_{\mathrm{diag},k}.$		(18)

Thus the diagonal ZOH step costs $O(d_{z})$ and the coupling step costs $O(d_{z}^{3})$ , rather than the augmented $(d_{z}+m)\times(d_{z}+m)$ matrix exponential required by a dense ZOH implementation that jointly recovers $A_{\mathrm{disc},k}$ and $B_{\mathrm{disc},k}$ . For $d_{z}\leq 64$ , this overhead remains negligible relative to backbone inference.

C.4 Optimization

All models are trained with Adam [Kingma and Ba, 2015] for $401$ epochs at initial learning rate $1\mathrm{e}{-}3$ , weight decay $1\mathrm{e}{-}3$ , batch size $256$ , and a step learning-rate schedule (step size $50$ epochs, $\gamma=0.9$ ). Gradient clipping is applied at $1.0$ . Best-validation-loss checkpoints are retained for evaluation. Due to compute constraints, results are reported from a single seed per configuration; per-cell dispersion is reported across the 10 closed-loop episodes and across the final 50 training epochs (mean₅₀ in Table 2). Multi-seed validation is deferred to a follow-up. Training ran in FP32 throughout; mixed precision was not used.

The training loss is the multi-step observation-space prediction loss of Equation 10, with the spectral penalty $\lambda_{s}\mathcal{L}_{\mathrm{stab}}$ disabled at inference.

C.5 Hardware and Wall-Clock

Training runs were executed on NVIDIA A100 GPUs on GCE a2-ultragpu-1g instances. Per-run training wall-clock was approximately $4.8$ hours for CartPole and $9.75$ hours for RSCP (bilinear Lie–Trotter models; corresponding linear baselines train roughly $6\times$ faster, at $\sim\!0.7$ hours per run on either system). Total compute budget for the experiments reported in this paper, including ablations and the lead-time sweep, was approximately $58$ GPU-hours (training: $32$ h across $8$ runs; closed-loop MPC regen=never d-sweeps: $26$ h across $4$ sweeps).

Appendix D Closed-Loop MPC Tasks

This appendix specifies the closed-loop MPC problems evaluated in Sections 5.2 and 5.5: cost functions, control bounds, warmup protocol, lead-time scheduling, and SCP hyperparameters.

D.1 Common Setup

For all systems we use prediction horizon $H_{\mathrm{MPC}}=30$ , apply horizon $1$ (standard receding-horizon MPC at $d=0$ ), and $10$ episodes per configuration. Each episode is $1{,}000$ MPC steps. Following MamKO Eq. 10a, the receding-horizon objective at time $k$ is

\hat{x}_{j|k}:=C_{j|k}\hat{z}_{j|k},

	$\displaystyle J_{k}$	$\displaystyle=\sum_{j=k+1}^{k+H-1}\Bigl(\\|\hat{x}_{j\|k}-x_{\mathrm{ref}}\\|_{Q}^{2}+\\|u_{j\|k}-u_{j-1\|k}\\|_{R}^{2}\Bigr)$		(19)
		$\displaystyle\quad+\\|\hat{x}_{k+H\|k}-x_{\mathrm{ref}}\\|_{P}^{2}.$		(19)

that is, a quadratic tracking term on the predicted decoded state together with a $\Delta u$ smoothness penalty, plus a terminal tracking cost. The QP is built by dynamics elimination over the $H_{\mathrm{MPC}}$ -step horizon and solved by OSQP [Stellato et al., 2020].

Warmup.

Following the MamKO evaluation protocol, the lookback buffer is initialized synthetically by repeating the simulator’s reset state $H$ times with zero control. MPC begins at simulator step $t=0$ ; no real warmup rollout is performed. This convention matches MamKO’s published evaluation loop [Li et al., 2025] and is the operating point under which the forecasting and closed-loop numbers in the main text are reported.

D.2 CartPole TV

Reference: upright stabilization at $x_{\mathrm{ref}}=(0,0,0,0)$ under sinusoidal cart friction (Section B.1). Stage and terminal cost weights are

	$\displaystyle Q$	$\displaystyle=\mathrm{diag}(1.0,\ 0.01,\ 100.0,\ 0.01),$
	$\displaystyle R$	$\displaystyle=0.5,$
	$\displaystyle P$	$\displaystyle=\mathrm{diag}(5000,\ 0,\ 0,\ 0),$

matching the MamKO reference configuration [Li et al., 2025]. The heavy weighting on $\theta$ (state index 3) reflects the priority of pole-angle stabilization over cart-position regulation, and the terminal cost penalizes only cart position. The control bound is $|F|\leq 20$ N. There is no safety constraint; the trajectory is unconstrained over the action range.

D.3 RSCP (TI, TV)

Reference: setpoint tracking to the nominal steady state $x_{s}$ of Section B.2. The cost weights heavily emphasize composition tracking over temperature tracking:

	$\displaystyle Q$	$\displaystyle=\mathrm{diag}(10^{4},\ 10^{4},\ 1,\ 10^{4},\ 10^{4},\ 1,\ 10^{4},\ 10^{4},\ 1),$
	$\displaystyle R$	$\displaystyle=5\times 10^{-12}\cdot I_{3},$
	$\displaystyle P$	$\displaystyle=Q.$

The asymmetric weighting follows the MamKO RSCP configuration: the relevant operational quantities are the species mass fractions, while the temperature states float to whatever the heat duties drive them to. The small $R$ value reflects the magnitude of the heat duties ( $u\sim 10^{6}$ in raw units): $R=5\times 10^{-12}$ balances the $\|\Delta u\|^{2}\sim 10^{12}$ contribution against the $\|x-x_{s}\|^{2}\sim O(1)$ tracking term. Control bounds are $|Q_{i}-Q_{i,s}|\leq 10^{6}$ kJ h^-1 around the nominal duties; these are enforced as action-space clipping in the simulator after the QP returns its result in normalized control space (see Section D.4).

D.4 SCP Hyperparameters

The SCP controller of Algorithm 1 runs with initial trust-region radius $\varepsilon_{0}=1.0$ (in normalized control space), shrinkage factor $1/2$ on cost-increase steps, and maximum SCP iterations $N_{\mathrm{SCP}}\in\{1,5\}$ as reported in Section 5.2. The inner QP is solved by OSQP [Stellato et al., 2020] with default tolerances, warm-started from the previous receding-horizon call’s solution to amortize solver iterations across the closed-loop trajectory. The QP is built in normalized control space using the MamKO instance-normalization statistics output by the dynamics-generation network at each MPC step; control bounds and the reference state are normalized correspondingly. After the QP returns, the optimal control is denormalized and clipped to the simulator’s raw-space action bounds before application.

D.5 Lead-Time Sweep

The lead-time experiment of Section 5.2 sweeps the commitment window $d\in\{0,1,3,5\}$ on both RSCP TV and CartPole TV with $N_{\mathrm{SCP}}=5$ and $10$ episodes per $d$ . The protocol is implemented by holding both the optimizer’s plan and the backbone’s dynamics output fixed during the lead window: at each MPC call the controller commits to the next $d$ controls from a queue, applies the next queued control to the simulator, and refills the queue from the freshly solved plan only after the queue empties; during this window the backbone is not re-evaluated, so the $\bar{A}_{k}$ , $\bar{B}_{k}$ matrices used for any auxiliary roll-forward are the ones generated at the start of the window. We refer to this as regen=never in our implementation; it isolates the open-loop correction capacity of the model class itself, with neither model receiving intermediate sensor or backbone updates within the lead window. The $d=10$ point is generated but excluded from the analysis in the main text on grounds of discretization-saturation: at the RSCP sampling period of $18$ s, $d=10$ corresponds to three minutes of stale plan, beyond the regime in which the underlying linearization is informative (cumulative costs converge for both models, see Table 7). The exact end-of-horizon values reported in Figures 2 and 3 are tabulated in Tables 7 and 8 for RSCP TV and CartPole TV, respectively.

Table 7: Cumulative log-cost at end of horizon, RSCP TV, lead-time sweep under regen=never with

N_{\mathrm{SCP}}=5

and

10

episodes per cell. Lower is better.

	$d=0$	$d=1$	$d=3$	$d=5$
Linear	$-2.3233$	$-1.8717$	$-2.0559$	$-2.2037$
Bilinear-SCP-5	$-2.5442$	$-2.7764$	$-2.5662$	$-2.4252$

Table 8: Cumulative log-cost at end of horizon, CartPole TV, lead-time sweep under regen=never with

N_{\mathrm{SCP}}=5

and

10

episodes per cell. Lower is better.

	$d=0$	$d=1$	$d=3$	$d=5$
Linear	$-1.9119$	$-1.8033$	$-1.6527$	$-1.4707$
Bilinear-SCP-5	$-1.9450$	$-1.9086$	$-1.7219$	$-1.4972$

Table 9: Learned bilinear coupling norm

\|G\|_{F}

at training convergence, per cell, with the qualitative shape of the training trajectory.

Cell	$\\|G\\|_{F}$	Trajectory shape
CartPole TI	$0.36$	Peak $0.48$ at ep. $60$ , decays
CartPole TV	$0.57$	Monotonic growth, recruitment
RSCP TI	$0.76$	Plateau, slack absorption
RSCP TV	$0.44$	Monotonic decay, multi-context

Appendix E Coupling Strength Diagnostic

Table 9 reports the learned $\|G\|_{F}$ at training convergence across the four cells, alongside the qualitative shape of the training trajectory. The four cells exhibit distinct training trajectories that together provide an interpretive scaffold for the empirical results in the main text.

The CartPole TI trajectory rises early to $\|G\|_{F}\approx 0.48$ , then decays under spectral regularization, settling at $0.36$ . The bilinear capacity is genuinely used—consistent with the strict forecasting gain on this cell—but constrained from over-recruitment.

CartPole TV grows monotonically to $0.57$ . The progressive recruitment matches the time-varying nature of the perturbed dynamics: as training proceeds, the model leans on $G$ to absorb friction-coefficient drift that the linear lift cannot track within a single MPC horizon.

RSCP TI plateaus high at $0.76$ , but the closed-loop and forecast results show no clear advantage of the bilinear model on this cell. We interpret this as the optimizer absorbing residual training slack into $G$ , which has no physical target on this cell ( $u\cdot x$ is structurally absent in the ODE). $\|G\|_{F}$ alone should not be read as evidence of bilinear strength: its size on RSCP TI is set by training-time slack, not by physics.

RSCP TV decays monotonically from an early peak to $0.44$ . Under time-varying parameters, $G$ must serve simultaneously across the family of operating regimes induced by the modulation; the optimizer responds by shrinking $G$ toward a setting that works on average rather than overfitting to any single regime. This is consistent with Section 5.2: Bilinear-SCP- $5$ , which iteratively re-linearizes the operator at the current operating point, outperforms Bilinear-SCP- $1$ on this cell because the average $G$ is what the model carries while iterative SCP is what extracts operating-point-specific use of it.

The four trajectories provide an interpretive scaffold rather than a predictive theory: $\|G\|_{F}$ is the empirical witness for the structural condition in Proposition 2 (CartPole TI), the recruitment indicator under TV physics (CartPole TV, RSCP TV), and the slack-absorption diagnostic where no physical target exists (RSCP TI). The proposition does not predict trajectory shape; the trajectories make sense in light of it.