Bilinear Mamba-Koopman Neural MPC for Varying Dynamics
Abstract
Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution.
We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds fewer than 1% additional parameters through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone-descent and KKT convergence results under standard trust-region assumptions.
Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provides a simple and effective mechanism for robust MPC under varying conditions.
1 Introduction
The Koopman operator offers an elegant reformulation of nonlinear dynamics: any nonlinear system, however complex in state space, admits a linear representation in an infinite-dimensional function space (Koopman, 1931; Mezić, 2005). Finite-dimensional approximations learned from trajectory data have enabled controllers that combine the tractability of linear MPC with operation on genuinely nonlinear physical systems (Korda and Mezić, 2018; Lusch et al., 2018; Williams et al., 2015). For real-time industrial control (chemical reactors, compressors, power converters), tractable convex optimization at each control step is a hard operational requirement.
Recent work has extended Koopman learning to time-varying systems by replacing fixed operator matrices with sequences generated from historical data. A representative instance of this class is the Mamba-based Koopman operator (MamKO) (Li et al., 2025), which processes recent state and control history through a convolutional-FCNN network to emit the matrices of a local linear system at each step, substantially outperforming static Koopman models while retaining the convex MPC formulation.
The structural constraint.
This class of methods rests on a deliberate architectural choice: the drift operator $A_k$ is generated from historical data but is explicitly not a function of the current control input $u_k$. Making $A_k$ depend on $u_k$ would introduce a bilinear term $A_k(u_k)\, z_k$ in the latent state $z_k$, breaking MPC convexity. This constraint is acknowledged in the MamKO formulation (Li et al., 2025) and is forced more generally by any control architecture that preserves single-shot QP convexity. The consequence is conditional independence: $\partial A_k / \partial u_k = 0$. This constraint limits the model’s expressive power along two distinct axes.
Control–state coupling. When the governing physics contains terms that multiply state and control (convective fluxes in flow-actuated reactors, multiplicative inputs in mechanical systems, voltage-dependent impedances in power electronics; Mohler, 1973), the mixed partial $\partial^2 f / \partial x\, \partial u$ of the dynamics $f$ is structurally nonzero. MamKO’s architecture forces this quantity to zero, requiring its backbone to absorb the coupling indirectly via step-by-step regeneration of the operator. Over a multi-step rollout, the resulting approximation error accumulates fastest in regimes where the controller is deliberately exercising large control variation, which is precisely where MPC matters most.
Time-varying parameters. When the dynamics themselves drift over time—catalyst aging, heat-exchanger fouling, mechanical wear, friction modulation—the effective operator must track changes that the historical encoder may not yet have observed within its lookback window. A bilinear coupling provides an additional, structured channel through which the operator can adapt to the present operating point in a way that the linear lift cannot reach within a single MPC horizon.
Both regimes are pervasive in industrial control: the first is structural, the second operational. A single architectural mechanism that addresses both, without requiring a different model class for each, is the contribution of this paper.
Our contribution.
We propose a minimal, principled extension of the Mamba-Koopman dynamics that relaxes conditional independence while retaining the features that make the model useful for control:
$$\dot{z}(t) = \Big(\Lambda_k + \sum_{i=1}^{m} u_{k,i}\, N_i\Big)\, z(t) + B_k\, u_k \tag{1}$$
where $\Lambda_k$ contains the diagonal continuous-time eigenvalues of the existing Mamba–Koopman backbone (unchanged), $N_1, \dots, N_m$ are learned bilinear coupling tensors, and $B_k$ is the step-varying actuator matrix from the existing dynamics network. Setting $N_i = 0$ for all $i$ restores MamKO exactly. The bilinear structure exposes exact analytical model Jacobians, making Sequential Convex Programming (SCP) natural: linearize around a nominal trajectory, solve a convex QP, update, repeat.
Summary of contributions.
1. Bilinear Koopman dynamics (Section 3): a strict generalization of the conditional-independence class modeling control–state coupling, with low-rank parameterization for parameter efficiency and a spectral penalty enforcing stability of the resulting dense operator.
2. SCP controller (Section 4): an iterative MPC algorithm exploiting exact model Jacobians, with monotone-descent and KKT convergence results under standard trust-region assumptions.
3. Empirical evaluation (Section 5) across CartPole and RSCP in time-invariant and time-varying regimes, establishing four consistent effects: forecasting non-inferiority on every cell, training stabilization on RSCP TV, closed-loop MPC wins on RSCP TV when SCP is iterated, and graceful degradation under stale-plan execution.
Scope and limitations.
Our method inherits the conditional-independence baseline’s assumptions: the latent dynamics are approximately linear at each step, and the encoder is expressive enough to find a lifting where this holds. The SCP controller provides local (KKT) rather than global optimality guarantees, and convergence rate depends on initialization quality. We position this work as closing a specific, well-characterized structural gap in the Koopman-for-control literature without claiming a new operator class.
2 Background
2.1 Koopman Operators for Controlled Systems
For a discrete-time controlled system $x_{k+1} = f(x_k, u_k)$, the Koopman operator $\mathcal{K}$ (Koopman, 1931) acts on observables $g$ via composition: $\mathcal{K} g = g \circ f$. This operator is linear even when $f$ is not. One seeks a finite-dimensional lifting $z_k = \phi(x_k)$ such that the lifted state evolves approximately linearly (Korda and Mezić, 2018):
$$z_{k+1} \approx A z_k + B u_k, \qquad \hat{x}_k = C z_k \tag{2}$$
EDMD (Williams et al., 2015) identifies $(A, B, C)$ by least squares over a fixed observable dictionary. Deep Koopman methods replace the fixed dictionary with a learned encoder (Lusch et al., 2018).
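As a minimal sketch of the least-squares identification step, the following fits the lifted state and input matrices on a toy scalar system. The dictionary `lift`, the function name `edmd_with_control`, and the toy dynamics are all illustrative assumptions rather than part of any cited method's reference implementation.

```python
import numpy as np

def lift(x):
    # Hypothetical fixed dictionary for a scalar state: [x, x^2].
    return np.stack([x, x**2], axis=-1)

def edmd_with_control(X, U, X_next):
    """Least-squares fit of z_{k+1} ~ A z_k + B u_k over snapshot pairs."""
    Z, Z_next = lift(X), lift(X_next)                  # each of shape (T, d)
    features = np.hstack([Z, U.reshape(len(U), -1)])   # shape (T, d + m)
    # Solve features @ W ~ Z_next; W stacks A^T on top of B^T.
    W, *_ = np.linalg.lstsq(features, Z_next, rcond=None)
    d = Z.shape[1]
    return W[:d].T, W[d:].T

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
U = rng.uniform(-1.0, 1.0, size=200)
X_next = 0.9 * X + 0.5 * U          # toy linear system, assumed for illustration
A, B = edmd_with_control(X, U, X_next)
```

The first row of `A` and `B` recovers the toy coefficients because the first dictionary element is the state itself; the second row is only a best linear approximation, since the squared next state contains a state-input product that this dictionary cannot express.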
2.2 Mamba-Koopman and Conditional Independence
MamKO (Li et al., 2025) addresses the limitation of a fixed operator for time-varying systems. Inspired by Mamba’s selective state-space mechanism (Gu and Dao, 2023), a matrix-generation network takes the historical sequence $\mathcal{H}_k$ of past states and controls and emits time-varying operators via 1D convolution and FCNNs. After ZOH discretization:
$$z_{k+1} = \bar{A}_k\, z_k + \bar{B}_k\, u_k, \qquad \hat{x}_k = C_k\, z_k \tag{3}$$
Generating all matrices from $\mathcal{H}_k$ alone enforces
$$\frac{\partial \bar{A}_k}{\partial u_k} = 0 \tag{4}$$
preserving linearity in $(z_k, u_k)$ and hence MPC convexity, but preventing control-state coupling.
2.3 Bilinear Systems
A discrete-time bilinear system takes the form $z_{k+1} = A z_k + \sum_{i=1}^{m} u_{k,i}\, N_i z_k + B u_k$, where the $N_i$ are interaction matrices. Bilinear systems occupy a well-studied middle ground between linear and fully nonlinear models: they represent a broad class of physical phenomena, including chemical reactions and mechanical systems with multiplicative inputs (Mohler, 1973). The Koopman literature has likewise identified bilinear lifted models and related finite-dimensional approximations as useful for control, making bilinear structure a natural target for Koopman approximation. SCP (Mao et al., 2018) and iterative LQR (Li and Todorov, 2004) handle optimization over bilinear dynamics by exploiting the analytic Jacobian structure.
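A short sketch of one step of the textbook discrete-time bilinear form may help fix ideas. The names `bilinear_step`, `A`, `N`, and `B`, and the random values, are illustrative assumptions.

```python
import numpy as np

def bilinear_step(z, u, A, N, B):
    """One step of z+ = (A + sum_i u_i N_i) z + B u, with N of shape (m, d, d)."""
    A_eff = A + np.tensordot(u, N, axes=1)   # effective operator, linear in u
    return A_eff @ z + B @ u

d, m = 3, 2
rng = np.random.default_rng(1)
A = 0.5 * np.eye(d)
N = 0.1 * rng.standard_normal((m, d, d))
B = rng.standard_normal((d, m))
z = rng.standard_normal(d)
u = rng.standard_normal(m)

z_bilinear = bilinear_step(z, u, A, N, B)
# With all interaction matrices set to zero the step is the usual linear update.
z_linear = bilinear_step(z, u, A, np.zeros_like(N), B)
```

The two results differ exactly by the interaction term, which is the only place where the state and the control multiply.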
3 Bilinear Koopman Extension
3.1 Bilinear Latent Dynamics
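We relax eq. 4 by parameterizing the latent transition as eq. 1, where the diagonal eigenvalue matrix, the bilinear coupling tensors, and the actuator matrix are as described in Section 1. The observation map remains linear. The effective drift operator is a linear function of the control input, making the full dynamics bilinear in the latent state and control jointly.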
Proof.
Proposition 2 (Population-Level Forecasting Non-Inferiority).
Proof.
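The inequality follows from Proposition 1: any optimum in the linear class is achievable in the bilinear class by setting the coupling tensors to zero. For strictness, suppose the true latent dynamics contain a nonzero control–state coupling term. Any model in the linear class incurs irreducible error proportional to the deviation of the control from $\bar{u}$, where $\bar{u}$ is the training-set mean control. This error vanishes only when the control is constant across the evaluation distribution, a case that is vacuous in any MPC setting. ∎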
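The nesting argument can be illustrated on synthetic scalar data with ordinary least squares. This is a sketch under assumed toy dynamics, not the paper's training procedure: the linear feature set is a subset of the bilinear one, so the fitted error cannot increase, and it drops strictly when the data contain a state-control product.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
z = rng.standard_normal(T)
u = rng.standard_normal(T)
z_next = 0.8 * z + 0.3 * u + 0.4 * u * z   # assumed true dynamics with coupling

def fit_mse(features, target):
    """Mean squared residual of the least-squares fit of target on features."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(np.mean((features @ coef - target) ** 2))

linear_feats = np.column_stack([z, u])            # coupling forced to zero
bilinear_feats = np.column_stack([z, u, u * z])   # adds the u*z channel

mse_linear = fit_mse(linear_feats, z_next)
mse_bilinear = fit_mse(bilinear_feats, z_next)
```

If the coupling coefficient in `z_next` is set to zero, both fits reach the same error, which mirrors the equality case of the proposition.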
Remark 1.
Empirically, the strictness condition partitions the systems we evaluate. On CartPole, where the equations of motion couple horizontal force with pendulum geometry, the bilinear extension produces strict forecasting gains on TI and TV. On RSCP, where heat duties enter additively and the governing ODE contains no state-control product terms, the proposition predicts equality, which is what we observe on RSCP TI, and on RSCP TV under deployment-relevant averaging (Table 2). The bilinear-vs-linear gap visible in best-checkpoint MSE on RSCP TV reflects a training-noise asymmetry between the two models, not a violation of the proposition: the Linear baseline oscillates throughout training while the bilinear model converges smoothly, so single-epoch selection from the Linear trajectory captures one of its downward spikes. The mean MSE over the final training epochs (mean50 in Table 2), which averages across the spikes, restores the equality the proposition predicts. The learned coupling strengths (Table 9) empirically confirm the partition.
3.2 Low-Rank Parameterization
The coupling tensors $N_i \in \mathbb{R}^{d \times d}$, one per control channel $i = 1, \dots, m$, add $m d^2$ parameters, where $d$ is the latent dimension. For typical settings this is under 1% of a standard Mamba-Koopman backbone. We impose a low-rank prior:
$$N_i = U_i V_i^{\top}, \qquad U_i, V_i \in \mathbb{R}^{d \times r} \tag{5}$$
reducing the addition to $2 m d r$ parameters and regularizing the coupling strength when $r \ll d$. Both $U_i$ and $V_i$ are initialized to zero, so the model begins as an exact copy of the Linear baseline; any divergence in training is attributable solely to the bilinear terms. The rank $r$ is a single interpretable hyperparameter: $r = 1$ encodes a rank-1 perturbation per control channel; $r = d$ recovers the unconstrained bilinear model. We use $r = d$ in all reported runs, as $d$ is small enough that further parameter reduction is unnecessary; the low-rank option remains available for deployments with larger latent dimensions.
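The parameter accounting for a factored coupling can be sketched as follows. The factor names `U` and `V`, the helper names, and the sizes are illustrative assumptions and are not the paper's settings.

```python
import numpy as np

def coupling_from_factors(U, V):
    """Build one d x d matrix U_i V_i^T per control channel; U, V have shape (m, d, r)."""
    return np.einsum("idr,ier->ide", U, V)

def added_params(d, m, r=None):
    """Coupling parameter count: m*d^2 when dense, 2*m*d*r when factored."""
    return m * d * d if r is None else 2 * m * d * r

d, m, r = 16, 2, 1                      # illustrative sizes only
rng = np.random.default_rng(3)
U = rng.standard_normal((m, d, r))
V = rng.standard_normal((m, d, r))
N = coupling_from_factors(U, V)

dense_count = added_params(d, m)        # 512
factored_count = added_params(d, m, r)  # 64
```

Under this accounting the factored form stores fewer numbers than the dense form only when the rank is below half the latent dimension; at full rank it is a reparameterization rather than a compression.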
3.3 Lie–Trotter Split ZOH Discretization
Write the effective generator as
| (6) | ||||
Dense ZOH would exponentiate the full generator at a single scalar sampling period, collapsing the per-mode timescales of the baseline MamKO. This would break the exact zero-coupling reduction to the linear baseline. We therefore use a first-order Lie–Trotter splitting (McLachlan and Quispel, 2002), historically rooted in the Trotter product formula (Trotter, 1959):
| (7) | ||||
With the corresponding diagonal ZOH input integral, the implemented one-step map is
| (8) |
When the coupling tensors are zero, the dense factor is the identity, so the scheme reduces exactly to the baseline per-mode diagonal ZOH. Implementation details, numerical stabilization, and cost are deferred to Section C.3.
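The splitting idea can be sketched in NumPy as follows. This is a hedged illustration, not the implemented map: the ordering of the two factors, the choice to apply the dense factor to the state term only, the scalar step `dt` for the coupling, and all names (`expm`, `lie_trotter_step`, `lam`, `dt_modes`) are assumptions. The matrix exponential is a simple scaling-and-squaring Taylor routine written here to keep the example self-contained.

```python
import numpy as np

def expm(M, terms=20):
    """Matrix exponential by scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / (2.0 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def lie_trotter_step(z, u, lam, dt_modes, N, B, dt):
    """Split step: per-mode diagonal ZOH, then the dense coupling factor."""
    a_diag = np.exp(dt_modes * lam)                    # diagonal factor, one entry per mode
    b_bar = ((a_diag - 1.0) / lam)[:, None] * B        # diagonal ZOH input integral
    coupling = expm(dt * np.tensordot(u, N, axes=1))   # exp(dt * sum_i u_i N_i)
    return coupling @ (a_diag * z) + b_bar @ u

d, m = 3, 2
rng = np.random.default_rng(4)
lam = -np.array([0.5, 1.0, 2.0])        # negative continuous-time eigenvalues
dt_modes = np.array([0.1, 0.2, 0.05])   # per-mode step sizes
N = 0.3 * rng.standard_normal((m, d, d))
B = rng.standard_normal((d, m))
z, u = rng.standard_normal(d), rng.standard_normal(m)

z_split = lie_trotter_step(z, u, lam, dt_modes, N, B, dt=0.1)
z_diag_only = lie_trotter_step(z, u, lam, dt_modes, np.zeros_like(N), B, dt=0.1)
```

With zero coupling the dense factor is the identity matrix, so `z_diag_only` coincides with the purely diagonal per-mode update, which is the reduction property the splitting is chosen to preserve.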
3.4 Stability
The negative_celu activation in the Mamba-Koopman backbone constrains the diagonal entries of the discretized operator to be at most 1, guaranteeing a spectral radius of at most 1 for the diagonal case. This guarantee does not extend to the dense effective operator: the bilinear terms can shift eigenvalues outside the unit disk. We apply a two-tier approach at training and inference.
Training: spectral penalty.
We add to the loss
| (9) |
with margin , computed via torch.linalg.eigvals in float32 to avoid mixed-precision artifacts. Zero initialization of ensures the model starts in the stable regime.
Inference.
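At inference we evaluate the spectral radius of the effective operator at each MPC step via torch.linalg.eigvals, at $O(d^3)$ cost in the latent dimension $d$; for the latent dimensions used here this is negligible relative to backbone inference. A discussion of why we did not use a Gershgorin pre-check for screening is deferred to Appendix A.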
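A spectral-radius check of this kind can be sketched in a few lines. NumPy is used here in place of PyTorch to keep the example dependency-light, and the squared hinge form, the function names, and the margin value are assumptions, not the paper's penalty.

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def spectral_penalty(A, margin=0.05):
    """Squared hinge, zero whenever the spectral radius is at most 1 - margin (assumed form)."""
    return max(0.0, spectral_radius(A) - (1.0 - margin)) ** 2

A_diag = np.diag([0.9, 0.5])                            # stable diagonal operator
A_dense = A_diag + np.array([[0.0, 0.6], [0.6, 0.0]])   # off-diagonal coupling added

rho_diag, rho_dense = spectral_radius(A_diag), spectral_radius(A_dense)
pen_diag, pen_dense = spectral_penalty(A_diag), spectral_penalty(A_dense)
```

In this toy case the diagonal operator is inside the margin and incurs no penalty, while the off-diagonal perturbation moves one eigenvalue outside the unit disk even though every diagonal entry is unchanged.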
3.5 Training Objective
We train end-to-end on multi-step observation-space prediction loss:
| (10) |
with the stability penalty disabled at inference. All baseline training hyperparameters (optimizer, schedule, context window) are held fixed; only the bilinear coupling parameters and the jointly trained backbone parameters are updated.
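The prediction term of a multi-step observation-space loss can be sketched as an open-loop rollout. The function name `multistep_mse`, the toy linear model, and the 30-step length are illustrative assumptions; the stability penalty is omitted.

```python
import numpy as np

def multistep_mse(z0, controls, targets, step, C):
    """Average squared observation error of an open-loop latent rollout."""
    z, total = z0, 0.0
    for u, y in zip(controls, targets):
        z = step(z, u)                              # predictions feed the next step
        total += float(np.mean((C @ z - y) ** 2))   # error in observation space
    return total / len(controls)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

def step(z, u):
    return A @ z + B @ u

rng = np.random.default_rng(6)
controls = rng.standard_normal((30, 1))
z, targets = np.zeros(2), []
for u in controls:
    z = step(z, u)
    targets.append(C @ z)

loss_exact = multistep_mse(np.zeros(2), controls, targets, step, C)
loss_biased = multistep_mse(np.ones(2), controls, targets, step, C)
```

Because the rollout never resets to measured states, an error in the initial latent state or in the operator compounds across the horizon, which is what a multi-step loss is meant to expose.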
4 SCP Controller for Bilinear MPC
4.1 Problem Formulation
The open-loop MPC problem over horizon is
| (11) | ||||
| s.t. | ||||
This problem is non-convex in the states and controls jointly. The conditional-independence baseline sidesteps non-convexity by design: with a control-independent operator the dynamics constraints are linear, so the problem is a convex QP solved in one shot. We instead solve eq. 11 iteratively via SCP, exploiting the exact differentiable structure of the implemented bilinear model.
4.2 Exact Model Jacobians
SCP linearizes the implemented Lie–Trotter map eq. 8 around a nominal trajectory of latent states and controls. Since the map is affine in the latent state, the state Jacobian is
| (12) | ||||
| (13) |
The input Jacobian contains the derivative of the matrix exponential with respect to the control input. In implementation, we compute it by reverse-mode automatic differentiation of the full Lie–Trotter update using torch.func.jacrev. This yields exact model Jacobians of the implemented differentiable dynamics up to floating-point precision, with no finite-difference linearization inside SCP.
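For intuition, the Jacobian structure can be written in closed form for a simplified discrete bilinear map and checked against central differences. This sketch deliberately replaces the Lie–Trotter map with the simpler form $f(z, u) = (A + \sum_i u_i N_i) z + B u$, and all names are illustrative assumptions.

```python
import numpy as np

def f(z, u, A, N, B):
    """Simplified bilinear map (A + sum_i u_i N_i) z + B u."""
    return (A + np.tensordot(u, N, axes=1)) @ z + B @ u

def jacobians(z, u, A, N, B):
    J_z = A + np.tensordot(u, N, axes=1)                # df/dz: the effective operator
    J_u = np.stack([Ni @ z for Ni in N], axis=1) + B    # df/du: column i is N_i z + B[:, i]
    return J_z, J_u

def finite_diff(fun, x, eps=1e-6):
    """Central-difference Jacobian, used here only as a check."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((fun(x + e) - fun(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)

d, m = 3, 2
rng = np.random.default_rng(5)
A = 0.5 * np.eye(d)
N = 0.2 * rng.standard_normal((m, d, d))
B = rng.standard_normal((d, m))
z, u = rng.standard_normal(d), rng.standard_normal(m)

J_z, J_u = jacobians(z, u, A, N, B)
J_z_fd = finite_diff(lambda zz: f(zz, u, A, N, B), z)
J_u_fd = finite_diff(lambda uu: f(z, uu, A, N, B), u)
```

The state Jacobian depends on the nominal control and the input Jacobian depends on the nominal state, which is why the linearization must be refreshed whenever the nominal trajectory changes.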
4.3 SCP Algorithm
Each SCP iteration solves the convex subproblem obtained by evaluating the finite-horizon quadratic tracking objective on the linearized trajectory. Denoting this objective by , we solve
| (14) | ||||
| s.t. | ||||
Here the linearized objective includes the same stage and terminal terms as the true MPC objective; when a control-increment penalty is used, the previous applied input is treated as fixed data, so the subproblem remains a convex QP in the decision variables. Eliminating the linearized dynamics yields a dense box-constrained QP solved via OSQP (Stellato et al., 2020) with warm-starting between iterations.
Proposition 3 (SCP Monotone Descent and KKT Convergence).
Consider the bilinear MPC problem eq. 11 with a continuously differentiable finite-horizon objective whose stage and terminal terms are convex quadratic in the variables of each SCP subproblem (including, when used, control-increment penalties with the previous input treated as fixed problem data), and with box constraints on the controls. Suppose the SCP iterates remain in a bounded set, the implemented dynamics map of eq. 8 has locally Lipschitz Jacobian on a neighborhood of that set, and each convex subproblem includes both the box feasibility constraint on the updated controls and a trust-region constraint on the control update. Then:
(i) Monotone descent. Let $\mathrm{pred}_j$ denote the predicted QP cost decrease at iteration $j$. Because the dynamics map has locally Lipschitz Jacobian, the first-order model error on a trust region of radius $\Delta$ is uniformly $O(\Delta^2)$ on bounded sets. Hence the discrepancy between the linearized and true objectives over a finite horizon satisfies
| (15) |
When the trust-region radius is chosen so that this approximation error is smaller than the predicted decrease, the accepted step is a true descent step, so the sequence of true objective values is non-increasing. The trust-region shrinkage on failed steps guarantees that this condition is eventually met, because the right-hand side of eq. 15 vanishes as the radius shrinks to zero.
(ii) KKT convergence. If the standard regularity assumptions for trust-region SCP hold at an accumulation point—in particular a constraint qualification for the active box constraints—then any accumulation point of the accepted iterates is a KKT point of eq. 11. Equivalently, the predicted QP decrease vanishes only at first-order stationary points of the original nonlinear MPC problem, and the associated multipliers satisfy the KKT conditions by the standard trust-region SCP argument of Mao et al. (2018, Theorem 2).
Proof.
By construction, the matrices in eqs. 12 to 13 are the exact Jacobians of the implemented Lie–Trotter map , computed without finite differences.
For part (i), local Lipschitz continuity of the Jacobian of the dynamics map yields a quadratic first-order remainder on each trust region. Over a finite horizon, boundedness of the iterates implies a uniform constant in the bound, so the objective discrepancy is quadratic in the trust-region radius. Hence sufficiently small trust regions guarantee that the true cost decrease matches the predicted decrease up to higher-order error, which yields monotone descent after finitely many shrinkage steps.
For part (ii), feasibility is preserved because each QP enforces the box constraints on the updated controls. Once the predicted decrease tends to zero, the accepted iterates satisfy the first-order necessary conditions of the nonlinear MPC problem under the standard trust-region SCP regularity assumptions. The cited result of Mao et al. (2018, Theorem 2) then implies that every accumulation point of the accepted iterates is a KKT point of eq. 11. ∎
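The accept/shrink logic behind the monotone-descent argument can be sketched on a toy scalar bilinear model. Everything here is an assumption made for illustration: the model constants, the cost weights, the acceptance threshold of 0.1, and the radius schedule. A plain projected-gradient loop stands in for OSQP so that the example needs only NumPy.

```python
import numpy as np

a, n, b = 0.9, 0.5, 1.0            # toy model z+ = (a + n*u) z + b*u (assumed)
q, r_u = 1.0, 0.1                  # quadratic tracking and control weights
z0, z_ref, H = 0.0, 1.0, 10
u_min, u_max = -1.0, 1.0

def rollout(u):
    z = np.empty(H + 1)
    z[0] = z0
    for k in range(H):
        z[k + 1] = (a + n * u[k]) * z[k] + b * u[k]
    return z

def cost(u):
    z = rollout(u)
    return q * np.sum((z[1:] - z_ref) ** 2) + r_u * np.sum(u ** 2)

def sensitivity(u, z):
    """S[k, j] = d z_{k+1} / d u_j for the dynamics linearized around (z, u)."""
    S = np.zeros((H, H))
    for j in range(H):
        S[j, j] = n * z[j] + b                       # input Jacobian at step j
        for k in range(j + 1, H):
            S[k, j] = (a + n * u[k]) * S[k - 1, j]   # propagate with state Jacobian
    return S

def solve_box_qp(Hq, g, lo, hi, iters=500):
    """Projected gradient for min 0.5 d^T Hq d + g^T d subject to lo <= d <= hi."""
    step = 1.0 / np.linalg.eigvalsh(Hq)[-1]
    d = np.zeros_like(g)
    for _ in range(iters):
        d = np.clip(d - step * (Hq @ d + g), lo, hi)
    return d

def scp(u, radius=0.5, max_iter=30):
    costs = [cost(u)]
    for _ in range(max_iter):
        z = rollout(u)
        S = sensitivity(u, z)
        Hq = 2 * (q * S.T @ S + r_u * np.eye(H))
        g = 2 * (q * S.T @ (z[1:] - z_ref) + r_u * u)
        lo = np.maximum(-radius, u_min - u)          # trust region intersected with box
        hi = np.minimum(radius, u_max - u)
        d = solve_box_qp(Hq, g, lo, hi)
        predicted = -(0.5 * d @ Hq @ d + g @ d)      # decrease of the convex model
        if predicted <= 1e-10:
            break
        actual = costs[-1] - cost(u + d)             # decrease of the true cost
        if actual >= 0.1 * predicted:
            u = u + d
            costs.append(cost(u))
            radius = min(2 * radius, 1.0)
        else:
            radius *= 0.5                            # reject and shrink
    return u, costs

u_opt, costs = scp(np.zeros(H))
```

Only accepted steps are appended to `costs`, so the recorded sequence is non-increasing by construction; rejected steps change nothing except the radius.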
Remark 2.
The boundedness assumption on iterates is retained as an explicit hypothesis. In practice it is promoted by compact control constraints together with the spectral regularization and runtime stability checks of Section 3.4, but those mechanisms do not by themselves constitute a formal proof that the latent states remain uniformly bounded for all iterations.
Corollary 3.1.
The conditional-independence baseline eq. 4 solves a single QP per MPC step but cannot represent control–state coupling. The proposed method solves a sequence of QPs, one per SCP iteration, on the implemented bilinear model. Both are polynomial-time per iteration; the additional QPs incur negligible wall-clock cost relative to the modeling gain (Table 3).
5 Experiments
5.1 Setup
Systems.
We evaluate on two benchmarks across time-invariant and time-varying regimes, yielding four cells.
CartPole. Standard underactuated cart-pole swing-up with a single control input (horizontal force). The horizontal force enters the equations of motion through products with trigonometric functions of the pole angle, making the system control-affine with a state-dependent input gain. The Koopman generator of such a system acts bilinearly on observables in the control input (Goswami and Paley, 2020), so any sufficiently expressive lifting that captures the relevant trigonometric coordinates inherits a coupling that MamKO’s conditional-independence constraint excludes. Whether the learned encoder spans these coordinates well enough for the gap to be empirically detectable is one of the questions our forecasting experiments answer (Table 2). The time-invariant variant (TI) uses constant cart friction; the time-varying variant (TV) modulates the friction coefficient over time.
RSCP. The reactor–separator process of Liu et al. (2008), adopted as the MamKO benchmark in Li et al. (2025). Two CSTRs in series followed by a flash separator with recycle; the controls are the heat duties. The heat duties enter the energy balances additively: the governing ODE contains no state-control product terms, only nonlinearity in state. The TV variant modulates feed composition sinusoidally, introducing time-varying parameters without adding control–state coupling. RSCP exercises the second axis of Section 1 (time-varying parameters) without the first (control coupling), making the two benchmarks complementary tests of the bilinear extension.
Baselines.
Our primary structural ablation is the conditional-independence baseline, which is equivalent to our model with the bilinear coupling set to zero. We instantiate this baseline using the MamKO architecture (Li et al., 2025), which is representative of the class. Throughout, “Linear” refers to this baseline. MamKO itself was evaluated against the Deep Koopman Operator (DKO) in the original work and established as the stronger time-varying Koopman baseline; we inherit this comparison by transitivity. Our method is denoted Bilinear-$r$, where $r$ is the coupling rank; we use the full-rank setting in all reported runs, with per-benchmark values listed in Table 6. Closed-loop variants are denoted Bilinear-SCP-$K$ for $K$ SCP iterations.
Protocol.
We generate training, validation, and test trajectories per system using mixed sinusoidal and step-function control excitation. All Koopman-based methods use the same latent dimension, prediction horizon, and training budget; dataset splits and random seeds are held constant. Forecasting is reported as 30-step open-loop MSE on held-out test trajectories. Closed-loop MPC is evaluated over multiple episodes per cell under a setpoint-tracking task; we report cumulative running-average cost as a function of time, with shaded bands for visual comparison. (Footnote 1: Bands are drawn at a reduced width for visual readability of model separation; the conclusions are unchanged at the wider setting, though some panels then exhibit fully overlapping bands. We report the band width explicitly so readers can interpret the figures under the variance scale of their preference.)
Reproducibility against published values.
Our re-trained Linear baseline reproduces the forecast MSE of Li et al. (2025, Table 1) within on CartPole TI and within on RSCP TI. On CartPole TV our Linear baseline trains to MSE versus the published , a gap we have not closed under matched hyperparameters; on RSCP TV our Linear baseline trains to versus the published , a improvement that we attribute to the training-stability mechanism documented in Section 5.4. Bilinear-vs-Linear comparisons throughout this paper are reported against our own re-trained baseline; reviewers comparing absolute numbers should reference this calibration.
5.2 Time-Varying Capture: SCP Iteration and Lead-Time Robustness
The headline cell of this paper is RSCP TV: a chemical reactor benchmark with sinusoidally modulated feed composition, no coupling in its governing ODE, and time-varying parameters that the linear lift can only track through history-driven re-generation of the operator at each MPC step. Two experiments converge on a single finding: under TV physics, the bilinear coupling provides operator-level correction that the linear lift architecturally cannot reach within a single MPC horizon, but unlocking it requires either iterative re-linearization at solve time or commitment to a control sequence whose effect on the operator is structurally encoded.
SCP iteration unlocks TV capture under standard MPC.
Figure 1 reports cumulative running-average closed-loop cost for Linear (single QP), Bilinear-SCP-1 (one SCP iteration on the bilinear model, equivalent to LQR-style single linearization), and Bilinear-SCP-5 (five iterations) across the four cells. On RSCP TV (bottom right), Bilinear-SCP-5 separates from both other controllers during the run, and the gap widens monotonically through the end of the horizon. Bilinear-SCP-1, by contrast, does not distinguish itself from Linear and trails it slightly. A single linearization of the bilinear model is insufficient to exploit the additional capacity under TV physics: the SCP iterations are what unlock the operator-level corrections the bilinear coupling makes available, by repeatedly re-linearizing at the current operating point as the iterates converge.
Table 1 reports per-cell closed-loop tracking cost for Linear and our Bilinear-SCP-5 controller over the matched evaluation window of Li et al. (2025, Fig. 3). Bilinear-SCP-5 wins on three of four cells, with the largest gap on RSCP TV ( vs , a reduction). The single loss is a gap on CartPole TI; this regime carries no coupling, no time-varying parameters, and benign dynamics, so the additional SCP overhead does not compound into a benefit.
| Cell | Linear | Bilinear-SCP-5 |
|---|---|---|
| CartPole TI (20 s) | ||
| CartPole TV (20 s) | ||
| RSCP TI (2 h) | ||
| RSCP TV (2 h) |
We emphasize that Bilinear-SCP-1 is not a stripped-down ablation but a realistic deployment baseline: it is what one obtains from running the bilinear model under a budget-constrained MPC stack (e.g., compute-limited edge controllers). The separation between SCP-1 and SCP-5 on RSCP TV is the empirical cost of single linearization in the TV regime.
Lead-time commitment unlocks the same correction with a frozen backbone.
In deployment, MPC controllers are not always free to re-plan at every control step: compute budgets, network round-trip delays, and asynchronous sensor updates force the controller to commit to a previously computed plan for some lead time before re-planning. We probe this regime via a lead-time experiment: at each MPC step the controller commits to its plan for subsequent steps before re-planning, with . The case recovers standard receding-horizon MPC. Crucially, we hold both the original plan and the backbone dynamics fixed during the lead window (regen=never), so neither model receives intermediate corrections from new sensor data. This isolates the open-loop correction capacity of the model class itself: the linear lift’s drift is fixed across the entire lead window because it depends only on history; the bilinear model’s effective drift varies with the committed control along the lead window, providing operator-level correction even with the backbone frozen. We exclude from analysis: at over three minutes of stale plan on RSCP, both models saturate from discretization-induced error rather than model-class differences.
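The commitment protocol can be sketched as a generic execution loop. The function name `run_with_lead_time`, the callables `plan_fn` and `plant_step`, the toy plant, and the convention that `lead = 0` means re-planning at every step are all assumptions made for illustration.

```python
def run_with_lead_time(plan_fn, plant_step, x0, n_steps, lead):
    """Re-plan, then apply the first lead + 1 planned inputs without feedback."""
    x, applied, n_plans = x0, [], 0
    k = 0
    while k < n_steps:
        plan = plan_fn(x)                 # needs at least lead + 1 inputs
        n_plans += 1
        for u in plan[: lead + 1]:
            if k >= n_steps:
                break
            x = plant_step(x, u)          # stale inputs are applied as planned
            applied.append(u)
            k += 1
    return x, applied, n_plans

def plan_fn(x):
    # Hypothetical planner: hold a proportional correction over a 5-step plan.
    return [-0.5 * x] * 5

def plant_step(x, u):
    # Hypothetical integrator plant.
    return x + u

x_fresh, _, plans_fresh = run_with_lead_time(plan_fn, plant_step, 1.0, 6, lead=0)
x_stale, _, plans_stale = run_with_lead_time(plan_fn, plant_step, 1.0, 6, lead=2)
```

In this toy setting the stale plan keeps pushing after the state has crossed zero and ends farther from the origin than per-step re-planning, which is the kind of degradation the lead-time experiment is designed to measure.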
Figure 2 shows the result on RSCP TV. The bilinear model sits below Linear at every commitment window (Table 7), with the gap widening sharply once planning becomes stale. At the bilinear advantage is modest ( vs , a gap of in log-cost). At , Linear degrades sharply—its cumulative log-cost plateaus near by end of horizon, while the bilinear model reaches , a gap of nearly in log-cost. The pattern persists at (Linear plateau , bilinear ) and shrinks at where both models saturate but bilinear still leads by in log-cost. Figure 3 reports the same experiment on CartPole TV. The bilinear advantage is small throughout and roughly constant in (Table 8); the underlying dynamics are benign enough that plan staleness does not catastrophically degrade either model.
One mechanism, two regimes.
Both findings identify the same mechanism. The bilinear coupling encodes how the drift varies with the control, and this variation is exploitable through any control regime that exercises the dependency: iterated SCP exercises it via re-linearization at the current operating point; lead-time commitment exercises it via rollout of the committed control sequence with the backbone frozen. The Linear baseline, lacking the coupling, has no analogous mechanism: its only access to time-varying physics is backbone re-generation, which fails under freeze and is insufficient under standard MPC. This regime is operationally relevant: control deployments commonly enforce re-planning at intervals longer than the underlying simulator step due to compute, communication, or scheduling constraints.
5.3 Forecasting Non-Inferiority
| Model | Metric | CP TI | CP TV | RSCP TI | RSCP TV |
|---|---|---|---|---|---|
| Linear | best | ||||
| mean50 | |||||
| Bilinear | best | ||||
| mean50 |
Table 2 reports forecast MSE under best-checkpoint (the single epoch with lowest validation loss, matching the convention of Li et al. (2025, Table 1)) and mean over the final training epochs (the value a deployed system would see, robust to selection from noisy training curves). The two metrics agree on three of four cells: bilinear wins CartPole TI by , edges CartPole TV by –, and ties RSCP TI within noise—all consistent with Proposition 2.
The cells diverge on RSCP TV. Best-checkpoint favors Linear by ; mean50 ties (bilinear marginally lower at vs ). Figure 4 (bottom right) explains the gap: the Linear baseline trains into a noisy basin that oscillates with amplitude in log-loss for the entire -epoch run, while bilinear converges smoothly within epochs and remains flat. Best-checkpoint selects one of Linear’s downward spikes; the tail mean averages across them. A model selected by best-checkpoint cannot be reliably re-deployed without the validation-loss oracle that selected it; the tail mean is the deployment-relevant comparison.
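The difference between the two reporting conventions can be made concrete with a small sketch. The function names and the synthetic curves are illustrative; the default window of 50 epochs is inferred from the column label mean50 and should be read as an assumption.

```python
def best_checkpoint(val_loss, test_mse):
    """Test MSE at the epoch with the lowest validation loss."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return test_mse[best_epoch]

def tail_mean(test_mse, tail=50):
    """Mean test MSE over the final `tail` epochs."""
    window = test_mse[-tail:]
    return sum(window) / len(window)

# Synthetic curves: one oscillates between 1.5 and 0.5, the other is flat at 0.9.
noisy = [1.0 + 0.5 * (-1) ** epoch for epoch in range(200)]
smooth = [0.9] * 200

noisy_best, noisy_tail = best_checkpoint(noisy, noisy), tail_mean(noisy)
smooth_best, smooth_tail = best_checkpoint(smooth, smooth), tail_mean(smooth)
```

Here best-checkpoint selection ranks the oscillating curve first (0.5 against 0.9), while the tail mean ranks the flat curve first (0.9 against 1.0), so the two conventions can disagree whenever one training curve is much noisier than the other.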
5.4 Training Stability
Validation-loss curves (Figure 4) reveal a striking asymmetry between cells. On CartPole TI/TV and RSCP TI, both models train smoothly to convergence. On RSCP TV, the Linear model exhibits sustained oscillations of in log-loss throughout the entire -epoch run, while the bilinear model converges smoothly within the first epochs and remains flat thereafter. The variance of the Linear model’s val loss across the final epochs exceeds the bilinear model’s by an order of magnitude.
We interpret this as an optimization-side benefit of the bilinear parameterization: under time-varying physics, the linear lift must continuously adjust its operator generation to track the moving target, producing the noisy training signal we observe. The bilinear coupling absorbs a portion of the time-varying capacity directly into a stationary operator structure, decoupling the optimization landscape from the data’s non-stationarity. The headline best-checkpoint number for RSCP TV (bilinear worse than Linear) and the tied mean50 comparison (Table 2) together tell a single story: Linear’s apparent advantage is the selection artifact, while bilinear achieves comparable forecasting performance with dramatically more reliable training—a property that matters for deployment even when raw MSE does not separate the models.
The structural origin of this stabilization—the bilinear coupling absorbing TV capacity into a stationary operator—is the same mechanism that produces the closed-loop gains documented in Section 5.2. The training-time and inference-time effects of the coupling are two faces of one phenomenon.
5.5 Other Cells: CartPole and RSCP TI
The remaining cells (CartPole TI/TV, RSCP TI) exercise the bilinear extension under conditions in which it is either underutilized (CartPole, where forecasting wins do not translate to closed-loop separation) or operating outside its primary regime (RSCP TI, no time-varying parameters, no coupling). They are reported here for completeness; none is the headline cell.
CartPole TI/TV. All three closed-loop controllers (Linear, Bilinear-SCP-1, Bilinear-SCP-5) track within overlapping bands on both variants (Figure 1, top row); the bilinear extension does not hurt closed-loop performance even when its strict forecasting advantage (Table 2) does not translate into MPC gains. Lead-time results (Figure 3) show the bilinear model sitting modestly below Linear at every commitment window, with a roughly constant gap across the sweep. The system is benign enough that staleness does not catastrophically degrade either model.
RSCP TI. Both bilinear variants sit below Linear through the end of the horizon, with bands that become clearly separated (Figure 1, lower left). Bilinear-SCP-1 marginally edges Bilinear-SCP-5 here: unlike the RSCP TV pattern, single linearization is sufficient on time-invariant dynamics. We attribute the bilinear gain to compounding accuracy in the multi-step rollout under the convex, coupling-free dynamics: even without TV physics, the additional lifting capacity offered by the bilinear coupling is occasionally useful when the controller exercises the input range broadly during transients.
5.6 Coupling Strength Across Cells
The learned values vary substantially across cells in ways that illuminate the structural mechanisms above: high under TV physics where absorbs operating-point drift, low under TI physics where Proposition 2 predicts , and elevated on RSCP TI where the optimizer absorbs training slack into without physical target. Per-cell values and qualitative training-trajectory shapes are reported in Appendix E.
5.7 Computational Cost
For the TV stale-plan experiments, Table 3 reports mean wall-clock time per control step under lead-time execution for the Linear controller and the Bilinear-SCP controller across the commitment windows. The Bilinear-SCP controller is substantially slower than the single-QP Linear baseline but remains practical relative to the RSCP sampling period; on CartPole, the timings should be read as an offline stress-test rather than a real-time deployment target. The decrease on CartPole TV with larger commitment window appears to be problem-dependent: the fixed prefix provides a better SCP warm start and often triggers earlier termination, whereas on RSCP TV the solver typically still reaches the maximum SCP iteration count.
| Method | ||||
|---|---|---|---|---|
| CP — Linear | s | s | s | s |
| CP — Bilinear | s | s | s | s |
| RSCP — Linear | s | s | s | s |
| RSCP — Bilinear | s | s | s | s |
6 Related Work
Koopman operators for control.
Korda and Mezić (2018) first demonstrated that EDMD-based Koopman models enable efficient convex MPC. Subsequent work has extended this to probabilistic models (Han et al., 2021), graph neural network liftings (Li et al., 2020), and time-varying-system modeling (Hao et al., 2024). MamKO (Li et al., 2025) introduced time-varying operators via Mamba-inspired matrix generation. Our work is orthogonal: we introduce a structural extension (bilinear coupling) that any of these could in principle adopt.
Bilinear dynamical systems.
The theory of bilinear systems is classical (Mohler, 1973), and their controllability properties are well characterized (Elliott, 2009). The contribution here is bringing this structure into the learned Koopman-for-control pipeline, with a low-rank parameterization that regularizes sample complexity and a tractable SCP controller that exploits the resulting Jacobian structure.
Sequential convex programming.
SCP methods have a long history in aerospace trajectory optimization (Mao et al., 2018; Malyuta et al., 2022). Our application is standard; the novelty is that the bilinear Koopman model provides exact model Jacobians (Eqs. 12 and 13), avoiding the finite-difference approximations that degrade SCP convergence in black-box settings.
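To illustrate why a bilinear latent model admits closed-form Jacobians, the following minimal sketch assumes a generic discrete-time bilinear step z+ = A z + B u + sum_j u_j N_j z. This form, the array names A, B, N, and both function names are illustrative assumptions; the sketch does not reproduce the paper's Eqs. 12 and 13 or its low-rank parameterization.

```python
import numpy as np

def bilinear_step(A, B, N, z, u):
    """One step of the assumed form z+ = A z + B u + sum_j u[j] * (N[j] @ z)."""
    coupling = np.tensordot(u, N, axes=1)  # (n, n) matrix: sum_j u[j] * N[j]
    return (A + coupling) @ z + B @ u

def bilinear_jacobians(A, B, N, z, u):
    """Closed-form Jacobians of bilinear_step with respect to z and u."""
    dz = A + np.tensordot(u, N, axes=1)              # shape (n, n)
    du = B + np.stack([Nj @ z for Nj in N], axis=1)  # shape (n, m)
    return dz, du

# Hypothetical small instance: n = 3 latent states, m = 2 inputs.
rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
N = 0.1 * rng.standard_normal((2, 3, 3))
z = rng.standard_normal(3)
u = rng.standard_normal(2)
dz, du = bilinear_jacobians(A, B, N, z, u)
```

Because the step is affine in z for fixed u and affine in u for fixed z, both Jacobians are available without any numerical differentiation, which is the property an SCP linearization can exploit.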
Data-driven nonlinear control.
SINDy-C (Brunton et al., 2016; Kaiser et al., 2018) can learn bilinear terms but requires a fixed polynomial library and does not scale to high-dimensional systems. Neural ODE methods (Chen et al., 2018) are highly expressive but relinquish the convex MPC structure. Our method occupies a deliberate middle ground: more expressive than linear Koopman, more tractable than black-box dynamics.
7 Conclusion
We have shown that the conditional-independence constraint shared across the Mamba-Koopman family, while necessary to preserve MPC convexity in single-shot formulations, leaves two regimes structurally underrepresented: control-state coupling and time-varying parameters. Our bilinear extension addresses both with a minimal architectural change, fewer than added parameters, exact model Jacobians, and a provably convergent SCP controller.
Across CartPole (which exercises coupling) and RSCP (which exercises time-varying parameters in the TV variant without coupling), four consistent effects emerged: forecasting non-inferiority on all cells under deployment-relevant averaging, with strict gains where Proposition 2 predicts them; substantial training stabilization on RSCP TV; closed-loop MPC gains on RSCP TV when SCP is iterated to convergence; and graceful degradation under stale-plan execution, where the bilinear model maintains a clear robustness advantage. The learned patterns provide an additional interpretive diagnostic for which mechanism each cell primarily exercises.
Open directions include extending the bilinear parameterization to the step-varying matrices, imposing structured priors on that encode known physical coupling topology, and designing benchmarks that isolate the structural axis from time-varying parameters more cleanly.
AI Assistance Disclosure
We used AI assistants (large language models) for language editing, phrasing, and LaTeX formatting throughout this manuscript. All technical content, mathematical results, experimental design, code, and analyses are the authors’ own work. The AI tools did not contribute novel ideas, proofs, or empirical results.
References
- Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937. Cited by: §6.
- Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 31. Cited by: §6.
- A machine learning-based approach to cybersecurity and safety of model predictive control systems. Ph.D. Dissertation, University of California, Los Angeles. Note: Section 2.4 documents the corrected RSCP formulation used in this work. Cited by: §B.2, §B.2, §B.2.
- Bilinear control systems: matrices in action. Springer. Cited by: §6.
- Global bilinearization and reachability analysis of control-affine nonlinear systems. In The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications, A. Mauroy, I. Mezić, and Y. Susuki (Eds.), pp. 81–98. External Links: ISBN 978-3-030-35713-9, Document Cited by: §5.1.
- Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. Cited by: §2.2.
- DeSKO: stability-assured robust control with a deep stochastic Koopman operator. In International Conference on Learning Representations (ICLR), Cited by: §6.
- Deep Koopman learning of nonlinear time-varying systems. Automatica 159, pp. 111372. Cited by: §6.
- Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A 474, pp. 20180335. Cited by: §6.
- Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: §C.4.
- Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences 17 (5), pp. 315–318. Cited by: §1, §2.1.
- Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 93, pp. 149–160. Cited by: §1, §2.1, §6.
- Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO, pp. 222–229. Cited by: §2.3.
- Learning compositional Koopman operators for model-based control. In International Conference on Learning Representations (ICLR), Cited by: §6.
- MamKO: Mamba-based Koopman operator for modeling and predictive control. In International Conference on Learning Representations (ICLR), Cited by: §B.1, §B.1, §B.1, §B.2, §B.2, §B.2, §C.2, §D.1, §D.2, §1, §1, §2.2, §5.1, §5.1, §5.1, §5.2, §5.3, Table 1, Table 1, Table 2, Table 2, §6.
- Distributed model predictive control of nonlinear process systems. AIChE Journal 55 (5), pp. 1171–1184. Cited by: §B.2.
- A two-tier architecture for networked process control. Chemical Engineering Science 63 (22), pp. 5394–5409. Cited by: §B.2, §B.2, §5.1.
- Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9, pp. 4950. Cited by: §1, §2.1.
- Convex optimization for trajectory generation: a tutorial on generating dynamically feasible trajectories reliably and efficiently. IEEE Control Systems Magazine 42 (5), pp. 40–113. Cited by: §6.
- Successive convexification: a superlinearly convergent algorithm for non-convex optimal control problems. arXiv preprint arXiv:1804.06539. Cited by: §2.3, §4.3, §6, Proposition 3.
- Splitting methods. Acta Numerica 11, pp. 341–434. External Links: Document Cited by: §3.3.
- Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics 41, pp. 309–325. Cited by: §1.
- Bilinear control processes. Academic Press. Cited by: §1, §2.3, §6.
- OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation 12, pp. 637–672. Cited by: §D.1, §D.4, §4.3.
- On the product of semi-groups of operators. Proceedings of the American Mathematical Society 10 (4), pp. 545–551. External Links: Document Cited by: §3.3.
- A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 25, pp. 1307–1346. Cited by: §1, §2.1.
Appendix A Stability: Training Penalty and Inference Screening
Training.
The spectral penalty in the loss is
with margin . Eigenvalues are computed via torch.linalg.eigvals cast to float32 to avoid mixed-precision artifacts. Zero initialization of ensures the model starts as an exact copy of the Linear baseline and the penalty is inactive at the start of training.
Inference.
We do not perform a per-step spectral check at inference. Closed-loop stability is enforced indirectly through the SCP trust-region constraint and the QP control bounds, both of which keep the linearized dynamics within the regime where our trained operators are well-conditioned. We did, however, evaluate the cheap Gershgorin disk pre-check () as an stability certificate at each MPC step. Empirically the disks straddle the unit circle in of MPC steps across both bilinear TV cells (CartPole and RSCP, , episodes steps each), because the off-diagonal mass introduced by the bilinear factor inflates the radii past unity even when . A full eigendecomposition would therefore be required for any runtime spectral gate; for this is sub-millisecond per step and does not bottleneck the MPC loop, but we leave runtime stability gating to future work.
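The inconclusive behavior of the Gershgorin pre-check can be reproduced on a toy matrix. The sketch below is illustrative only: the 2x2 matrix is hypothetical rather than a trained operator, and the function names are ours. It compares the Gershgorin row-disk bound on the spectral radius with the value obtained from a full eigendecomposition.

```python
import numpy as np

def gershgorin_radius_bound(M):
    """Upper bound on the spectral radius from Gershgorin row disks."""
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)  # off-diagonal row sums
    return float(np.max(np.abs(centers) + radii))

def spectral_radius(M):
    """Exact spectral radius via a full eigendecomposition."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical operator: stable eigenvalues 0.9 and 0.8, plus off-diagonal mass.
M = np.array([[0.9, 0.3],
              [0.0, 0.8]])
bound = gershgorin_radius_bound(M)  # first disk reaches past the unit circle
rho = spectral_radius(M)            # true spectral radius is below one
```

Here the first disk is centered at 0.9 with radius 0.3, so it straddles the unit circle even though both eigenvalues lie strictly inside it; the disk test therefore cannot certify stability, while the eigendecomposition can.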
Appendix B System Dynamics
This appendix specifies the governing ODEs, parameters, and action spaces of the four system variants (CartPole TI/TV, RSCP TI/TV) used in the paper.
B.1 CartPole (TI, TV)
The state is with measured from the upright. The control is the horizontal force applied to the cart. Following the MamKO benchmark protocol [Li et al., 2025], we use gravity m s-2, cart mass kg, pole mass kg, half-length m, force bound N, and sampling period s. The equations of motion include cart and pole friction terms as in the MamKO reference implementation.
Time-invariant variant (TI).
The friction coefficients are fixed at , (frictionless limit). This recovers the standard frictionless CartPole.
Time-varying variant (TV).
The cart friction is modulated as
with , , and rad s-1, matching the MamKO TV configuration. This is the specific cell used in the closed-loop MPC and lead-time experiments; results for are not reported here.
Note on the sin/cos discrepancy.
The MamKO paper writes the friction modulation with cos (Eq. 16 of Li et al. [2025]), whereas the official MamKO repository uses sin in cartpole_V.py. We follow the repository, which is the executable ground truth.
Parameters.
Table 4 lists the numerical constants used for the CartPole benchmark. The dynamical constants and time-varying friction settings follow the MamKO benchmark protocol [Li et al., 2025]; the shared MamKO architecture hyperparameters for CartPole are summarized separately in Table 6.
| Symbol | Value | Description |
|---|---|---|
| | m s-2 | gravitational acceleration |
| | kg | cart mass |
| | kg | pole mass |
| | m | pole half-length |
| | s | sampling period |
| | N | control bound on horizontal force |
| | | base cart friction coefficient setting |
| | | pole friction coefficient |
| (TV) | rad s-1 | frequency in |
B.2 RSCP: Reactor–Separator Process
RSCP is a reactor–separator process consisting of two continuous stirred-tank reactors (CSTR-1 and CSTR-2) operating in series, feeding a flash separator whose liquid bottoms are partially recycled to CSTR-1 [Liu et al., 2008]. The system was adopted as the MamKO benchmark in Li et al. [2025]. Two parallel first-order reactions proceed in each CSTR with Arrhenius kinetics; the separator performs ideal vapor–liquid equilibrium at fixed pressure with no reaction. The state vector is
where are the mass fractions of species in vessel and is the corresponding temperature in Kelvin; vessel indices refer to the two CSTRs and to the flash separator. The control is , the heat duties applied to each vessel in kJ h-1.
Governing ODEs.
The mass and energy balances follow Chen [2022, Section 2.4] rather than the formulation in the original Liu papers; see “Reproducibility notes” below for the discrepancies that motivated this choice. Letting and denote the total throughput of each CSTR, and the Arrhenius rates, and the heat-of-reaction coefficients (see paragraph below), the governing equations for CSTR-1 and CSTR-2 are
and analogously for CSTR-2 with replacing and the previous-vessel state replacing the recycle stream. The flash separator carries no reaction:
The recycle compositions are determined by ideal vapor-liquid equilibrium with relative volatilities ,
(16)
The structurally important property for this paper is that the heat duties enter the energy balances additively as with constant coefficients, contributing no terms.
Parameters.
Table 5 lists the numerical values used. Volumes, flows, feed states, kinetic constants, activation energies, heats of reaction, and relative volatilities are taken from Chen [2022, Table 2.1]; the molar concentration factor kmol m-3 is the constant that converts mass fractions to molar concentrations in the energy balance (see “Reproducibility notes”).
| Symbol | Value | Description |
|---|---|---|
| | m3 | vessel volumes |
| | m3 h-1 | feed flows |
| | m3 h-1 | recycle, purge |
| | K | feed temperatures |
| | | feed compositions () |
| | h-1 | kinetic constants |
| | kJ kmol-1 | activation energies |
| | kJ kmol-1 | reaction enthalpies |
| | kg m-3 | density |
| | kJ kg-1 K-1 | specific heat |
| | kmol m-3 | molar concentration factor |
| | kJ kmol-1 | vap. enthalpies |
| | | relative volatilities |
| | kJ kmol-1 K-1 | gas constant |
Reproducibility notes.
We verified that the RSCP ODE system as written in Liu et al. [2008, 2009] does not reproduce the steady state reported by those papers under their stated parameter values. The discrepancy resolves under three corrections drawn from Chen [2022, Section 2.4]: (i) the heat-of-reaction term carries a molar concentration factor rather than , where kmol m-3; (ii) the separator energy balance includes the convective transport of vaporization enthalpies for the three species; (iii) the steady-state heat duties are of order kJ h-1 rather than the value listed in Chen [2022, Table 2.1], which we identified as a typographical exponent error. Under the corrected formulation, forward integration from the published steady state has on temperatures and within on compositions. We use this corrected system throughout. The published MamKO implementation of RSCP is not publicly available, so we cannot directly confirm that our reconstructed system matches theirs; however, the MamKO paper’s reported steady state and our forward-integration check are mutually consistent under the corrected ODEs.
Time-varying variant.
The TV variant follows the MamKO synthetic time-varying modifier [Li et al., 2025, Appendix C.3], applied multiplicatively to both Arrhenius rate terms in all three vessels. This simulates catalyst deactivation: reaction rates decay exponentially with time, and the model must track operating-point drift induced by the slowly shrinking effective kinetics. Crucially, multiplies only the kinetic terms, not the convective or heat-duty terms, so the structural property that does not couple multiplicatively with is preserved in the TV variant: RSCP TV exercises the time-varying-parameter axis of Section 1 without introducing structure.
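The structural point can be made concrete with a small sketch. It assumes the modifier has the form exp(-lambda t), which matches the exponential decay described above but is not copied from the MamKO appendix; the decay rate and kinetic values in the example are hypothetical placeholders, not the entries of Table 5.

```python
import math

R_GAS = 8.314  # kJ kmol^-1 K^-1, standard value of the gas constant

def arrhenius_rate(k0, activation_energy, temperature):
    """Arrhenius rate k0 * exp(-E / (R T))."""
    return k0 * math.exp(-activation_energy / (R_GAS * temperature))

def deactivated_rate(k0, activation_energy, temperature, t_hours, decay_per_hour):
    """Arrhenius rate scaled by an assumed modifier exp(-decay_per_hour * t_hours)."""
    modifier = math.exp(-decay_per_hour * t_hours)
    return modifier * arrhenius_rate(k0, activation_energy, temperature)

# Hypothetical values, chosen only to exercise the functions.
fresh = deactivated_rate(1.0e5, 5.0e4, 450.0, t_hours=0.0, decay_per_hour=0.1)
aged = deactivated_rate(1.0e5, 5.0e4, 450.0, t_hours=2.0, decay_per_hour=0.1)
```

The heat duty does not appear as an argument of either function: under this form the modifier rescales only the kinetic term, so the input still enters the energy balance additively.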
Nominal operating point and controls.
The MPC tracking task drives the system to the nominal steady state
following Li et al. [2025, Appendix C.3]. The control bounds are kJ h-1 around the nominal heat duties ; these are enforced by the simulator’s action-space clipping rather than by the QP, since the QP operates in normalized control space (see Section D.4). The sampling period is s ( h).
Appendix C Data Generation and Training Protocol
C.1 Trajectory Generation
For each system variant we generate training and test data with the official MamKO ReplayMemory pipeline. Continuous rollouts under per-step i.i.d. uniform random control excitation are segmented into overlapping input–output windows of length . Episodes are integrated by forward Euler at the system sampling period, with no sub-stepping: s for CartPole and s for RSCP. Episodes terminate on physical bound violations or at a fixed horizon, whichever comes first: for CartPole, when or ; for RSCP, when concentration or temperature bounds are violated; for both systems, at steps for training episodes and steps for test episodes. Windows are pooled across episodes until the configured target is reached: training windows, split into train/validation subsets via a random split with seed , and test windows.
Control excitation is drawn independently at each step. For CartPole, the force is sampled as N. For RSCP, each heat duty is sampled as kJ h-1, where denotes the steady-state heating rates. Initial states are sampled at the start of each episode: for CartPole, m and rad, with zero initial velocities; for RSCP, components are drawn independently and uniformly within per-component half-widths around a verified fixed point of the ODEs, with in the state ordering for . Here denotes the numerically verified fixed point of the corrected RSCP ODEs rather than the approximate steady state reported in the MamKO appendix; we use as the perturbation center to avoid systematic drift back toward the true equilibrium during data generation. The data-generation seed for the train/validation split is and is reported alongside the released code.
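The windowing and split steps can be sketched as follows. This is not the MamKO ReplayMemory code: the function names, the assumption that each window spans lookback plus forecast steps, and the example sizes and seed are all illustrative.

```python
import numpy as np

def make_windows(states, controls, lookback, horizon):
    """Slice one rollout into overlapping windows of length lookback + horizon."""
    length = lookback + horizon
    num_steps = states.shape[0]
    if num_steps < length:
        raise ValueError("rollout is shorter than one window")
    starts = range(num_steps - length + 1)  # stride 1, so windows overlap
    x = np.stack([states[s:s + length] for s in starts])
    u = np.stack([controls[s:s + length] for s in starts])
    return x, u

def split_train_val(num_windows, val_fraction, seed):
    """Random train/validation index split with a fixed seed."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_windows)
    num_val = int(round(val_fraction * num_windows))
    return perm[num_val:], perm[:num_val]

# Hypothetical rollout: 10 steps, 4 states, 1 control, uniform random excitation.
rng = np.random.default_rng(0)
states = rng.standard_normal((10, 4))
controls = rng.uniform(-1.0, 1.0, size=(10, 1))
x_win, u_win = make_windows(states, controls, lookback=3, horizon=2)
train_idx, val_idx = split_train_val(len(x_win), val_fraction=0.5, seed=0)
```

Fixing the seed of the split makes the train/validation partition reproducible across runs, which is the role the reported seed plays in the protocol above.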
C.2 Model Architecture
The encoder, backbone, and decoder follow the MamKO configuration [Li et al., 2025] for each system class. We re-use the MamKO reference implementation hyperparameters; only the bilinear extension introduces new parameters ( tensors at full rank , plus the spectral penalty weight ). Table 6 summarizes the per-system settings.
| CartPole | RSCP | |
| Lookback window | ||
| Forecast horizon | ||
| Latent dimension | ||
| Mamba conv kernel | ||
| Hidden dimension | ||
| Bilinear-only | ||
| Bilinear rank | ||
| Spectral penalty | ||
| Discretization | lie_trotter | lie_trotter |
The tensors are factorized as with , both initialized to zero. Zero initialization makes the bilinear model an exact copy of the Linear baseline at training step zero, so any training-time divergence between the two is attributable solely to the bilinear terms.
C.3 Lie–Trotter Discretization Details
The baseline MamKO diagonal step is preserved exactly by evaluating the diagonal flow element-wise:
The associated diagonal input integral is computed element-wise as
(17)
In implementation, the numerator is evaluated with torch.expm1 to avoid catastrophic cancellation as , with a small sign-preserving denominator offset used only for numerical stability.
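The cancellation issue is easy to reproduce in NumPy, whose np.expm1 plays the same role as torch.expm1. The sketch assumes the standard zero-order-hold integral (exp(a dt) - 1) / a per diagonal entry, which may differ in notation from Eq. 17. In place of the denominator offset described above, it substitutes the analytic limit dt at exactly a = 0; that guard is our swap, not the paper's implementation.

```python
import numpy as np

def zoh_input_gain(a, dt):
    """Element-wise (exp(a*dt) - 1) / a, using expm1 and the limit dt at a == 0."""
    a = np.asarray(a, dtype=float)
    is_zero = a == 0.0
    safe_a = np.where(is_zero, 1.0, a)           # placeholder to avoid 0/0
    gain = np.expm1(safe_a * dt) / safe_a
    return np.where(is_zero, dt, gain)

def naive_input_gain(a, dt):
    """Same quantity with exp(x) - 1, which loses digits as a*dt -> 0."""
    a = np.asarray(a, dtype=float)
    return (np.exp(a * dt) - 1.0) / a

dt = 0.1
a = np.array([-2.0, -1e-12, 0.0])
stable = zoh_input_gain(a, dt)
naive_small = naive_input_gain(np.array([-1e-12]), dt)[0]
```

For the entry with a close to zero, the expm1 form stays within rounding error of dt, while the naive form inherits the rounding of exp(a dt) near one.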
The coupling factor is evaluated with a single torch.linalg.matrix_exp call, yielding the discrete-time matrices
(18)
Thus the diagonal ZOH step costs and the coupling step costs , rather than the augmented matrix exponential required by a dense ZOH implementation that jointly recovers and . For , this overhead remains negligible relative to backbone inference.
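The splitting idea can be illustrated on a small example. The sketch assumes a continuous-time generator of the form diag(d) + C and the factor ordering exp(dt diag(d)) exp(dt C); both are assumptions for illustration and need not match Eq. 18. The helper expm is a simple scaling-and-squaring Taylor series written for this sketch, standing in for torch.linalg.matrix_exp.

```python
import numpy as np

def expm(M, squarings=8, terms=12):
    """Small-matrix exponential via a scaled Taylor series and repeated squaring."""
    S = M / (2 ** squarings)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ S / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def lie_trotter_step(d, C, dt):
    """Approximate exp(dt*(diag(d) + C)) by exp(dt*diag(d)) @ exp(dt*C)."""
    return np.exp(dt * d)[:, None] * expm(dt * C)  # diagonal flow is element-wise

def splitting_error(d, C, dt):
    """Frobenius-norm gap between the joint exponential and the split product."""
    exact = expm(dt * (np.diag(d) + C))
    return float(np.linalg.norm(exact - lie_trotter_step(d, C, dt)))

# Hypothetical diagonal rates and coupling matrix.
d = np.array([-1.0, -2.0, -0.5])
C = np.array([[0.0, 0.3, 0.0],
              [-0.2, 0.0, 0.1],
              [0.0, 0.4, 0.0]])
err_coarse = splitting_error(d, C, 0.01)
err_fine = splitting_error(d, C, 0.005)
```

Halving the step reduces the one-step error by roughly a factor of four, consistent with the second-order local error of Lie-Trotter splitting; the error vanishes when the two factors commute.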
C.4 Optimization
All models are trained with Adam [Kingma and Ba, 2015] for epochs at initial learning rate , weight decay , batch size , and a step learning-rate schedule (step size epochs, ). Gradient clipping is applied at . Best-validation-loss checkpoints are retained for evaluation. Due to compute constraints, results are reported from a single seed per configuration; per-cell dispersion is reported across the 10 closed-loop episodes and across the final 50 training epochs (mean50 in Table 2). Multi-seed validation is deferred to a follow-up. Training ran in FP32 throughout; mixed precision was not used.
The training loss is the multi-step observation-space prediction loss of Equation 10. The spectral penalty of Appendix A is a training-only term and is not evaluated at inference.
C.5 Hardware and Wall-Clock
Training runs were executed on NVIDIA A100 GPUs on GCE a2-ultragpu-1g instances. Per-run training wall-clock was approximately hours for CartPole and hours for RSCP (bilinear Lie–Trotter models; corresponding linear baselines train roughly faster, at hours per run on either system). Total compute budget for the experiments reported in this paper, including ablations and the lead-time sweep, was approximately GPU-hours (training: h across runs; closed-loop MPC regen=never d-sweeps: h across sweeps).
Appendix D Closed-Loop MPC Tasks
This appendix specifies the closed-loop MPC problems evaluated in Sections 5.2 and 5.5: cost functions, control bounds, warmup protocol, lead-time scheduling, and SCP hyperparameters.
D.1 Common Setup
For all systems we use prediction horizon , apply horizon (standard receding-horizon MPC at ), and episodes per configuration. Each episode is MPC steps. Following MamKO Eq. 10a, the receding-horizon objective at time is
(19)
that is, a quadratic tracking term on the predicted decoded state together with a smoothness penalty, plus a terminal tracking cost. The QP is built by dynamics elimination over the -step horizon and solved by OSQP [Stellato et al., 2020].
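Dynamics elimination can be sketched on a time-invariant linear latent model. The sketch is deliberately simplified relative to Eq. 19: it tracks the latent state directly, uses a plain control-magnitude penalty instead of the smoothness penalty, omits the terminal cost and bounds, and solves the unconstrained problem with a linear solve rather than OSQP. All names and numerical values are illustrative.

```python
import numpy as np

def condense(A, B, H):
    """Return (Phi, Gamma) such that stacked predictions Z = Phi @ z0 + Gamma @ U."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(H):
        powers.append(A @ powers[-1])          # powers[k] = A^k
    Phi = np.vstack(powers[1:])                # block rows A^1 ... A^H
    Gamma = np.zeros((H * n, H * m))
    for i in range(H):                         # block row i predicts z_{i+1}
        for j in range(i + 1):                 # u_j reaches z_{i+1} through A^(i-j) B
            Gamma[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B
    return Phi, Gamma

def condensed_qp(A, B, H, z0, z_ref, q_weight, r_weight):
    """Hessian P and linear term q of 0.5 U'PU + q'U after eliminating the states."""
    Phi, Gamma = condense(A, B, H)
    offset = Phi @ z0 - np.tile(z_ref, H)
    P = 2.0 * (q_weight * Gamma.T @ Gamma + r_weight * np.eye(Gamma.shape[1]))
    q = 2.0 * q_weight * Gamma.T @ offset
    return P, q

# Hypothetical two-state, one-input system and a five-step horizon.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
z0 = np.array([1.0, 0.0])
z_ref = np.zeros(2)
H = 5
P, q = condensed_qp(A, B, H, z0, z_ref, q_weight=1.0, r_weight=0.01)
U_opt = np.linalg.solve(P, -q)  # unconstrained minimizer of the condensed QP
```

After elimination the decision variable is the control sequence alone, so the QP dimension scales with the horizon times the input dimension rather than with the latent dimension.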
Warmup.
Following the MamKO evaluation protocol, the lookback buffer is initialized synthetically by repeating the simulator’s reset state times with zero control. MPC begins at simulator step ; no real warmup rollout is performed. This convention matches MamKO’s published evaluation loop [Li et al., 2025] and is the operating point under which the forecasting and closed-loop numbers in the main text are reported.
D.2 CartPole TV
Reference: upright stabilization at under sinusoidal cart friction (Section B.1). Stage and terminal cost weights are
matching the MamKO reference configuration [Li et al., 2025]. The heavy weighting on (state index 3) reflects the priority of pole-angle stabilization over cart-position regulation, and the terminal cost penalizes only cart position. The control bound is N. There is no safety constraint; the trajectory is unconstrained over the action range.
D.3 RSCP (TI, TV)
Reference: setpoint tracking to the nominal steady state of Section B.2. The cost weights heavily emphasize composition tracking over temperature tracking:
The asymmetric weighting follows the MamKO RSCP configuration: the relevant operational quantities are the species mass fractions, while the temperature states float to whatever the heat duties drive them to. The small value reflects the magnitude of the heat duties ( in raw units): balances the contribution against the tracking term. Control bounds are kJ h-1 around the nominal duties; these are enforced as action-space clipping in the simulator after the QP returns its result in normalized control space (see Section D.4).
D.4 SCP Hyperparameters
The SCP controller of Algorithm 1 runs with initial trust-region radius (in normalized control space), shrinkage factor on cost-increase steps, and maximum SCP iterations as reported in Section 5.2. The inner QP is solved by OSQP [Stellato et al., 2020] with default tolerances, warm-started from the previous receding-horizon call’s solution to amortize solver iterations across the closed-loop trajectory. The QP is built in normalized control space using the MamKO instance-normalization statistics output by the dynamics-generation network at each MPC step; control bounds and the reference state are normalized correspondingly. After the QP returns, the optimal control is denormalized and clipped to the simulator’s raw-space action bounds before application.
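The accept-or-shrink logic of a trust-region SCP loop can be sketched on a scalar toy problem. This is not Algorithm 1: the model is a hypothetical two-step scalar bilinear rollout under a constant input, the convex subproblem is one-dimensional so clipping its minimizer to the trust region is exact, and the radius, shrinkage factor, and iteration cap are placeholder values.

```python
def rollout_residual(u, a=0.9, b=0.5, n=0.2, z0=1.0, target=2.0):
    """Two-step scalar bilinear rollout; returns terminal residual and its exact derivative."""
    z1 = (a + n * u) * z0 + b * u
    dz1 = n * z0 + b
    z2 = (a + n * u) * z1 + b * u
    dz2 = n * z1 + (a + n * u) * dz1 + b
    return z2 - target, dz2

def cost(u):
    """True (nonconvex in general) cost: squared terminal residual."""
    return rollout_residual(u)[0] ** 2

def scp(u0, radius=0.25, shrink=0.5, max_iters=30, tol=1e-12):
    """Sequential convex programming with a trust region on the scalar input."""
    u, history = float(u0), [cost(u0)]
    for _ in range(max_iters):
        g, dg = rollout_residual(u)          # exact linearization at the current u
        if dg == 0.0:
            break
        # Convex subproblem: minimize (g + dg*step)^2 subject to |step| <= radius.
        step = max(-radius, min(radius, -g / dg))
        candidate = u + step
        if cost(candidate) <= history[-1]:   # accept only if the true cost does not rise
            u = candidate
            history.append(cost(u))
            if abs(step) < tol:
                break
        else:
            radius *= shrink                 # reject the step and shrink the trust region
    return u, history

u_star, history = scp(0.0)
```

Because a candidate is accepted only when the true cost does not increase, the recorded cost sequence is non-increasing by construction; in this toy run the trust region is active for the first few steps and the iteration then behaves like Newton's method on the residual.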
D.5 Lead-Time Sweep
The lead-time experiment of Section 5.2 sweeps the commitment window on both RSCP TV and CartPole TV with and episodes per . The protocol is implemented by holding both the optimizer’s plan and the backbone’s dynamics output fixed during the lead window: at each MPC call the controller commits to the next controls from a queue, applies the next queued control to the simulator, and refills the queue from the freshly solved plan only after the queue empties; during this window the backbone is not re-evaluated, so the , matrices used for any auxiliary roll-forward are the ones generated at the start of the window. We refer to this as regen=never in our implementation; it isolates the open-loop correction capacity of the model class itself, with neither model receiving intermediate sensor or backbone updates within the lead window. The point is generated but excluded from the analysis in the main text on grounds of discretization-saturation: at the RSCP sampling period of s, corresponds to three minutes of stale plan, beyond the regime in which the underlying linearization is informative (cumulative costs converge for both models, see Table 7). The exact end-of-horizon values reported in Figures 2 and 3 are tabulated in Tables 7 and 8 for RSCP TV and CartPole TV, respectively.
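The queue-based commitment protocol can be sketched as follows. The callbacks solve_plan and apply_control are hypothetical stand-ins for the MPC solve and the simulator step, and the sketch models only the control queue, not the frozen backbone output.

```python
from collections import deque

def run_with_commitment(solve_plan, apply_control, num_steps, d):
    """Apply controls from a queue that is refilled only when it runs empty."""
    queue = deque()
    num_solves = 0
    for t in range(num_steps):
        if not queue:                      # re-plan only after the queue empties
            plan = solve_plan(t)
            queue.extend(plan[:d])         # commit to the next d controls
            num_solves += 1
        apply_control(queue.popleft())     # stale within the commitment window
    return num_solves

# Hypothetical planner: tags each control with the solve time and plan index.
applied = []
num_solves = run_with_commitment(
    solve_plan=lambda t: [(t, k) for k in range(5)],
    apply_control=applied.append,
    num_steps=10,
    d=3,
)
```

With d = 3 and ten steps, the planner is called at steps 0, 3, 6, and 9, and every control applied in between comes from the plan computed at the start of its window.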
| Linear | ||||
|---|---|---|---|---|
| Bilinear-SCP-5 |
| Linear | ||||
|---|---|---|---|---|
| Bilinear-SCP-5 |
| Cell | Trajectory shape | |
|---|---|---|
| CartPole TI | Peak at ep. , decays | |
| CartPole TV | Monotonic growth, recruitment | |
| RSCP TI | Plateau, slack absorption | |
| RSCP TV | Monotonic decay, multi-context |
Appendix E Coupling Strength Diagnostic
Table 9 reports the learned at training convergence across the four cells, alongside the qualitative shape of the training trajectory. The four cells exhibit distinct training trajectories that together provide an interpretive scaffold for the empirical results in the main text.
The CartPole TI trajectory rises early to , then decays under spectral regularization, settling at . The bilinear capacity is genuinely used—consistent with the strict forecasting gain on this cell—but constrained from over-recruitment.
CartPole TV grows monotonically to . The progressive recruitment matches the time-varying nature of the perturbed dynamics: as training proceeds, the model leans on to absorb friction-coefficient drift that the linear lift cannot track within a single MPC horizon.
RSCP TI plateaus high at , but the closed-loop and forecast results show no clear advantage of the bilinear model on this cell. We interpret this as the optimizer absorbing residual training slack into , which has no physical target on this cell ( is structurally absent in the ODE). alone should not be read as evidence of bilinear strength: its size on RSCP TI is set by training-time slack, not by physics.
RSCP TV decays monotonically from an early peak to . Under time-varying parameters, must serve simultaneously across the family of operating regimes induced by the modulation; the optimizer responds by shrinking toward a setting that works on average rather than overfitting to any single regime. This is consistent with Section 5.2: Bilinear-SCP-, which iteratively re-linearizes the operator at the current operating point, outperforms Bilinear-SCP- on this cell because the average is what the model carries while iterative SCP is what extracts operating-point-specific use of it.
The four trajectories provide an interpretive scaffold rather than a predictive theory: is the empirical witness for the structural condition in Proposition 2 (CartPole TI), the recruitment indicator under TV physics (CartPole TV, RSCP TV), and the slack-absorption diagnostic where no physical target exists (RSCP TI). The proposition does not predict trajectory shape; the trajectories make sense in light of it.