arXiv:2605.07052 · eess.SY · uncurated · rendered via ar5iv

A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces


Boya Hou, Maxim Raginsky

This work was supported in part by the NSF under award CCF-2348624 (“Towards a control framework for neural generative modeling”). Boya Hou is with the Carl R. Woese Institute for Genomic Biology and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (boyahou2@illinois.edu). Maxim Raginsky is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (maxim@illinois.edu).

Abstract

We generalize Jan Willems’ behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.

I Introduction

The mathematical study of dynamical systems seeks to describe the evolution of a given system over time under its governing laws, initial conditions, and inputs (if any). This evolution can be studied internally through state-space methods or externally via the system’s behavior. Pioneered by Jan Willems in a series of papers starting with [25] (see [26] for an overview), the behavioral approach identifies dynamical systems with sets of their trajectories. For linear time-invariant systems, the cornerstone of this framework is the so-called fundamental lemma [24], which asserts that the set of all trajectories of a controllable linear time-invariant system over a finite time horizon can be reconstructed from a single trajectory driven by a persistently exciting input. This result has become central to recent advances in data-driven control; see, e.g., [14, 13, 6, 8, 23] and references therein.

The philosophy underlying these data-driven approaches, as clearly articulated in the paper by Markovsky and Rapisarda [13], is that simulation or control should be carried out without an explicit system identification step. In the context of simulation of linear systems, for example, this principle is supported by the fact that, given an initial condition and a subsequent input of interest, one can combine it with previous measurements of the system behavior to predict the resulting output without identifying the system transfer function, impulse response, or state-space model. Rather, simulation can be viewed as a “missing data” problem, to be solved directly using the available trajectories.

While this viewpoint has led to a rich and mature literature for linear systems, extending it to nonlinear systems is substantially more challenging, as the trajectory sets no longer form linear subspaces. Several works have explored nonlinear analogues of the fundamental lemma for structured nonlinear systems. Examples include bilinear systems [12], second-order Volterra systems [18], flat nonlinear systems [1], Hammerstein and Wiener systems [3], and a Koopman-type linear embedding posited in [20]. A promising direction is to use a reproducing kernel Hilbert space (RKHS) to lift the trajectories of nonlinear systems into a (possibly infinite-dimensional) feature space. In this vein, Huang, Lygeros, and Dörfler [10] investigated kernelized data-enabled predictive control, where the resulting models are regression-based approximations rather than behavioral characterizations. More recently, Molodchyk and Faulwasser [16] established the link between kernel regression and Willems’ fundamental lemma, and showed that several existing nonlinear extensions of the fundamental lemma, such as those for Hammerstein and flat systems, can be interpreted as instances of kernel regression with specific choices of kernel when the induced RKHS is finite-dimensional.

In this paper, we extend the behavioral framework beyond the linear setting to characterize the behavior of nonlinear systems in an RKHS, using minimum-norm interpolation and subspace identification adapted to nonlinear systems. We reinterpret both methods through the behavioral lens and clarify the role of various structural assumptions on the system, the offline and online data, and the space of functions used to represent the nonlinear aspects of the system. A unifying theme across these two methods is that, as in modern data-driven schemes and in the spirit of the fundamental lemma, predictors and the desired “online” trajectories can be expressed in terms of the kernelized “offline” examples. When working in an RKHS, as noted by Molodchyk and Faulwasser [16], this structure further mirrors the representer theorem [19]: the solution can be written as a finite linear combination of kernel evaluations along the observed trajectories.

The remainder of this paper is organized as follows. In Section II, we recall the fundamental lemma of Willems for linear time-invariant systems and introduce basic concepts regarding scalar-valued and vector-valued RKHSs. In Section III, we introduce a class of nonlinear systems whose nonlinearity is encoded by an element in the vector-valued RKHS of functions on the set of finite-length sequences of inputs. In Section IV, we establish a Behavior Representer Theorem for nonlinear systems using minimum-norm interpolation and contrast it with the fundamental lemma. Compared with the work of Molodchyk and Faulwasser [16], we consider minimum-norm interpolation instead of (regularized) least-squares regression in a vector-valued RKHS. We further establish an error estimate that holds with equality (Lemma 4), rather than the inequality in [16, Lemma 2]. In Section V, we present a subspace identification framework for nonlinear systems in an RKHS and provide a fundamental lemma characterization of finite-length trajectories with kernelized input vectors and outputs.

II Preliminaries

II-A Notation and definitions

We will make use of the following notation and definitions throughout the paper. We will use $\mathbb{Z}_{+}$ to denote the set of nonnegative integers. The Moore–Penrose pseudoinverse of a matrix $A$ will be denoted by $A^{\dagger}$. Given the matrices $A\in\mathbb{R}^{p\times k}$, $B\in\mathbb{R}^{q\times k}$, and $C\in\mathbb{R}^{r\times k}$, the oblique projection of the rowspace of $A$ on the rowspace of $C$ along the rowspace of $B$ is defined as

\[
A\underset{B}{/}C := A\begin{bmatrix}C^{\mathsf{T}} & B^{\mathsf{T}}\end{bmatrix}\left(\begin{bmatrix}CC^{\mathsf{T}} & CB^{\mathsf{T}}\\ BC^{\mathsf{T}} & BB^{\mathsf{T}}\end{bmatrix}^{\dagger}\right)_{\text{first $r$ columns}} C
\]

(see [21, Section 1.4.2]). Given a finite sequence of vectors $w_{0:T-1}=\{w_{t}\}_{t=0}^{T-1}$ in $\mathbb{R}^{q}$, the Hankel matrix of depth $L$ is the $(Lq)\times(T-L+1)$ matrix given by

\[
H_{L}(w_{0:T-1}) := \begin{bmatrix}w_{0} & w_{1} & \dots & w_{T-L}\\ w_{1} & w_{2} & \dots & w_{T-L+1}\\ \vdots & \vdots & \ddots & \vdots\\ w_{L-1} & w_{L} & \dots & w_{T-1}\end{bmatrix}.
\]

We say that $w_{0:T-1}$ is persistently exciting (PE) of order $L$ if the Hankel matrix $H_{L}(w_{0:T-1})$ has full row rank. Given a signal (discrete-time vector-valued sequence) $\{w_{t}\}_{t\in\mathbb{Z}_{+}}$, $\sigma$ is the backward shift operator defined by $(\sigma w)_{t}:=w_{t+1}$. Additional notation will be introduced as needed.
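The Hankel construction and the rank test for persistency of excitation can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and storing a signal as an array of shape $(T, q)$; the function names `hankel` and `is_persistently_exciting` are hypothetical and do not come from the paper.

```python
import numpy as np

def hankel(w, L):
    """Depth-L block Hankel matrix of a signal w stored as an array of shape (T, q)."""
    T, q = w.shape
    # Column j stacks the samples w_j, w_{j+1}, ..., w_{j+L-1}.
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(T - L + 1)])

def is_persistently_exciting(w, L):
    """True if H_L(w) has full row rank L*q."""
    H = hankel(w, L)
    return np.linalg.matrix_rank(H) == H.shape[0]

rng = np.random.default_rng(0)
w_rand = rng.standard_normal((20, 2))   # T = 20, q = 2, generic signal
w_const = np.ones((20, 2))              # constant signal: all Hankel columns coincide

H = hankel(w_rand, 3)                   # shape (L*q, T-L+1) = (6, 18)
pe_rand = is_persistently_exciting(w_rand, 3)
pe_const = is_persistently_exciting(w_const, 3)
```

A generic random signal passes the test, whereas a constant signal does not, since its Hankel matrix has rank one.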

II-B A review of the fundamental lemma

In the behavioral framework, a linear time-invariant system with $q$ variables is identified with a linear subspace ${\cal B}$ of the space $(\mathbb{R}^{q})^{\mathbb{Z}_{+}}$ of all one-sided sequences of $q$-dimensional real vectors which is shift-invariant, i.e., $\sigma{\cal B}\subseteq{\cal B}$. One can work with various equivalent representations of ${\cal B}$, such as autoregressive, input/output, or input/state/output representations [25]. Notions like controllability or observability can be defined solely in terms of set-theoretic properties of ${\cal B}$, without referring to a particular representation [26].

One of these structural properties pertains to the partition of the variables of ${\cal B}$ into inputs and outputs. Without getting too much into technical details, the idea is that, up to a permutation of coordinates, each trajectory $w\in{\cal B}$ can be partitioned into an input trajectory $u:\mathbb{Z}_{+}\to\mathbb{R}^{m}$ and an output trajectory $y:\mathbb{Z}_{+}\to\mathbb{R}^{p}$ as $w=\begin{bmatrix}u\\ y\end{bmatrix}$, such that the input is free in the sense that

\[
\left\{u:\mathbb{Z}_{+}\to\mathbb{R}^{m}\,\middle|\,\begin{bmatrix}u\\ y\end{bmatrix}\in{\cal B}\text{ for some }y:\mathbb{Z}_{+}\to\mathbb{R}^{p}\right\}\simeq(\mathbb{R}^{m})^{\mathbb{Z}_{+}},
\]

and the output is determined by the input, subject to the system laws and the initial condition. It can be shown that the number of inputs $m$, the number of outputs $p$, and the dimension $n$ of any minimal state-space realization of ${\cal B}$ are system invariants that depend only on ${\cal B}$ and not on the particular representation of ${\cal B}$ [25].

Let ${\cal B}$ be a controllable linear time-invariant system with $m$ inputs, $p$ outputs, and minimum state dimension $n$. The main result of [24], now commonly referred to as the fundamental lemma, is as follows:

Lemma 1.

For each $t=1,2,\dots$, let ${\cal B}|_{t}$ denote the restriction of ${\cal B}$ to times $s\in\{0,\dots,t-1\}$. Let a trajectory $w^{\rm d}_{0:T-1}\in{\cal B}|_{T}$ be given, such that the input part of $w^{\rm d}_{0:T-1}$ is persistently exciting of order $L+n$. Then ${\cal B}|_{L}$ is equal to the column space of the Hankel matrix $H_{L}(w^{\rm d}_{0:T-1})$.

The main message of Lemma 1 is that the length-$L$ behavior ${\cal B}|_{L}$ can be reconstructed, exactly and in a representation-independent manner, from a single input/output trajectory of length $T\geq L+(L+n)m-1$, provided the input is persistently exciting and the system is controllable. This makes the fundamental lemma a key ingredient in data-driven approaches to system simulation and control.
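As a numerical illustration of Lemma 1, the following sketch simulates a small controllable and observable LTI system, builds the Hankel matrix of one recorded trajectory, and checks that a fresh length-$L$ trajectory lies in its column space. The system matrices, the horizon, and the helper names are hypothetical choices made for this example only; the rank value $mL+n$ used in the check is the standard dimension of ${\cal B}|_{L}$ when $L$ is at least the lag of the system.

```python
import numpy as np

def simulate(A, B, C, D, x0, u):
    """Outputs of x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t for inputs u of shape (T, m)."""
    x, ys = x0, []
    for ut in u:
        ys.append(C @ x + D @ ut)
        x = A @ x + B @ ut
    return np.array(ys)

def hankel(w, L):
    """Depth-L block Hankel matrix of w with shape (T, q)."""
    T = w.shape[0]
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(T - L + 1)])

# Hypothetical system with n = 2 states, m = 1 input, p = 1 output.
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n, m, L, T = 2, 1, 4, 40

rng = np.random.default_rng(1)
u_d = rng.standard_normal((T, m))                 # offline input
y_d = simulate(A, B, C, D, np.zeros(n), u_d)      # offline output
H = hankel(np.hstack([u_d, y_d]), L)              # H_L(w^d), shape (L*(m+p), T-L+1)

# Persistency of excitation of order L + n for the input part.
pe = np.linalg.matrix_rank(hankel(u_d, L + n)) == m * (L + n)
rank_H = np.linalg.matrix_rank(H)

# A fresh length-L trajectory with a random initial state and a random input.
u_new = rng.standard_normal((L, m))
y_new = simulate(A, B, C, D, rng.standard_normal(n), u_new)
w_new = np.hstack([u_new, y_new]).reshape(-1)

# Least-squares fit of w_new by the columns of H; a zero residual means membership.
coef, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = np.linalg.norm(H @ coef - w_new)
```

The residual is zero up to rounding, so the new trajectory is a linear combination of time-shifted windows of the recorded one, with no model identified along the way.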

II-C Reproducing Kernel Hilbert Spaces

In this paper, we make extensive use of vector-valued reproducing kernel Hilbert spaces. We start by describing the more familiar definition of a scalar-valued RKHS; we refer interested readers to [4] for details. Let $X$ be a set. A Hilbert space ${\cal H}$ with inner product $\langle\cdot,\cdot\rangle_{\cal H}$ is an RKHS on $X$ if its elements are functions from $X$ to $\mathbb{R}$ and if for each $x\in X$ there exists a positive constant $C_{x}$ such that $|f(x)|\leq C_{x}\|f\|_{\cal H}$ for all $f\in{\cal H}$. (Unless indicated otherwise, all Hilbert spaces in this paper are assumed to be defined over the reals.) In other words, ${\cal H}$ is an RKHS on $X$ if the evaluation functional $\delta_{x}(f):=f(x)$ is bounded (hence continuous) for all $x\in X$. To each RKHS, we can associate a reproducing kernel, i.e., a mapping $\kappa:X\times X\rightarrow\mathbb{R}$ that satisfies the positivity condition

\[
\sum^{n}_{i,j=1}c_{i}c_{j}\kappa(x_{i},x_{j})\geq 0,
\]

for all $n\in\mathbb{N}$, $c_{1},\dots,c_{n}\in\mathbb{R}$, and $x_{1},\dots,x_{n}\in X$, such that $\kappa(\cdot,x)$ is an element of ${\cal H}$ for each $x\in X$ and the following property (called the reproducing kernel property) holds:

\[
f(x)=\langle f,\kappa(\cdot,x)\rangle_{\cal H},\qquad\text{for all }f\in{\cal H},\ x\in X.
\]

Moreover, the set $\{\kappa(\cdot,x):x\in X\}$ is total in ${\cal H}$ (i.e., its linear span is dense in ${\cal H}$). The map $\phi:X\to{\cal H}$ defined by $\phi(x):=\kappa(\cdot,x)$ is called the canonical feature map. By the Moore–Aronszajn theorem [2], the reproducing kernel $\kappa$ specifies ${\cal H}$ uniquely up to linear isomorphism.
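The positivity condition and the boundedness of point evaluation can be checked numerically for a concrete kernel. The sketch below assumes a Gaussian kernel on $\mathbb{R}^{3}$ (an illustrative choice that does not appear in the paper) and a function $f=\sum_{i}a_{i}\kappa(\cdot,x_{i})$ in the span of kernel sections, for which $\|f\|^{2}_{\cal H}=a^{\mathsf{T}}Ka$ with $K$ the Gram matrix. By the reproducing property and the Cauchy–Schwarz inequality, one admissible constant is $C_{x}=\sqrt{\kappa(x,x)}$.

```python
import numpy as np

def gauss_kernel(x, y, width=1.0):
    """Gaussian kernel on R^d (an assumed example of a positive-type kernel)."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 3))          # sample points x_1, ..., x_8
a = rng.standard_normal(8)               # coefficients of f = sum_i a_i kappa(., x_i)

# Gram matrix K_ij = kappa(x_i, x_j); positivity means K is positive semidefinite.
K = np.array([[gauss_kernel(xi, xj) for xj in X] for xi in X])
min_eig = np.linalg.eigvalsh(K).min()

def f(x):
    """Evaluate f(x) = <f, kappa(., x)> = sum_i a_i kappa(x_i, x)."""
    return sum(ai * gauss_kernel(x, xi) for ai, xi in zip(a, X))

norm_f = np.sqrt(a @ K @ a)              # RKHS norm of f
x_test = rng.standard_normal(3)

# Bounded evaluation: |f(x)| <= sqrt(kappa(x, x)) * ||f||.
bound_ok = abs(f(x_test)) <= np.sqrt(gauss_kernel(x_test, x_test)) * norm_f + 1e-12
```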

We now describe the vector-valued generalization of the above definition [5]. Let ${\cal K}$ be a Hilbert space, and let ${\cal L}({\cal K})$ denote the Hilbert space of bounded linear operators on ${\cal K}$ with the Hilbert–Schmidt norm. Then we say that ${\cal H}$ is a ${\cal K}$-valued RKHS on $X$ if its elements are functions from $X$ to ${\cal K}$ and if the evaluation map $\delta_{x}(f):=f(x)$ from ${\cal H}$ to ${\cal K}$ is bounded (hence continuous) for all $x\in X$, i.e., there exists a positive constant $C_{x}$ such that $\|f(x)\|_{\cal K}\leq C_{x}\|f\|_{\cal H}$. The vector-valued analogue of the reproducing kernel is an operator-valued map $\kappa:X\times X\to{\cal L}({\cal K})$ of the positive type, i.e., such that the inequality

\[
\sum^{n}_{i,j=1}c_{i}c_{j}\langle\kappa(x_{i},x_{j})v,v\rangle_{\cal K}\geq 0
\]

holds for all $n\in\mathbb{N}$, $c_{1},\dots,c_{n}\in\mathbb{R}$, $x_{1},\dots,x_{n}\in X$, and $v\in{\cal K}$. The reproducing kernel property then takes the form

\[
\langle f(x),v\rangle_{\cal K}=\langle f,\kappa(\cdot,x)v\rangle_{\cal H},\qquad\text{for all }f\in{\cal H},\ x\in X,\ v\in{\cal K},
\]

where, for each $x\in X$, $\kappa(\cdot,x)$ is an element of the linear space ${\cal L}({\cal K},{\cal H})$ of bounded operators from ${\cal K}$ to ${\cal H}$. The map $x\mapsto\kappa(\cdot,x)$ from $X$ into ${\cal L}({\cal K},{\cal H})$ is the (operator-valued) canonical feature map.

A basic example of a vector-valued RKHS which will be used frequently in the sequel is ${\cal H}={\cal L}({\cal V},{\cal W})$, where ${\cal V}$ and ${\cal W}$ are finite-dimensional inner-product spaces and where we equip ${\cal H}$ with the Hilbert–Schmidt inner product $\langle A,B\rangle_{\cal H}=\mathrm{Tr}(A^{*}B)$. For the sake of completeness, we present the straightforward proof of the following lemma in Appendix A.

Lemma 2.

${\cal H}$ is a ${\cal W}$-valued reproducing kernel Hilbert space on ${\cal V}$ with the operator-valued reproducing kernel $\kappa(v,v^{\prime})=\langle v,v^{\prime}\rangle_{\cal V}I_{\cal W}$, where $I_{\cal W}$ is the identity operator on ${\cal W}$.
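Lemma 2 can be checked numerically in coordinates. The sketch below assumes ${\cal V}=\mathbb{R}^{3}$ and ${\cal W}=\mathbb{R}^{2}$ with Euclidean inner products, so that elements of ${\cal L}({\cal V},{\cal W})$ are $2\times 3$ matrices and $\kappa(\cdot,x)w$ is the rank-one map $v\mapsto\langle v,x\rangle w$, i.e., the matrix $wx^{\mathsf{T}}$. The dimensions and the helper name `feature` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
dim_V, dim_W = 3, 2
A = rng.standard_normal((dim_W, dim_V))      # an element of H = L(V, W)
x = rng.standard_normal(dim_V)
w = rng.standard_normal(dim_W)

def feature(x, w):
    """kappa(., x) w as an element of L(V, W): the rank-one map v -> <v, x> w."""
    return np.outer(w, x)

# Reproducing property: <A x, w>_W = <A, kappa(., x) w>_HS = Tr(A^T (w x^T)).
lhs = (A @ x) @ w
rhs = np.trace(A.T @ feature(x, w))

# Kernel value kappa(v, x): column i is (kappa(., x) e_i)(v) for the basis vector e_i of W.
v = rng.standard_normal(dim_V)
kernel_value = np.column_stack([feature(x, e) @ v for e in np.eye(dim_W)])
```

Here `kernel_value` coincides with $\langle v,x\rangle I_{\cal W}$, in agreement with the kernel stated in the lemma.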

III Nonlinear Systems in a Vector-Valued RKHS

We now introduce a class of discrete-time nonlinear systems, where the nonlinearity is represented by an element of a given vector-valued RKHS of functions on the set of finite-length sequences of inputs. We will use the following notation throughout:

  • ${\cal U}:=\mathbb{R}^{m}$ is the space of inputs;

  • ${\cal Y}:=\mathbb{R}^{p}$ is the space of outputs;

  • $\mathsf{U}:={\cal U}^{L+1}$ is the space of length-$(L+1)$ input sequences, where $L\in\mathbb{Z}_{+}$ is a fixed lag;

  • $\mathsf{Y}:={\cal Y}^{L}$ is the space of length-$L$ output sequences;

  • ${\cal Z}:=\mathsf{U}\times\mathsf{Y}$.

We introduce the system model in Section III-A and the associated behavioral constructs in Section III-B, and close by discussing several examples in Section III-C.

III-A System model

Let ${\cal H}_{\mathsf{U}}$ be a vector-valued RKHS of functions from $\mathsf{U}$ into ${\cal Y}$, with the operator-valued kernel $\kappa_{\mathsf{U}}$ of positive type. (Here and elsewhere, all finite-dimensional vector spaces are automatically treated as Hilbert spaces with the usual Euclidean inner product.) We consider systems parametrized by $(L+1)$-tuples $(A_{1},\dots,A_{L},g)$, where $A_{1},\dots,A_{L}\in{\cal L}({\cal Y})$ and $g\in{\cal H}_{\mathsf{U}}$. The input/output relation corresponding to $(A_{1},\dots,A_{L},g)$ is given by

\[
y_{t+L}+\sum^{L}_{k=1}A_{k}y_{t+L-k}=g\left(u_{t:t+L}\right),\qquad t\in\mathbb{Z}_{+}, \tag{1}
\]

where $u_{t:t+L}$ is the restriction of the input sequence $u\in{\cal U}^{\mathbb{Z}_{+}}$ to the set $\{t,t+1,\dots,t+L\}$. By the reproducing kernel property, we can also write

\[
g(u_{t:t+L})=\kappa_{\mathsf{U}}(\cdot,u_{t:t+L})^{*}g,
\]

where $\kappa_{\mathsf{U}}\left(\cdot,u_{t:t+L}\right)^{*}\in{\cal L}({\cal H}_{\mathsf{U}},{\cal Y})$ is the adjoint of the canonical feature map $\kappa_{\mathsf{U}}\left(\cdot,u_{t:t+L}\right)\in{\cal L}({\cal Y},{\cal H}_{\mathsf{U}})$.

It is also convenient to cast (1) in an alternative nonlinear regression form. For each $t$, let $y_{t^{+}}$ denote the output $y_{t+L}$ at time $t+L$ and define the regression vectors

\[
z_{t}:=[u_{t}^{\mathsf{T}},\dots,u_{t+L}^{\mathsf{T}},y_{t}^{\mathsf{T}},\dots,y_{t+L-1}^{\mathsf{T}}]^{\mathsf{T}}\in{\cal Z}, \tag{2}
\]

which is in analogy to the corresponding construct for linear systems [17, 9]. As we now show, we can introduce a ${\cal Y}$-valued RKHS ${\cal H}_{\cal Z}$ of functions from ${\cal Z}$ into ${\cal Y}$, such that the autoregressive nonlinear model of (1) can be represented as

\[
y_{t^{+}}=f(z_{t}), \tag{3}
\]

for some $f\in{\cal H}_{\cal Z}$ that depends on $(A_{1},\dots,A_{L},g)$.

To that end, let us first express (1) as

\[
\begin{aligned}
y_{t+L}&=A^{-}y_{t:t+L-1}+g\left(u_{t:t+L}\right)\\
&=:f\left(u_{t:t+L},y_{t:t+L-1}\right),
\end{aligned} \tag{4}
\]

where $A^{-}\in{\cal L}(\mathsf{Y},{\cal Y})$ is the linear operator defined by

\[
A^{-}\overline{y}:=-\begin{bmatrix}A_{L}&A_{L-1}&\dots&A_{1}\end{bmatrix}\overline{y},\qquad\overline{y}\in\mathsf{Y},
\]

with the blocks ordered so that, for $\overline{y}=y_{t:t+L-1}$, the block $-A_{k}$ multiplies $y_{t+L-k}$, in agreement with (1).

By Lemma 2, ${\cal H}_{\mathsf{Y}}={\cal L}(\mathsf{Y},{\cal Y})$ is a ${\cal Y}$-valued RKHS on $\mathsf{Y}$ with the reproducing kernel $\kappa_{\mathsf{Y}}\left(\overline{y},\overline{y}^{\prime}\right)=(\overline{y}^{\mathsf{T}}\overline{y}^{\prime})I_{\cal Y}$. Consider the direct-sum Hilbert space ${\cal H}_{\cal Z}={\cal H}_{\mathsf{U}}\oplus{\cal H}_{\mathsf{Y}}$, whose elements $f:{\cal Z}\to{\cal Y}$ are pairs $(g,h)$ where $g\in{\cal H}_{\mathsf{U}}$, $h\in{\cal H}_{\mathsf{Y}}$, and for each $z=(\overline{u},\overline{y})\in{\cal Z}$, $f(z)=g(\overline{u})+h(\overline{y})$. The inner product in ${\cal H}_{\cal Z}$ is

\[
\left\langle f_{1},f_{2}\right\rangle_{{\cal H}_{\cal Z}}=\left\langle g_{1},g_{2}\right\rangle_{{\cal H}_{\mathsf{U}}}+\left\langle h_{1},h_{2}\right\rangle_{{\cal H}_{\mathsf{Y}}},
\]

for $f_{1}=(g_{1},h_{1})$, $f_{2}=(g_{2},h_{2})$. Since ${\cal H}_{\mathsf{U}}$ and ${\cal H}_{\mathsf{Y}}$ are ${\cal Y}$-valued RKHSs with operator-valued kernels $\kappa_{\mathsf{U}}$ and $\kappa_{\mathsf{Y}}$, ${\cal H}_{\cal Z}$ is also an RKHS with operator-valued kernel defined by

\[
\kappa_{\cal Z}(z_{1},z_{2})=\kappa_{\mathsf{U}}(\overline{u}_{1},\overline{u}_{2})+\kappa_{\mathsf{Y}}\left(\overline{y}_{1},\overline{y}_{2}\right),\qquad z_{1}=\left(\overline{u}_{1},\overline{y}_{1}\right),\ z_{2}=\left(\overline{u}_{2},\overline{y}_{2}\right)\in{\cal Z}.
\]

This can be proved as follows: for any $f=(g,h)\in{\cal H}_{\cal Z}$, $z=(\overline{u},\overline{y})\in{\cal Z}$, and $v\in{\cal Y}$, we have

\[
\begin{aligned}
\left\langle f(z),v\right\rangle_{\cal Y}&=\left\langle g(\overline{u}),v\right\rangle_{\cal Y}+\left\langle h(\overline{y}),v\right\rangle_{\cal Y}\\
&=\left\langle g,\kappa_{\mathsf{U}}(\cdot,\overline{u})v\right\rangle_{{\cal H}_{\mathsf{U}}}+\left\langle h,\kappa_{\mathsf{Y}}(\cdot,\overline{y})v\right\rangle_{{\cal H}_{\mathsf{Y}}}\\
&=\left\langle f,\left(\kappa_{\mathsf{U}}(\cdot,\overline{u})+\kappa_{\mathsf{Y}}(\cdot,\overline{y})\right)v\right\rangle_{{\cal H}_{\cal Z}},
\end{aligned}
\]

where the second line follows from the reproducing kernel property in ${\cal H}_{\mathsf{U}}$ and ${\cal H}_{\mathsf{Y}}$, and the third from the definition of the inner product in ${\cal H}_{\cal Z}$. In particular, (3) holds with the function $f$ defined in (4).
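The passage from the autoregressive form (1) to the regression form (3) can be illustrated on simulated data. The sketch below assumes scalar inputs and outputs, lag $L=2$, hypothetical coefficients $A_{1},A_{2}$, and a function $g$ given by a finite expansion in a Gaussian kernel on $\mathsf{U}$; none of these specific choices come from the paper. It stacks the regression vectors $z_{t}$ of (2) and checks that $f(z_{t})=y_{t+L}$, where the block of $A^{-}$ multiplying $y_{t+L-k}$ is $-A_{k}$, as required by (1).

```python
import numpy as np

L, m, p, T = 2, 1, 1, 30
A_coef = [np.array([[0.4]]), np.array([[-0.1]])]      # hypothetical A_1, A_2 (stable AR part)
rng = np.random.default_rng(4)
centers = rng.standard_normal((5, (L + 1) * m))       # hypothetical kernel centers in U
alpha = rng.standard_normal((5, p))                   # hypothetical expansion coefficients

def kappa_U(ubar, vbar):
    """Scalar Gaussian kernel on U; the operator-valued kernel is this scalar times I_Y."""
    return np.exp(-0.5 * np.sum((ubar - vbar) ** 2))

def g(ubar):
    """g = sum_i kappa_U(., c_i) alpha_i, an element of the RKHS H_U."""
    return sum(kappa_U(ubar, c) * a for c, a in zip(centers, alpha))

# Simulate (1): y_{t+L} = -sum_k A_k y_{t+L-k} + g(u_{t:t+L}).
u = rng.standard_normal((T + L, m))
y = np.zeros((T + L, p))
y[:L] = rng.standard_normal((L, p))                   # initial outputs y_0, ..., y_{L-1}
for t in range(T):
    ar = sum(A_coef[k - 1] @ y[t + L - k] for k in range(1, L + 1))
    y[t + L] = g(u[t:t + L + 1].reshape(-1)) - ar

# A^- acts on ybar = (y_t, ..., y_{t+L-1}); the block for y_{t+L-k} is -A_k.
A_minus = -np.hstack([A_coef[k - 1] for k in range(L, 0, -1)])
split = (L + 1) * m                                   # index where ybar starts inside z

def f(z):
    """Regression function f(z) = g(ubar) + A^- ybar for z = (ubar, ybar)."""
    return A_minus @ z[split:] + g(z[:split])

def kappa_Z(z1, z2):
    """Scalar factor of the sum kernel kappa_U + kappa_Y (both are multiples of I_Y here)."""
    return kappa_U(z1[:split], z2[:split]) + z1[split:] @ z2[split:]

# Regression vectors z_t of (2) and targets y_{t^+} = y_{t+L}.
Z = np.array([np.concatenate([u[t:t + L + 1].reshape(-1), y[t:t + L].reshape(-1)])
              for t in range(T)])
Y_plus = y[L:]
max_err = max(np.linalg.norm(f(Z[t]) - Y_plus[t]) for t in range(T))
```

The pairs `(Z[t], Y_plus[t])` are exactly the data format used by kernel interpolation or regression, and `kappa_Z` is the kernel one would evaluate on them.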

III-B Behaviors

In this section, we introduce several behavioral constructs related to the system model of Section III-A.

Given $(A_{1},\dots,A_{L},g)$, define the operator $P(\sigma):=A_{0}\sigma^{L}+\sum^{L}_{k=1}A_{k}\sigma^{L-k}$ (with $A_{0}=I_{\cal Y}$), where $\sigma$ is the shift operator acting on output sequences. Then (1) is equivalent to

\[
\left(P(\sigma)y\right)(t)=\kappa_{\mathsf{U}}\left(\cdot,u_{t:t+L}\right)^{*}g,\qquad t\in\mathbb{Z}_{+}, \tag{5}
\]

and we can define the behavior of $(A_{1},\dots,A_{L},g)$ as the following subset of the space of all input/output sequences:

\[
{\cal B}(A_{1},\dots,A_{L},g)\equiv{\cal B}(P,g):=\left\{\begin{bmatrix}u\\ y\end{bmatrix}\in({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\,\middle|\,\text{(5) holds for all }t\in\mathbb{Z}_{+}\right\}. \tag{6}
\]

For $t\in\mathbb{Z}_{+}$, the restriction of the behavior ${\cal B}(P,g)$ to the finite time interval $\{t,\dots,t+L\}$ is defined by

\[
{\cal B}(P,g)|_{t:t+L}=\left\{\begin{bmatrix}u_{t:t+L}\\ y_{t:t+L}\end{bmatrix}\,\middle|\,\exists\begin{bmatrix}\bar{u}\\ \bar{y}\end{bmatrix}\in{\cal B}(P,g)\text{ such that }\begin{bmatrix}\bar{u}_{t:t+L}\\ \bar{y}_{t:t+L}\end{bmatrix}=\begin{bmatrix}u_{t:t+L}\\ y_{t:t+L}\end{bmatrix}\right\}.
\]

When $t=0$, we will use ${\cal B}(P,g)|_{L}$ as shorthand for ${\cal B}(P,g)|_{0:L}$.

Next, we turn to the nonlinear regression form in (3). Recall that the identity

\[
f(z)=\kappa_{\cal Z}(\cdot,z)^{*}f
\]

holds by the reproducing kernel property. Define the operator $R:({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\to{\cal L}({\cal H}_{\cal Z},{\cal Y})\times{\cal Y}$ by

\[
R\begin{bmatrix}u\\ y\end{bmatrix}:=(\kappa_{\cal Z}(\cdot,z_{0})^{*},y_{0^{+}}),
\]

where

\[
z_{0}=[u_{0}^{\mathsf{T}},\dots,u_{L}^{\mathsf{T}},y_{0}^{\mathsf{T}},\dots,y_{L-1}^{\mathsf{T}}]^{\mathsf{T}},\quad y_{0^{+}}=y_{L}, \tag{7}
\]

is the regression vector at time $t=0$ [cf. Eq. (2)] and $y_{0^{+}}$ is the output at time $t=L$. We say that $(H,y)\in{\cal L}\left({\cal H}_{\cal Z},{\cal Y}\right)\times{\cal Y}$ is an input-output (i/o) pair of the system (3) if $Hf=y$. This leads naturally to the following behavioral description:

\[
{\cal B}(f):=\left\{\begin{bmatrix}u\\ y\end{bmatrix}\in({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\,\middle|\,R\sigma^{t}\begin{bmatrix}u\\ y\end{bmatrix}\text{ is an i/o pair of (3) for each }t\in\mathbb{Z}_{+}\right\}.
\]

When $(A_{1},\dots,A_{L},g)$ and $f$ are related via (3), it is easily verified that

\[
{\cal B}(A_{1},\dots,A_{L},g)|_{L}=\left\{\begin{bmatrix}u_{0:L}\\ y_{0:L}\end{bmatrix}\,\middle|\,\kappa_{\cal Z}(\cdot,z_{0})^{*}f=y_{0^{+}}\right\}, \tag{8}
\]

where $z_{0}$ and $y_{0^{+}}$ are defined in (7).

III-C Examples

III-C1 LTI Systems

Let ${\cal H}_{\mathsf{U}}$ consist of all linear operators from $\mathsf{U}$ into ${\cal Y}$, i.e., ${\cal H}_{\mathsf{U}}={\cal L}(\mathsf{U},{\cal Y})$. By Lemma 2, this is a ${\cal Y}$-valued RKHS on $\mathsf{U}$. Any $g\in{\cal H}_{\mathsf{U}}$ can be represented as $g(u_{0:L})=B_{L}u_{0}+\cdots+B_{0}u_{L}$ for some linear operators $B_{0},\dots,B_{L}\in{\cal L}({\cal U},{\cal Y})$. Then Eq. (1) describes a linear autoregressive model of order $L$:

\[
y_{t+L}+\sum^{L}_{k=1}A_{k}y_{t+L-k}=\sum^{L}_{l=0}B_{l}u_{t+L-l}.
\]

The operator-valued kernel map is given by

\[
\kappa_{\mathsf{U}}(u_{0:L},v_{0:L})=\langle u_{0:L},v_{0:L}\rangle_{\mathsf{U}}I_{\cal Y},
\]

where

\[
\langle u_{0:L},v_{0:L}\rangle_{\mathsf{U}}=\sum^{L}_{l=0}u_{l}^{\mathsf{T}}v_{l}
\]

is the $\ell^{2}$ inner product on $\mathsf{U}$ viewed as a direct sum of $L+1$ Hilbert spaces ${\cal U}=\mathbb{R}^{m}$ with the Euclidean inner product $\langle u,v\rangle=u^{\mathsf{T}}v$.

III-C2 Volterra series

When $A_{k}=0$ for all $k=1,\dots,L$, the nonlinear autoregressive model in (1) reduces to

\[
y_{t+L}=g(u_{t:t+L}),\quad\forall t\in\mathbb{N}. \tag{9}
\]

This system has finite memory of length $L+1$, since the output at each time $t\geq L$ is determined by the inputs in the finite time window $\{t,t-1,\dots,t-L\}$ of length $L+1$. Systems of this type are often represented using Volterra series. De Figueiredo and Dwyer [7] used the formalism of weighted Fock spaces to connect Volterra series representation to the RKHS framework.

Let us assume for simplicity that $p=1$ (i.e., the outputs $y_{t}$ are scalar). Let $\rho=(\rho_{k})_{k\geq 0}$ be a sequence of positive reals, such that the infinite series

\[
q(\lambda)=\sum^{\infty}_{k=0}\frac{1}{k!\,\rho_{k}}\lambda^{k}
\]

converges for all $\lambda\in\mathbb{R}$, and let ${\cal H}_{\mathsf{U}}$ denote the RKHS of functions from $\mathsf{U}$ into $\mathbb{R}$ with the reproducing kernel

\[
\kappa_{\mathsf{U}}(u_{0:L},v_{0:L}):=q(\langle u_{0:L},v_{0:L}\rangle_{\mathsf{U}}).
\]

Let $h_{0}$ and $(h_{k}(i_{1},\dots,i_{k}):k\in\mathbb{N},\ i_{1},\dots,i_{k}\in\{0,\dots,L\})$ be a collection of real coefficients satisfying the condition

\[
\sum^{\infty}_{k=0}\frac{\rho_{k}}{k!}\|h_{k}\|^{2}_{k}<\infty,
\]

where $\|h_{0}\|_{0}:=|h_{0}|$ and

\[
\|h_{k}\|_{k}:=\left(\sum^{L}_{i_{1}=0}\dots\sum^{L}_{i_{k}=0}|h_{k}(i_{1},\dots,i_{k})|^{2}\right)^{1/2}
\]

for $k=1,2,\dots$. Then any function $g:\mathsf{U}\to\mathbb{R}$ of the form

\[
g(u_{0:L})=h_{0}+\sum^{\infty}_{k=1}\sum_{i_{1},\dots,i_{k}\in\{0,\dots,L\}}h_{k}(i_{1},\dots,i_{k})\prod^{k}_{j=1}u_{i_{j}}
\]

is an element of ${\cal H}_{\mathsf{U}}$ [7].
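To make the kernel $\kappa_{\mathsf{U}}=q(\langle\cdot,\cdot\rangle_{\mathsf{U}})$ concrete, the sketch below evaluates a truncated version of the series $q$ for a user-supplied weight sequence. It assumes the particular choice $\rho_{k}=1$ for all $k$, for which $q(\lambda)=e^{\lambda}$ and the kernel is the exponential kernel; this choice, the truncation level, and the function names are illustrative assumptions rather than the setting of [7].

```python
import numpy as np
from math import factorial

def q(lam, rho, K=30):
    """Truncated series sum_{k=0}^{K} lam^k / (k! rho_k); rho is a callable k -> rho_k > 0."""
    return sum(lam ** k / (factorial(k) * rho(k)) for k in range(K + 1))

def kappa_U(ubar, vbar, rho):
    """Kernel q(<ubar, vbar>) on input windows stored as flat arrays."""
    return q(float(np.dot(ubar, vbar)), rho)

rng = np.random.default_rng(5)
U = 0.5 * rng.standard_normal((6, 3))     # six input windows u_{0:L} with L = 2, m = 1

# Gram matrix for the assumed weights rho_k = 1.
G = np.array([[kappa_U(a, b, lambda k: 1.0) for b in U] for a in U])

closed_form = np.exp(U @ U.T)             # exponential kernel exp(<u, v>)
min_eig = np.linalg.eigvalsh(G).min()     # positivity of the Gram matrix
```

Other admissible weight sequences $\rho$ reweight the homogeneous polynomial orders and hence change which Volterra kernels have finite norm.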

Removing the restriction $A_{1}=\dots=A_{L}=0$ yields a broader class of nonlinear systems that subsumes the models studied in [7].

III-C3 State-space models

Consider a Hammerstein state-space model of the form

\[
\begin{aligned}
x_{t+1}&=Ax_{t}+B\psi_{1}\left(u_{t}\right),\\
y_{t}&=Cx_{t}+D\psi_{2}\left(u_{t}\right),
\end{aligned} \tag{10}
\]

where $\psi_{1},\psi_{2}:{\cal U}\to\mathbb{R}^{q}$ are two (nonlinear) functions. We assume that the state $x_{t}$ takes values in $\mathbb{R}^{n}$, so $A\in\mathbb{R}^{n\times n}$, $B\in\mathbb{R}^{n\times q}$, $C\in\mathbb{R}^{p\times n}$, and $D\in\mathbb{R}^{p\times q}$. We can obtain an input-output representation (1) from the state-space representation (10) as follows.

Introduce the Markov parameters $M_{j}:=CA^{j}B\in\mathbb{R}^{p\times q}$ for $j\geq 0$. Define the $L$-step controllability matrix ${\cal C}_{L}$, the $L$-step observability matrix ${\cal O}_{L}$, the reversed $L$-step controllability matrix $\Delta_{L}$, and the modified block Toeplitz operator $\widetilde{\cal T}_{L}$ as

\[
\begin{aligned}
{\cal C}_{L}&:=\begin{bmatrix}B&AB&A^{2}B&\cdots&A^{L-1}B\end{bmatrix},\\
{\cal O}_{L}&:=\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{L-1}\end{bmatrix},\\
\Delta_{L}&:=\begin{bmatrix}A^{L-1}B&\cdots&AB&B\end{bmatrix},\\
\widetilde{\cal T}_{L}&:=\begin{bmatrix}0&0&\cdots&0&0\\ M_{0}&0&\cdots&0&0\\ M_{1}&M_{0}&\cdots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ M_{L-2}&M_{L-3}&\cdots&M_{0}&0\end{bmatrix}.
\end{aligned} \tag{11}
\]

Let us assume that $L\geq n$ is such that ${\rm rank}({\cal O}_{L})=n$. Then from (10), for $k=0,\dots,L$, we have

\[
y_{t-L+k}=CA^{k}x_{t-L}+\sum_{j=0}^{k-1}M_{j}\psi_{1}(u_{t-L+k-1-j})+D\psi_{2}(u_{t-L+k}). \tag{12}
\]

Define the vectors

\[
Y_{t}:=\begin{bmatrix}y_{t-L}\\ \vdots\\ y_{t-1}\end{bmatrix}\in\mathbb{R}^{pL},\qquad
E_{t}:=\begin{bmatrix}\psi_{1}(u_{t-L})\\ \vdots\\ \psi_{1}(u_{t-1})\end{bmatrix}\in\mathbb{R}^{qL},\qquad
F_{t}:=\begin{bmatrix}\psi_{2}(u_{t-L})\\ \vdots\\ \psi_{2}(u_{t-1})\end{bmatrix}\in\mathbb{R}^{qL}.
\]

Then, stacking the equations in (12) for $k=0$ to $k=L-1$, we have

\[
Y_{t}={\cal O}_{L}x_{t-L}+\widetilde{\cal T}_{L}E_{t}+\left(I_{L}\otimes D\right)F_{t}.
\]

Since ${\rm rank}({\cal O}_{L})=n$, we can solve for $x_{t-L}$:

\[
x_{t-L}={\cal O}_{L}^{\dagger}\left(Y_{t}-\widetilde{\cal T}_{L}E_{t}-\left(I_{L}\otimes D\right)F_{t}\right). \tag{13}
\]

On the other hand, when $k=L$, we have

\[
y_{t}=CA^{L}x_{t-L}+\sum_{j=0}^{L-1}M_{j}\psi_{1}(u_{t-1-j})+D\psi_{2}(u_{t}).
\]

Plugging in the expression for $x_{t-L}$ from (13),

\[
y_{t}=CA^{L}{\cal O}_{L}^{\dagger}\left(Y_{t}-\widetilde{\cal T}_{L}E_{t}-\left(I_{L}\otimes D\right)F_{t}\right)+\sum_{j=0}^{L-1}M_{j}\psi_{1}(u_{t-1-j})+D\psi_{2}(u_{t}).
\]

The matrix $Q:=CA^{L}{\cal O}_{L}^{\dagger}\in\mathbb{R}^{p\times pL}$ can be written in block form as $Q=[Q_{L-1},\dots,Q_{0}]$ with $Q_{i}\in\mathbb{R}^{p\times p}$ for $i=0,\dots,L-1$, so that $QY_{t}=\sum_{k=1}^{L}Q_{k-1}y_{t-k}$. Setting $A_{k}:=-Q_{k-1}$ for $k=1,\dots,L$ and moving the corresponding terms to the left-hand side of the above equation, we have

\[
y_{t}+\sum_{k=1}^{L}A_{k}y_{t-k}=\sum_{j=0}^{L-1}M_{j}\psi_{1}(u_{t-1-j})-Q\widetilde{\cal T}_{L}E_{t}-Q\left(I_{L}\otimes D\right)F_{t}+D\psi_{2}(u_{t}).
\]

With $M=[M_{L-1},\dots,M_{0}]\in\mathbb{R}^{p\times qL}$, so that $\sum_{j=0}^{L-1}M_{j}\psi_{1}(u_{t-1-j})=ME_{t}$, we can write the right-hand side as a function of $u_{t-L:t}$:

\[
\begin{aligned}
y_{t}+\sum_{k=1}^{L}A_{k}y_{t-k}
&=\underbrace{\begin{bmatrix}M-Q\widetilde{\cal T}_{L}&-Q\left(I_{L}\otimes D\right)&D\end{bmatrix}}_{=:S}\underbrace{\begin{bmatrix}E_{t}\\ F_{t}\\ \psi_{2}(u_{t})\end{bmatrix}}_{=:\psi(u_{t-L:t})}\\
&=:g(u_{t-L:t}),
\end{aligned}
\]

where $S\in\mathbb{R}^{p\times q(2L+1)}$, $\psi(u_{t-L:t})\in\mathbb{R}^{q(2L+1)}$, and $g$ defined above is a mapping from $\mathsf{U}$ into ${\cal Y}=\mathbb{R}^{p}$. Assume now that $\psi_{1},\psi_{2}$ are elements of some $\mathbb{R}^{q}$-valued RKHS ${\cal H}$ on ${\cal U}$. Thus, by definition, the evaluation maps $\psi_{1}\mapsto\psi_{1}(u)$ and $\psi_{2}\mapsto\psi_{2}(u)$ are bounded for each $u\in{\cal U}$. Consider the direct-sum Hilbert space ${\cal K}:={\cal H}^{\oplus(2L+1)}$ with inner product $\langle h,h^{\prime}\rangle_{{\cal H}^{\oplus(2L+1)}}=\sum_{j=0}^{2L}\langle h_{j},h^{\prime}_{j}\rangle_{\cal H}$ for $h=h_{0}\oplus\cdots\oplus h_{2L}$. The function $g$ defined above is an element of the linear space ${\cal F}$ of functions $f:\mathsf{U}\to{\cal Y}$ of the form

\[
f(u_{0:L})=S_{0}h_{0}(u_{0})+\sum^{L}_{j=1}S_{j}h_{j}(u_{j})+\sum^{2L}_{j=L+1}S_{j}h_{j}(u_{j-L}),
\]

where $h=h_{0}\oplus\dots\oplus h_{2L}$ ranges over ${\cal K}$ and where $S_{0},S_{1},\dots,S_{2L}$ are linear operators from $\mathbb{R}^{q}$ into ${\cal Y}=\mathbb{R}^{p}$. For each $u_{0:L}$, the evaluation map $f\mapsto f(u_{0:L})$ is continuous on ${\cal F}$. Thus, $g$ belongs to some ${\cal Y}$-valued RKHS on $\mathsf{U}$.

IV Behavior Representer Theorem Via Minimum Norm Interpolation

Let $(u_{t}^{\rm d},y_{t}^{\rm d})_{t=0}^{T+L-1}$ be a finite input-output trajectory generated by an unknown system of the form (1). In other words, there exists an unknown $(L+1)$-tuple $(A_{1},\dots,A_{L},g)$ such that

$$\begin{bmatrix}u^{\rm d}_{0:T+L-1}\\ y^{\rm d}_{0:T+L-1}\end{bmatrix}\in{\cal B}(A_{1},\dots,A_{L},g)|_{T+L-1}.$$

The problem of data-driven behavioral modeling is to reconstruct the unknown length-$L$ behavior segment ${\cal B}(A_{1},\dots,A_{L},g)|_{L}$ from the data without explicitly estimating $A_{1},\dots,A_{L},g$.

Let $f$ be the regression representation of $(A_{1},\dots,A_{L},g)$ as in (3). Using the definitions of $z_{t}$ in (2) and $y_{t^{+}}:=y_{t+L}$, we can represent the data $(u^{\rm d}_{t},y^{\rm d}_{t})^{T+L-1}_{t=0}$ equivalently by $(z^{\rm d}_{t},y^{\rm d}_{t^{+}})^{T-1}_{t=0}$, such that $f(z^{\rm d}_{t})=y^{\rm d}_{t^{+}}$ holds for all $t=0,\dots,T-1$. In other words, for each $t=0,\dots,T-1$, $(\kappa_{\cal Z}(\cdot,z^{\rm d}_{t})^{*},y^{\rm d}_{t^{+}})$ is a valid i/o pair for (3). In view of the equivalence (8), the question we would like to answer is whether we can reconstruct the set of all valid i/o pairs of the unknown $f$ from the given data.

The following result is easy to establish:

Theorem 3.

For any collection of real coefficients $c_{0},\dots,c_{T-1}$, $\left(\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm d})^{*},\sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm d}\right)$ is a valid i/o pair for the model (3).

Theorem 3 shows that any linear combination of $\big(\kappa_{\cal Z}(\cdot,z^{\rm d}_{t})^{*},y^{\rm d}_{t^{+}}\big)^{T-1}_{t=0}$ is a valid i/o pair of (3). We now analyze the reverse direction by relating it to minimum-norm interpolation in a vector-valued RKHS [15]. Let finite input-output data $\{(z_{i},y_{i})\}^{N-1}_{i=0}\subset{\cal Z}\times{\cal Y}$ be given and consider the following problem:

$$\begin{aligned} \text{minimize }&\left\|f\right\|_{{\cal H}_{\cal Z}}^{2}\\ \text{subject to }&\ f\left(z_{j}\right)=y_{j},\quad j=0,\cdots,N-1.\end{aligned} \tag{14}$$

Define the sampling operator $S_{N}:{\cal H}_{\cal Z}\to{\cal Y}^{N}$ by $S_{N}f=\left(f(z_{0}),\cdots,f(z_{N-1})\right)$. By [15, Theorem 3], if $(y_{0},\dots,y_{N-1})\in\text{range}(S_{N})$, then the minimum-norm interpolation problem in (14) has a unique solution given by

$$f_{N}=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})v_{j}, \tag{15}$$

where $(v_{0},\dots,v_{N-1})\in{\cal Y}^{N}$ solves the system of equations

$$\sum_{l=0}^{N-1}\kappa_{\cal Z}(z_{j},z_{l})v_{l}=y_{j},\qquad j=0,\dots,N-1. \tag{16}$$

We can express this more succinctly as follows.

Since ${\cal Y}=\mathbb{R}^{p}$, for each $z\in{\cal Z}$ we can view $\kappa_{\cal Z}(\cdot,z)$ as a mapping from $\mathbb{R}^{p}$ into ${\cal H}_{\cal Z}$, i.e., for each $v\in\mathbb{R}^{p}$, $\kappa_{\cal Z}(\cdot,z)v\in{\cal H}_{\cal Z}$. Define the mapping $\Phi_{N}:\mathbb{R}^{pN}\to{\cal H}_{\cal Z}$ as

$$\Phi_{N}\bar{v}=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})v_{j},$$

where $\bar{v}=(v_{0},\cdots,v_{N-1})\in\mathbb{R}^{pN}$. In particular, $f_{N}=\Phi_{N}\bar{v}$, where $\bar{v}$ is the solution of (16). Next, define the block kernel matrix $K_{N}:=\Phi_{N}^{*}\Phi_{N}\in\mathbb{R}^{pN\times pN}$ whose blocks are given by $[K_{N}]_{ij}=\kappa_{\cal Z}(z_{i},z_{j})$ for $i,j=0,\cdots,N-1$, as well as the block row vector $k_{N}(z):=\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\in\mathbb{R}^{p\times pN}$ with blocks $[k_{N}(z)]_{j}=\kappa_{\cal Z}(z,z_{j})$ for $j=0,\cdots,N-1$. Using the reproducing property and the above definitions, we can write

$$\begin{aligned} f_{N}(z)&=\kappa_{\cal Z}(\cdot,z)^{*}f_{N}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}K^{\dagger}_{N}Y_{N}\\ &=k_{N}(z)K^{\dagger}_{N}Y_{N},\end{aligned} \tag{17}$$

where $Y_{N}:=[y^{\rm T}_{0},\dots,y^{\rm T}_{N-1}]^{\rm T}$. Finally, for each $z$ define $\Sigma_{N}(z)\in\mathbb{R}^{p\times p}$ as

$$\Sigma_{N}(z):=\kappa_{\cal Z}(z,z)-k_{N}(z)K_{N}^{\dagger}k_{N}(z)^{*}. \tag{18}$$
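The formulas (17) and (18) can be sketched numerically as follows. This is a minimal illustration, assuming a separable kernel $\kappa_{\cal Z}(z,z^{\prime})=k^{\rm s}(z,z^{\prime})I_{p}$ with a Gaussian scalar kernel $k^{\rm s}$; the kernel choice, the helper names, and the pseudo-inverse cutoff `rcond=1e-10` are assumptions made for the example and are not taken from the text.

```python
import numpy as np

def gauss(z1, z2, width=1.0):
    """Scalar Gaussian kernel (an illustrative choice, not prescribed by the text)."""
    return np.exp(-np.sum((z1 - z2) ** 2) / (2.0 * width ** 2))

def block_gram(Z, p):
    """K_N with p-by-p blocks kappa(z_i, z_j) = gauss(z_i, z_j) * I_p."""
    Ks = np.array([[gauss(zi, zj) for zj in Z] for zi in Z])
    return np.kron(Ks, np.eye(p))

def block_row(z, Z, p):
    """k_N(z): block row with blocks kappa(z, z_j)."""
    ks = np.array([[gauss(z, zj) for zj in Z]])
    return np.kron(ks, np.eye(p))

def interpolate(z, Z, Y):
    """f_N(z) = k_N(z) K_N^dagger Y_N, cf. (17)."""
    p = Y.shape[1]
    K_pinv = np.linalg.pinv(block_gram(Z, p), rcond=1e-10)
    return block_row(z, Z, p) @ K_pinv @ Y.reshape(-1)

def sigma(z, Z, p):
    """Sigma_N(z) = kappa(z, z) - k_N(z) K_N^dagger k_N(z)^*, cf. (18)."""
    k = block_row(z, Z, p)
    K_pinv = np.linalg.pinv(block_gram(Z, p), rcond=1e-10)
    return gauss(z, z) * np.eye(p) - k @ K_pinv @ k.T

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 3))   # N = 6 sample points in R^3
Y = rng.standard_normal((6, 2))   # outputs in R^p with p = 2
f_at_sample = interpolate(Z[2], Z, Y)
Sigma_at_sample = sigma(Z[2], Z, 2)
```

At a sample point the interpolant reproduces the recorded output and $\Sigma_{N}$ vanishes, which is the exact-prediction case discussed below.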

We now present two key lemmas. The first one is a structural characterization of ΣN\Sigma_{N}:

Lemma 4.

For each $z$, the operator $\Sigma_{N}(z)$ is symmetric and positive semi-definite, and $\Sigma_{N}(z)=0$ iff ${\rm range}\left(\kappa_{\cal Z}(\cdot,z)\right)\subseteq{\rm range}(\Phi_{N})$.

The second one is an extension of a result of Liang and Recht [11, Lemma 1] to the minimum-norm interpolation problem in the vector-valued RKHS ${\cal H}_{\cal Z}$:

Lemma 5.

Suppose that the problem (14) admits unique solutions $f_{N}$ and $f_{N+1}$ given the respective data $\{(z_{i},y_{i})\}^{N-1}_{i=0}$ and $\{(z_{i},y_{i})\}^{N-1}_{i=0}\cup\{(z_{N},y_{N})\}$. Then the following holds:

  1. If $\Sigma_{N}(z_{N})=0$, then $y_{N}-f_{N}\left(z_{N}\right)=0$.

  2. If $\Sigma_{N}(z_{N})\succ 0$, we have

    $$\begin{aligned} &\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\\ &\qquad=\left\|\Sigma_{N}(z_{N})^{-1/2}\left(y_{N}-f_{N}\left(z_{N}\right)\right)\right\|_{{\cal Y}}^{2}.\end{aligned} \tag{19}$$

Lemma 5 characterizes the error incurred when we use $\widehat{y}_{N}=f_{N}(z_{N})$ as a predictor of $y_{N}=f_{N+1}(z_{N})$, where $f_{N}$ is the solution to the minimum-norm interpolation problem on $\{(z_{i},y_{i})\}^{N-1}_{i=0}$. In particular, the prediction is exact when $\Sigma_{N}(z_{N})=0$. If $\Sigma_{N}(z_{N})$ is positive definite, the identity (19) expresses the squared norm of the weighted prediction error $\Sigma_{N}(z_{N})^{-1/2}(y_{N}-\widehat{y}_{N})$ in terms of the norms of $f_{N+1}$ and $f_{N}$.
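The identity (19) can be checked numerically. The sketch below assumes a separable Gaussian kernel $\kappa_{\cal Z}=k^{\rm s}I_{p}$ (an illustrative choice, for which $\kappa_{\cal Z}(z,z)=I_{p}$) and uses the standard fact that the minimum-norm interpolant has squared norm $Y_{N}^{\rm T}K_{N}^{\dagger}Y_{N}$; the function names are hypothetical.

```python
import numpy as np

def gram(Z1, Z2):
    """Scalar Gaussian Gram matrix between two point sets (illustrative kernel)."""
    d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2)

def min_norm_sq(Z, Y):
    """Squared RKHS norm Y^T K^dagger Y of the minimum-norm interpolant."""
    p = Y.shape[1]
    K = np.kron(gram(Z, Z), np.eye(p))
    y = Y.reshape(-1)
    return float(y @ np.linalg.pinv(K, rcond=1e-10) @ y)

rng = np.random.default_rng(1)
N, p = 5, 2
Z = rng.standard_normal((N + 1, 3))   # z_0, ..., z_N
Y = rng.standard_normal((N + 1, p))   # y_0, ..., y_N

K = np.kron(gram(Z[:N], Z[:N]), np.eye(p))        # K_N
k = np.kron(gram(Z[N:], Z[:N]), np.eye(p))        # k_N(z_N), shape (p, pN)
K_pinv = np.linalg.pinv(K, rcond=1e-10)
pred = k @ K_pinv @ Y[:N].reshape(-1)             # f_N(z_N)
Sigma = np.eye(p) - k @ K_pinv @ k.T              # Sigma_N(z_N)
resid = Y[N] - pred

lhs = min_norm_sq(Z, Y) - min_norm_sq(Z[:N], Y[:N])   # norm increment
rhs = float(resid @ np.linalg.solve(Sigma, resid))    # weighted squared error
```

Here `lhs` and `rhs` are the two sides of (19); they agree up to floating-point error whenever $\Sigma_{N}(z_{N})\succ 0$.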

In a scalar-valued RKHS, $\Sigma_{N}(z)$ is equal to

$$\begin{aligned} &{\rm dist}^{2}\big(\kappa_{\cal Z}(\cdot,z),{\rm span}(\kappa_{\cal Z}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}(\cdot,z_{N-1}))\big)\\ &=\min_{h\in{\rm span}(\kappa_{\cal Z}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}(\cdot,z_{N-1}))}\|\kappa_{\cal Z}(\cdot,z)-h\|_{{\cal H}_{\cal Z}}^{2},\end{aligned} \tag{20}$$

which is the squared distance from $\kappa_{\cal Z}(\cdot,z)$ to the linear span of kernel functions centered at the observed samples [11]. While Lemma 5 applies to any vector-valued RKHS of ${\cal Y}$-valued functions on ${\cal Z}$, a natural choice of the operator-valued kernel could be $\kappa_{\cal Z}(z_{1},z_{2})=\kappa_{\cal Z}^{\rm s}(z_{1},z_{2})I_{{\cal Y}}$, where $\kappa_{\cal Z}^{\rm s}$ is a scalar-valued kernel function. Let $K_{N}^{\rm s}=[\kappa_{\cal Z}^{\rm s}(z_{i},z_{j})]^{N-1}_{i,j=0}\in\mathbb{R}^{N\times N}$ be the Gram matrix defined by the scalar-valued kernel $\kappa_{\cal Z}^{\rm s}$ on the points $z_{0},\dots,z_{N-1}$, and let $k^{\rm s}_{N}(z)\in\mathbb{R}^{1\times N}$ be the row vector with $[k_{N}^{\rm s}(z)]_{j}=\kappa_{\cal Z}^{\rm s}(z,z_{j})$, so that $K_{N}=K^{\rm s}_{N}\otimes I_{{\cal Y}}$ and $k_{N}(z)=k^{\rm s}_{N}(z)\otimes I_{{\cal Y}}$. Finally, define

$$s_{N}:={\rm dist}\left\{{\rm span}\left(\kappa_{\cal Z}^{\rm s}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}^{\rm s}(\cdot,z_{N-1})\right),\kappa_{\cal Z}^{\rm s}(\cdot,z_{N})\right\},$$

where the distance is computed in the scalar-valued RKHS induced by $\kappa_{\cal Z}^{\rm s}$, cf. (20). Then, using the properties of the Kronecker product $\otimes$, we can compute $\Sigma_{N}(z_{N})$ as follows:

$$\begin{aligned} \Sigma_{N}(z_{N})&=\kappa_{\cal Z}(z_{N},z_{N})-k_{N}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})^{*}\\ &=\kappa_{\cal Z}^{\rm s}(z_{N},z_{N})I_{{\cal Y}}-\left(k^{\rm s}_{N}(z_{N})\otimes I_{{\cal Y}}\right)\left(K^{\rm s}_{N}\otimes I_{{\cal Y}}\right)^{\dagger}\left(k^{\rm s}_{N}(z_{N})\otimes I_{{\cal Y}}\right)^{*}\\ &=\left(\kappa_{\cal Z}^{\rm s}(z_{N},z_{N})-k^{\rm s}_{N}(z_{N})(K^{\rm s}_{N})^{\dagger}k^{\rm s}_{N}(z_{N})^{\rm T}\right)I_{{\cal Y}}\\ &=s_{N}^{2}I_{{\cal Y}}.\end{aligned}$$
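For reference, the Kronecker-product properties invoked in this computation are the following standard identities, stated for matrices of compatible dimensions:

```latex
\begin{aligned}
(A \otimes B)(C \otimes D) &= (AC) \otimes (BD), \\
(A \otimes B)^{\dagger} &= A^{\dagger} \otimes B^{\dagger}, \\
(A \otimes B)^{*} &= A^{*} \otimes B^{*}.
\end{aligned}
```

Applying them with $B=D=I_{{\cal Y}}$ collapses the three-factor product into a scalar quadratic form times $I_{{\cal Y}}$.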

Hence, the equality in Lemma 5 reduces to

$$s_{N}^{2}\left(\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\right)=\left\|y_{N}-f_{N}\left(z_{N}\right)\right\|_{{\cal Y}}^{2}.$$

This indicates that, given a new $(z_{N},y_{N})$, if $s_{N}=0$, we have $\left\|y_{N}-f_{N}(z_{N})\right\|_{{\cal Y}}^{2}=0$. Given Lemmas 4 and 5, the following is immediate:

Theorem 6 (Behavior Representer Theorem).

Let $\left(z^{\rm d}_{t},y^{\rm d}_{t^{+}}\right)_{t=0}^{T-1}$ be a length-$T$ trajectory of regression and output vectors for an unknown $f_{\star}\in{\cal H}_{\cal Z}$, i.e.,

$$y^{\rm d}_{t^{+}}=f_{\star}(z^{\rm d}_{t})\qquad\text{for each }t=0,\dots,T-1.$$

Let $[u_{0:L}^{\rm T},y_{0:L}^{\rm T}]^{\rm T}$ be an element of ${\cal B}(f_{\star})|_{L}$, and let $(z_{0},y_{0^{+}})$ be computed according to (7). Then the following holds:

  1. If $\Sigma_{T}(z_{0})=0$, then $y_{0^{+}}=k_{T}(z_{0})K_{T}^{\dagger}Y_{T}$, where $k_{T}(z_{0})$ and $K_{T}$ are computed from the data as in (17) and $Y_{T}:=[(y^{\rm d}_{0^{+}})^{\rm T},\cdots,(y^{\rm d}_{(T-1)^{+}})^{\rm T}]^{\rm T}$.

  2. If $\Sigma_{T}(z_{0})\succ 0$, then

    $$\begin{aligned} &\|\Sigma_{T}(z_{0})^{-1/2}(y_{0^{+}}-k_{T}(z_{0})K_{T}^{\dagger}Y_{T})\|^{2}_{\cal Y}\\ &\qquad\qquad=\|f_{T+1}\|^{2}_{{\cal H}_{\cal Z}}-\|f_{T}\|^{2}_{{\cal H}_{\cal Z}}\\ &\qquad\qquad\leq\|f_{\star}\|^{2}_{{\cal H}_{\cal Z}}-\|f_{T}\|^{2}_{{\cal H}_{\cal Z}},\end{aligned} \tag{21}$$

    where $f_{T}$ (respectively, $f_{T+1}$) is the minimum-norm interpolator of $\{(z^{\rm d}_{t},y^{\rm d}_{t^{+}})\}^{T-1}_{t=0}$ (respectively, of $\{(z^{\rm d}_{t},y^{\rm d}_{t^{+}})\}^{T-1}_{t=0}\cup\{(z_{0},y_{0^{+}})\}$).

The condition $\Sigma_{T}(z_{0})=0$ describes the setting in which exact reconstruction is possible. By Lemma 4, this will be the case for all pairs $(z,y_{+}=f_{\star}(z))$ for which ${\rm range}(\kappa_{\cal Z}(\cdot,z))\subseteq{\rm range}(\Phi_{T})$. If $z_{0}$ does not satisfy this condition but $\Sigma_{T}(z_{0})\succ 0$, we can quantify the reconstruction error using (21). For LTI systems, a sufficient condition for the existence of unique minimum-norm interpolating solutions is

$$\sum^{T-1}_{t=0}z^{\rm d}_{t}(z^{\rm d}_{t})^{\rm T}\succ 0,$$

which is the classical persistence of excitation condition [17, 9]. Exact reconstruction is possible when $z_{0}\in{\rm span}(z^{\rm d}_{0},\dots,z^{\rm d}_{T-1})$. In the nonlinear setting, the condition $\Sigma_{T}(z_{0})=0$ plays an analogous role via Lemma 4.
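The LTI special case can be illustrated with the linear kernel $\kappa_{\cal Z}(z,z^{\prime})=\langle z,z^{\prime}\rangle I_{p}$, for which $\Sigma_{T}(z_{0})$ is the squared distance from $z_{0}$ to the span of the data times $I_{p}$. The sketch below deliberately uses fewer data points than the regressor dimension, so that the span condition is not trivially satisfied; the dimensions, the random map `Theta`, and the helper name are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
d, p, T = 4, 2, 3                     # T < d, so the data do not span R^d
Theta = rng.standard_normal((p, d))   # unknown linear map f_star(z) = Theta z
Zd = rng.standard_normal((T, d))      # regression vectors z_t^d as rows
Yd = Zd @ Theta.T                     # outputs y_{t+}^d = Theta z_t^d

def predict_and_gap(z0):
    """Return k_T(z0) K_T^dagger Y_T and the scalar s with Sigma_T(z0) = s * I_p."""
    Ks = Zd @ Zd.T                    # scalar Gram matrix of the linear kernel
    ks = Zd @ z0                      # entries <z_t^d, z0>
    coef = np.linalg.pinv(Ks, rcond=1e-10) @ ks
    return Yd.T @ coef, float(z0 @ z0 - ks @ coef)

z_in = Zd.T @ np.array([0.5, -1.0, 2.0])   # lies in span(z_0^d, ..., z_{T-1}^d)
z_out = rng.standard_normal(d)             # generic point outside the span
y_in, gap_in = predict_and_gap(z_in)
y_out, gap_out = predict_and_gap(z_out)
```

For `z_in` the gap vanishes and the prediction coincides with `Theta @ z_in`; for `z_out` the gap is strictly positive, which is the regime covered by the bound (21).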

It is useful to compare Theorem 6 with the fundamental lemma for LTI systems (cf. Section II). The latter characterizes behaviors through the image of the Hankel matrix built from observed input-output trajectories. Theorem 6 of this section is conceptually analogous, but is phrased in terms of a different data representation based on regression vectors, which are formed from inputs and past outputs.

V Subspace Identification in the Vector-Valued RKHS Setting

We now revisit systems that arise from state-space representations, as in Section III-C3. We will focus on a particular class of such systems, namely ones that can be represented as

$$\begin{aligned} x_{t+1}&=Ax_{t}+B\phi\left(u_{t}\right),\\ y_{t}&=Cx_{t}+D\phi\left(u_{t}\right),\end{aligned} \tag{22}$$

where $\phi:{\cal U}\to\mathbb{R}^{q}$ is a mapping from the input space ${\cal U}=\mathbb{R}^{m}$ into $\mathbb{R}^{q}$.

Given $(A,B,C,D)$, we construct the $L$-step controllability matrix ${\cal C}_{L}$, the $L$-step observability matrix ${\cal O}_{L}$, the reversed $L$-step controllability matrix $\Delta_{L}$, and the $L$-step modified Toeplitz matrix $\widetilde{{\cal T}}_{L}$ exactly as in (11). The $L$-step Toeplitz matrix is given by

$${\cal T}_{L}:=\widetilde{{\cal T}}_{L}+I_{L}\otimes D.$$

Let $\left\{(u^{\rm d}_{t},y^{\rm d}_{t})\right\}_{t=0}^{T-1}$ denote input/output data of length $T$ collected from measurements of (22) starting from some initial condition $x_{0}\in\mathbb{R}^{n}$. In the LTI case (i.e., when $q=m$ and $\phi$ is the identity map), subspace identification methods [21] allow one to reconstruct, under certain regularity conditions, the state trajectory $x_{L},\dots,x_{T-L}$ and the observability matrix ${\cal O}_{L}$ directly from the input/output data without knowledge of the system matrices $A,B,C,D$.

In this section, we show that these methods can be extended to the set-up of (22) when $\phi$ is an element of a suitable vector-valued RKHS on ${\cal U}$. Moreover, we obtain a result in the spirit of the fundamental lemma for LTI systems, namely that the set of all valid length-$L$ input/output trajectories of (22) can be reconstructed directly from the data $\left\{(u^{\rm d}_{t},y^{\rm d}_{t})\right\}_{t=0}^{T-1}$ without an intermediate system identification step.

V-A The construction of a vector-valued RKHS

There are various ways of instantiating the state-space model (22) in the vector-valued RKHS framework presented in Section III-C, i.e., choosing a suitable $\mathbb{R}^{q}$-valued RKHS ${\cal H}_{\kappa}$ on the input space ${\cal U}$ with reproducing kernel $\kappa$ so that $\phi\in{\cal H}_{\kappa}$. Perhaps the simplest one is to define the operator-valued map $\kappa:{\cal U}\times{\cal U}\to{\cal L}(\mathbb{R}^{q})$ by

$$\kappa(u,u^{\prime}):=\phi(u)\otimes\phi(u^{\prime}),$$

which is readily seen to be of the positive type since, for any $n$, any $\alpha_{1},\dots,\alpha_{n}\in\mathbb{R}$, any $u_{1},\dots,u_{n}\in{\cal U}$, and any $v\in\mathbb{R}^{q}$,

$$\begin{aligned} &\sum^{n}_{i,j=1}\alpha_{i}\alpha_{j}\langle\kappa(u_{i},u_{j})v,v\rangle_{\mathbb{R}^{q}}\\ &=\sum^{n}_{i,j=1}\alpha_{i}\alpha_{j}\langle\phi(u_{i}),v\rangle_{\mathbb{R}^{q}}\langle\phi(u_{j}),v\rangle_{\mathbb{R}^{q}}\\ &=\left(\sum^{n}_{i=1}\alpha_{i}\langle\phi(u_{i}),v\rangle_{\mathbb{R}^{q}}\right)^{2}\\ &\geq 0.\end{aligned}$$

By [5, Prop. 2.3], there is a unique $\mathbb{R}^{q}$-valued RKHS ${\cal H}_{\kappa}$ of functions on ${\cal U}$ with reproducing kernel $\kappa$. The kernel $\kappa$ has the convenient property

$$\left\langle\phi(u),\phi(u^{\prime})\right\rangle_{\mathbb{R}^{q}}={\rm Tr}\left(\kappa(u,u^{\prime})\right). \tag{23}$$
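The trace property (23) is easy to confirm on a toy feature map. In the sketch below, `phi` is a hypothetical map from $\mathbb{R}$ to $\mathbb{R}^{3}$ chosen only for illustration, and the rank-one operator $\phi(u)\otimes\phi(u^{\prime})$ is realized as an outer product.

```python
import numpy as np

def phi(u):
    """Hypothetical feature map from R to R^3, chosen only for illustration."""
    return np.array([u, u ** 2, np.sin(u)])

def kappa(u, u_prime):
    """Operator-valued kernel phi(u) (x) phi(u'), realized as an outer product."""
    return np.outer(phi(u), phi(u_prime))

u, u_prime = 0.7, -1.3
inner = float(phi(u) @ phi(u_prime))            # <phi(u), phi(u')>
trace = float(np.trace(kappa(u, u_prime)))      # Tr(kappa(u, u'))
```

The two numbers `inner` and `trace` coincide, which is what allows Gram matrices of features to be assembled from kernel evaluations alone.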

V-B Subspace identification using RKHS methods

We now have all the ingredients in place for putting together a subspace identification framework analogous to the one for LTI systems [21].

Let the input/output data $\left\{(u^{\rm d}_{t},y^{\rm d}_{t})\right\}_{t=0}^{T-1}$ be given. Introduce the Hankel-type block matrices

$$\left[\begin{array}{c}Y_{\rm p}\\ \hline Y_{\rm f}\end{array}\right]:=\left[\begin{array}{ccc}y^{\rm d}_{0}&\cdots&y^{\rm d}_{T-2L}\\ \vdots&&\vdots\\ y^{\rm d}_{L-1}&\cdots&y^{\rm d}_{T-L-1}\\ \hline y^{\rm d}_{L}&\cdots&y^{\rm d}_{T-L}\\ \vdots&&\vdots\\ y^{\rm d}_{2L-1}&\cdots&y^{\rm d}_{T-1}\end{array}\right] \tag{24}$$

and

$$\left[\begin{array}{c}U_{\rm p}^{\phi}\\ \hline U_{\rm f}^{\phi}\end{array}\right]:=\left[\begin{array}{ccc}\phi\left(u^{\rm d}_{0}\right)&\cdots&\phi\left(u^{\rm d}_{T-2L}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm d}_{L-1}\right)&\cdots&\phi\left(u^{\rm d}_{T-L-1}\right)\\ \hline \phi\left(u^{\rm d}_{L}\right)&\cdots&\phi\left(u^{\rm d}_{T-L}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm d}_{2L-1}\right)&\cdots&\phi\left(u^{\rm d}_{T-1}\right)\end{array}\right], \tag{25}$$

where the subscripts ${\rm p}$ and ${\rm f}$ designate a partition into "past" and "future" data. Let $H_{\rm p}$ and $H_{\rm f}$ denote, respectively, the concatenations of $U_{\rm p}^{\phi}$ and $Y_{\rm p}$ and of $U_{\rm f}^{\phi}$ and $Y_{\rm f}$:

$$H_{\rm p}:=\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix},\quad H_{\rm f}:=\begin{bmatrix}U_{\rm f}^{\phi}\\ Y_{\rm f}\end{bmatrix}.$$
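The block Hankel layout of (24) can be sketched as follows. The helper `block_hankel` is a hypothetical name, and the toy output sequence $y_{t}=t$ is used only so that the indices are visible in the entries; the same helper applies to the feature sequence $\phi(u^{\rm d}_{t})$ in (25).

```python
import numpy as np

def block_hankel(seq, start, depth, ncols):
    """Stack seq[start+j], ..., seq[start+j+depth-1] into column j."""
    return np.column_stack(
        [seq[start + j : start + j + depth].reshape(-1) for j in range(ncols)]
    )

T, L = 10, 2
y = np.arange(T, dtype=float).reshape(T, 1)   # toy scalar outputs y_t = t
ncols = T - 2 * L + 1                         # number of columns in (24)
Y_p = block_hankel(y, 0, L, ncols)            # rows y_0..y_{L-1} in column 0
Y_f = block_hankel(y, L, L, ncols)            # rows y_L..y_{2L-1} in column 0
```

With these toy values, the top-right entry of `Y_p` is $T-2L$ and the bottom-right entry of `Y_f` is $T-1$, matching the corner indices in (24).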

Similar to the linear case, let $\Pi$ denote the oblique projection of the rowspace of $Y_{\rm f}$ onto the rowspace of $H_{\rm p}$ along the rowspace of $U_{\rm f}^{\phi}$:

$$\Pi:=Y_{\rm f}\underset{U_{\rm f}^{\phi}}{/}H_{\rm p},$$

which satisfies ${\rm rank}(\Pi)\leq T-2L+1$. Let the SVD of $\Pi$ be given by

$$\Pi=\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}\begin{bmatrix}\Sigma_{1}&0\\ 0&0\end{bmatrix}\begin{bmatrix}V_{1}^{\rm T}\\ V_{2}^{\rm T}\end{bmatrix}=U_{1}\Sigma_{1}V_{1}^{\rm T}.$$

Following [21], let $X_{\rm f}:=\begin{bmatrix}x_{L}&x_{L+1}&\dots&x_{T-L}\end{bmatrix}$, where $x_{t}$ is the state trajectory of (22) determined by the initial state $x_{0}$ and the inputs $u^{\rm d}_{t}$. In the LTI case, $X_{\rm f}$ can be recovered, up to a similarity transformation, via the formula $X_{\rm f}=\Sigma_{1}^{1/2}V_{1}^{\rm T}$ (see, e.g., [21, Theorem 2, Chapter 2]). In fact, the same result holds in the present nonlinear case (the idea is to view $v_{k}:=\phi(u_{k})$ as an input to an LTI system given by $(A,B,C,D)$). Given the input/output data, let

$$U^{\phi}:=\left[\begin{array}{ccc}\phi\left(u^{\rm d}_{0}\right)&\cdots&\phi\left(u^{\rm d}_{T-2L-n}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm d}_{2L-1}\right)&\cdots&\phi\left(u^{\rm d}_{T-n-1}\right)\\ \hline \phi\left(u^{\rm d}_{2L}\right)&\cdots&\phi\left(u^{\rm d}_{T-n}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm d}_{2L-1+n}\right)&\cdots&\phi\left(u^{\rm d}_{T-1}\right)\end{array}\right], \tag{26}$$

and form the input Gram matrix of depth $2L+n$ as $K^{u}:=(U^{\phi})^{\rm T}U^{\phi}$ with entries $[K^{u}]_{ij}=\sum_{k=0}^{2L+n-1}{\rm Tr}\left(\kappa\left(u^{\rm d}_{i+k},u^{\rm d}_{j+k}\right)\right)$, cf. (23). The following theorem is a straightforward adaptation of subspace identification results for LTI systems:

Theorem 7.

Let $\left\{u^{\rm d}_{i},y^{\rm d}_{i}\right\}_{i=0}^{T-1}$ be a sequence of length $T$ generated by the system (22), where $(A,B)$ is controllable and $(A,C)$ is observable. Suppose that the input sequence is such that ${\rm rank}(K^{u})=(2L+n)q$. Then, $\Pi={\cal O}_{L}X_{\rm f}$ and, up to a similarity transformation, $X_{\rm f}=\Sigma^{1/2}_{1}V_{1}^{\rm T}$.
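The first claim of Theorem 7 can be checked on simulated data. The sketch below assumes a small hypothetical instance of (22) with $n=2$, $q=2$, $p=1$, and $\phi(u)=(u,u^{2})$; the system matrices, the random inputs, and the cutoff `rcond=1e-10` are illustrative choices. The oblique projection is evaluated with the pseudo-inverse of the stacked matrix $W=[H_{\rm p};U_{\rm f}^{\phi}]$, using the identity $W^{\rm T}(WW^{\rm T})^{\dagger}=W^{\dagger}$.

```python
import numpy as np

def hankel(seq, start, depth, ncols):
    """Block Hankel matrix whose column j stacks seq[start+j : start+j+depth]."""
    return np.column_stack(
        [seq[start + j : start + j + depth].reshape(-1) for j in range(ncols)]
    )

# Hypothetical system matrices and feature map, chosen only for illustration.
A = np.array([[0.6, 0.2], [-0.1, 0.5]])
B = np.array([[1.0, 0.5], [0.3, -0.7]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2, 0.1]])
phi = lambda u: np.array([u, u ** 2])

rng = np.random.default_rng(3)
n, L, T = 2, 3, 120
u = rng.standard_normal(T)
V = np.array([phi(ut) for ut in u])          # features phi(u_t), shape (T, q)
X = np.zeros((T + 1, n))
X[0] = rng.standard_normal(n)
Y = np.zeros((T, 1))
for t in range(T):                           # simulate (22)
    Y[t] = C @ X[t] + D @ V[t]
    X[t + 1] = A @ X[t] + B @ V[t]

N = T - 2 * L + 1
U_p, U_f = hankel(V, 0, L, N), hankel(V, L, L, N)
Y_p, Y_f = hankel(Y, 0, L, N), hankel(Y, L, L, N)
H_p = np.vstack([U_p, Y_p])
W = np.vstack([H_p, U_f])

# Oblique projection of Y_f onto H_p along U_f.
Pi = Y_f @ np.linalg.pinv(W, rcond=1e-10) @ np.vstack([H_p, np.zeros_like(U_f)])

O_L = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
X_f = X[L : L + N].T                         # columns x_L, ..., x_{T-L}
sing = np.linalg.svd(Pi, compute_uv=False)   # singular values of Pi
```

In this instance `Pi` agrees with `O_L @ X_f` and has numerical rank $n$, so the leading $n$ singular triplets of $\Pi$ carry the state information.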

The process of computing the oblique projection and constructing the state vector can be carried out using Gram matrices computed from pairwise evaluations of the kernel $\kappa$, without explicitly using $\phi$. Specifically, define the Gram matrices $K_{\rm p}^{u}=\left(U_{\rm p}^{\phi}\right)^{\rm T}U_{\rm p}^{\phi}$, $K_{\rm f}^{u}=\left(U_{\rm f}^{\phi}\right)^{\rm T}U_{\rm f}^{\phi}$, and $K_{\rm p}^{y}=Y_{\rm p}^{\rm T}Y_{\rm p}$, whose entries are

$$\begin{aligned} [K_{\rm p}^{u}]_{ij}&=\sum_{k=0}^{L-1}\left\langle\phi(u^{\rm d}_{i+k}),\phi(u^{\rm d}_{j+k})\right\rangle_{\mathbb{R}^{q}}\stackrel{(a)}{=}\sum_{k=0}^{L-1}{\rm Tr}\left(\kappa\left(u^{\rm d}_{i+k},u^{\rm d}_{j+k}\right)\right),\\ [K_{\rm f}^{u}]_{ij}&=\sum_{k=0}^{L-1}\left\langle\phi(u^{\rm d}_{i+L+k}),\phi(u^{\rm d}_{j+L+k})\right\rangle_{\mathbb{R}^{q}}\stackrel{(b)}{=}\sum_{k=0}^{L-1}{\rm Tr}\left(\kappa\left(u^{\rm d}_{i+L+k},u^{\rm d}_{j+L+k}\right)\right),\\ [K_{\rm p}^{y}]_{ij}&=\sum_{k=0}^{L-1}(y^{\rm d}_{i+k})^{\rm T}y^{\rm d}_{j+k},\end{aligned}$$

respectively, where (a) and (b) follow from (23).
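The entrywise formula for $K_{\rm p}^{u}$ can be sketched as follows, assuming the hypothetical feature map $\phi(u)=(u,u^{2})$, for which ${\rm Tr}(\kappa(u,v))=uv+u^{2}v^{2}$ in closed form. The explicit feature matrix is built only to serve as a reference.

```python
import numpy as np

def phi(u):
    """Hypothetical feature map, used here only to build the reference matrix."""
    return np.array([u, u ** 2])

def k_trace(u, v):
    """Tr(kappa(u, v)) for kappa = phi(u) (x) phi(v), in closed form."""
    return u * v + (u * v) ** 2

rng = np.random.default_rng(4)
T, L = 12, 2
u = rng.standard_normal(T)
N = T - 2 * L + 1

# Gram matrix K_p^u assembled from kernel evaluations only.
K_kernel = np.array(
    [[sum(k_trace(u[i + k], u[j + k]) for k in range(L)) for j in range(N)]
     for i in range(N)]
)

# Reference: explicit feature Hankel matrix U_p^phi and its Gram matrix.
U_p = np.column_stack(
    [np.concatenate([phi(u[j + k]) for k in range(L)]) for j in range(N)]
)
K_explicit = U_p.T @ U_p
```

The two matrices coincide, so the feature map never needs to be evaluated once a formula for the kernel trace is available.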

The oblique projection can be computed as

$$\begin{aligned} \Pi&=Y_{\rm f}\underset{U_{\rm f}^{\phi}}{/}H_{\rm p}\\ &=Y_{\rm f}\begin{bmatrix}H_{\rm p}^{\rm T}&(U_{\rm f}^{\phi})^{\rm T}\end{bmatrix}\left(\begin{bmatrix}H_{\rm p}\\ U_{\rm f}^{\phi}\end{bmatrix}\begin{bmatrix}H_{\rm p}^{\rm T}&(U_{\rm f}^{\phi})^{\rm T}\end{bmatrix}\right)^{\dagger}\begin{bmatrix}H_{\rm p}\\ 0\end{bmatrix}\\ &=Y_{\rm f}\left(\begin{bmatrix}H_{\rm p}^{\rm T}&(U_{\rm f}^{\phi})^{\rm T}\end{bmatrix}\begin{bmatrix}H_{\rm p}\\ U_{\rm f}^{\phi}\end{bmatrix}\right)^{\dagger}\begin{bmatrix}H_{\rm p}^{\rm T}&(U_{\rm f}^{\phi})^{\rm T}\end{bmatrix}\begin{bmatrix}H_{\rm p}\\ 0\end{bmatrix}\\ &=Y_{\rm f}\left(K_{\rm p}^{u}+K_{\rm p}^{y}+K_{\rm f}^{u}\right)^{\dagger}\left(K_{\rm p}^{u}+K_{\rm p}^{y}\right),\end{aligned} \tag{27}$$

where the second-to-last line follows from the identity $x(x^{\rm T}x)^{\dagger}=(xx^{\rm T})^{\dagger}x$.
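The chain of equalities in (27) is a purely linear-algebraic identity, so it can be sanity-checked on random matrices. In the sketch below, the matrices standing in for $H_{\rm p}$, $U_{\rm f}^{\phi}$, and $Y_{\rm f}$ are random placeholders with arbitrary sizes, and `rcond=1e-10` is an assumed cutoff for the rank-deficient Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
H_p = rng.standard_normal((5, N))    # stands in for [U_p^phi; Y_p]
U_f = rng.standard_normal((4, N))    # stands in for U_f^phi
Y_f = rng.standard_normal((3, N))

# Second line of (27): pseudo-inverse of the small matrix W W^T.
W = np.vstack([H_p, U_f])
H_p0 = np.vstack([H_p, np.zeros_like(U_f)])
Pi_direct = Y_f @ W.T @ np.linalg.pinv(W @ W.T, rcond=1e-10) @ H_p0

# Last line of (27): only N-by-N Gram matrices are used.
K_bar_p = H_p.T @ H_p                 # K_p^u + K_p^y
K_bar_pf = K_bar_p + U_f.T @ U_f      # K_p^u + K_p^y + K_f^u
Pi_gram = Y_f @ np.linalg.pinv(K_bar_pf, rcond=1e-10) @ K_bar_p
```

The Gram form trades a small pseudo-inverse for an $N\times N$ one, which is the price of never forming the features explicitly.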

To construct the state vector, instead of directly performing an SVD of $\Pi$, we construct $U$, $\Sigma$, and $V$ such that $\Pi=U\Sigma V^{\rm T}$ via eigendecompositions of $\Pi^{\rm T}\Pi$ and $\Pi\Pi^{\rm T}$. Denote $\overline{K}_{{\rm p},{\rm f}}:=K_{\rm p}^{u}+K_{\rm p}^{y}+K_{\rm f}^{u}$, $\overline{K}_{\rm p}:=K_{\rm p}^{u}+K_{\rm p}^{y}$, and $K_{\rm f}^{y}:=Y_{\rm f}^{\rm T}Y_{\rm f}$, and notice that we have

$$\begin{aligned} \Pi^{\rm T}\Pi&=\overline{K}_{\rm p}^{\rm T}\left(\overline{K}_{{\rm p},{\rm f}}^{\dagger}\right)^{\rm T}\left(Y_{\rm f}^{\rm T}Y_{\rm f}\right)\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}\\ &=\overline{K}_{\rm p}^{\rm T}\left(\overline{K}_{{\rm p},{\rm f}}^{\dagger}\right)^{\rm T}K_{\rm f}^{y}\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}\\ &=\overline{K}_{\rm p}^{\rm T}\overline{K}_{{\rm p},{\rm f}}^{\dagger}K_{\rm f}^{y}\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p},\end{aligned}$$

where the last step uses the symmetry of $\overline{K}_{{\rm p},{\rm f}}$. Then, we can obtain $\left(\sigma_{i}^{2},v_{i}\right)$ as the eigenvalues and eigenvectors of $\Pi^{\rm T}\Pi$, where $\sigma_{i}$ denotes the $i$th singular value of $\Pi$. Likewise, we have

$$\Pi\Pi^{\rm T}=Y_{\rm f}\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}\overline{K}_{\rm p}^{\rm T}\overline{K}_{{\rm p},{\rm f}}^{\dagger}Y_{\rm f}^{\rm T}.$$

Denote $\Gamma:=\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}\overline{K}_{\rm p}^{\rm T}\overline{K}_{{\rm p},{\rm f}}^{\dagger}$. Then, we have

$$Y_{\rm f}\Gamma Y_{\rm f}^{\rm T}\left(Y_{\rm f}\xi_{i}\right)=\sigma^{2}_{i}\left(Y_{\rm f}\xi_{i}\right)\Leftrightarrow\Gamma\underbrace{Y_{\rm f}^{\rm T}Y_{\rm f}}_{=K_{\rm f}^{y}}\xi_{i}=\sigma_{i}^{2}\xi_{i}.$$

That is, $\Pi\Pi^{\rm T}=Y_{\rm f}\Gamma Y_{\rm f}^{\rm T}$ has an eigenvector $u_{i}=Y_{\rm f}\xi_{i}$ associated with eigenvalue $\sigma^{2}_{i}$ iff $\xi_{i}$ is an eigenvector of $\Gamma K_{\rm f}^{y}$ corresponding to the same eigenvalue. With $U=[u_{1},\cdots,u_{n}]$ and $V=[v_{1},\cdots,v_{n}]$ (columns normalized to unit length) and $\Sigma=\text{diag}(\sigma_{1},\cdots,\sigma_{n})$, we can define the extended observability matrix ${\cal O}_{L}$ and states $X_{\rm f}$ (up to a similarity transform) as

$${\cal O}_{L}=U\Sigma^{1/2},\quad X_{\rm f}=\Sigma^{1/2}V^{\rm T}.$$

The following corollary is immediate from the above process.

Corollary 8 (Construction of state vectors).

Suppose that $(A,B)$ is controllable and $(A,C)$ is observable, i.e., ${\rm rank}({\cal O}_{L})={\rm rank}({\cal C}_{L})=n$. Let input-output data $\{(u^{\rm d}_{t},y^{\rm d}_{t})\}_{t=0}^{T-1}$ of length $T$ be given. Suppose the input sequence is such that the depth-$(2L+n)$ input Gram matrix satisfies ${\rm rank}(K^{u})=(2L+n)q$. Let $(\Sigma^{2},V)$ be the eigenpair of $\overline{K}_{\rm p}^{\rm T}\left(\overline{K}_{{\rm p},{\rm f}}^{\dagger}\right)^{\rm T}K_{\rm f}^{y}\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}$, and $(\Sigma^{2},\Xi)$ be the eigenpair of $\overline{K}_{{\rm p},{\rm f}}^{\dagger}\overline{K}_{\rm p}\overline{K}_{\rm p}^{\rm T}\left(\overline{K}_{{\rm p},{\rm f}}^{\dagger}\right)^{\rm T}K_{\rm f}^{y}$, in both cases restricted to the $n$ nonzero eigenvalues. Define $U=Y_{\rm f}\Xi$ with columns normalized to unit length. Then the extended observability matrix is ${\cal O}_{L}=U\Sigma^{1/2}$, and the state sequence is $X_{\rm f}=\Sigma^{1/2}V^{\rm T}$ (both up to a similarity transformation).

The next result explicitly connects subspace identification to an RKHS version of the fundamental lemma for nonlinear systems admitting state-space realization (22).

Theorem 9.

Consider the state-space model (22), where $(A,B)$ is controllable and $(A,C)$ is observable. Let input-output data $\{(u^{\rm d}_{t},y^{\rm d}_{t})\}_{t=0}^{T-1}$ of length $T$ be given, and suppose that the depth-$(2L+n)$ input Gram matrix satisfies ${\rm rank}(K^{u})=(2L+n)q$ and that $L$ is chosen so that ${\cal O}_{L}$ has full column rank. Then a length-$2L$ sequence $\begin{bmatrix}u_{0:2L-1}\\ y_{0:2L-1}\end{bmatrix}$ is a valid input/output trajectory of the system (22) if and only if there exists $\xi\in\mathbb{R}^{T-2L+1}$ such that

$$\begin{bmatrix}k_{\rm p}^{u}\\ k_{\rm p}^{y}\end{bmatrix}=\begin{bmatrix}K_{\rm p}^{u}\\ K_{\rm p}^{y}\end{bmatrix}\xi\text{ and }\begin{bmatrix}k_{\rm f}^{u}\\ k_{\rm f}^{y}\end{bmatrix}=\begin{bmatrix}K_{\rm f}^{u}\\ K_{\rm f}^{y}\end{bmatrix}\xi,$$

where, writing $\Phi_{a:b}:=[\phi(u_{a})^{\rm T},\dots,\phi(u_{b})^{\rm T}]^{\rm T}$, the vectors $k_{\rm p}^{u}=(U_{\rm p}^{\phi})^{\rm T}\Phi_{0:L-1}$ and $k_{\rm f}^{u}=(U_{\rm f}^{\phi})^{\rm T}\Phi_{L:2L-1}$ have entries given by $[k_{\rm p}^{u}]_{j}=\sum_{r=0}^{L-1}{\rm Tr}\left(\kappa(u^{\rm d}_{j+r},u_{r})\right)$, $[k_{\rm f}^{u}]_{j}=\sum_{r=0}^{L-1}{\rm Tr}\left(\kappa(u^{\rm d}_{j+L+r},u_{L+r})\right)$; and $k_{\rm p}^{y}=Y_{\rm p}^{\rm T}y_{0:L-1}$, $k_{\rm f}^{y}=Y_{\rm f}^{\rm T}y_{L:2L-1}$ are the Euclidean inner products of outputs.

In the context of the above theorem, we can view $(K_{\rm p}^{u},K_{\rm p}^{y},K_{\rm f}^{u},K_{\rm f}^{y})$ as kernel matrices of the "offline" training data generated by (22). By splitting the kernel vector of the "online" testing sequence $\Phi_{0:2L-1}$ into two parts (past and future), we test whether one can interpolate the kernel matrices of past and future using the same coefficients, where the initial length-$L$ segment of the testing sequence specifies the initial condition for the subsequent length-$L$ segment. A similar observation for LTI systems on the specification of initial condition has been given in [13, Proposition 1]. In the proof of Theorem 9, we further show that the trajectory passes through a state $\widehat{x}_{L}$ at time $L$ that is a linear combination of states from the training data, i.e., $\widehat{x}_{L}=X_{\rm f}\xi$ for some $\xi$.

In light of Theorem 9, we can compute the predictor $\widehat{y}_{L:2L-1}$ via

$$\widehat{y}_{L:2L-1}=(Y_{\rm f}^{\rm T})^{\dagger}k_{\rm f}^{y}=(Y_{\rm f}^{\rm T})^{\dagger}K_{\rm f}^{y}\xi,$$

where $\xi$ is the solution of the following equation,

$$\begin{bmatrix}k_{\rm p}^{u}\\ k_{\rm p}^{y}\\ k_{\rm f}^{u}\end{bmatrix}=\begin{bmatrix}K_{\rm p}^{u}\\ K_{\rm p}^{y}\\ K_{\rm f}^{u}\end{bmatrix}\xi.$$
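The predictor can be sketched end to end on simulated data. The example below assumes a small hypothetical instance of (22) with $\phi(u)=(u,u^{2})$; the system matrices, the random inputs, the least-squares solver, and the cutoff `rcond=1e-10` are illustrative choices. Features are formed explicitly here for brevity; by (23), the same Gram quantities can be obtained from kernel evaluations.

```python
import numpy as np

def simulate(u, x0, A, B, C, D, phi):
    """Run (22) and return the features phi(u_t) and the outputs y_t."""
    x, V, Y = x0.copy(), [], []
    for ut in u:
        v = phi(ut)
        V.append(v)
        Y.append(C @ x + D @ v)
        x = A @ x + B @ v
    return np.array(V), np.array(Y)

def hankel(seq, start, depth, ncols):
    """Block Hankel matrix whose column j stacks seq[start+j : start+j+depth]."""
    return np.column_stack(
        [seq[start + j : start + j + depth].reshape(-1) for j in range(ncols)]
    )

# Hypothetical system and feature map, chosen only for illustration.
A = np.array([[0.6, 0.2], [-0.1, 0.5]])
B = np.array([[1.0, 0.5], [0.3, -0.7]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2, 0.1]])
phi = lambda u: np.array([u, u ** 2])

rng = np.random.default_rng(6)
T, L = 120, 3
N = T - 2 * L + 1

# Offline data and its Hankel blocks.
V, Y = simulate(rng.standard_normal(T), rng.standard_normal(2), A, B, C, D, phi)
U_p, U_f = hankel(V, 0, L, N), hankel(V, L, L, N)
Y_p, Y_f = hankel(Y, 0, L, N), hankel(Y, L, L, N)

# Online trajectory of length 2L started from a fresh initial state.
Vo, Yo = simulate(rng.standard_normal(2 * L), rng.standard_normal(2), A, B, C, D, phi)
phi_p, phi_f = Vo[:L].reshape(-1), Vo[L:].reshape(-1)
y_p, y_f = Yo[:L].reshape(-1), Yo[L:].reshape(-1)

# Stack the three Gram systems and solve for xi in the least-squares sense.
K = np.vstack([U_p.T @ U_p, Y_p.T @ Y_p, U_f.T @ U_f])
k = np.concatenate([U_p.T @ phi_p, Y_p.T @ y_p, U_f.T @ phi_f])
xi = np.linalg.lstsq(K, k, rcond=1e-10)[0]

# Predictor (Y_f^T)^dagger K_f^y xi for the future outputs.
y_hat = np.linalg.pinv(Y_f.T, rcond=1e-10) @ (Y_f.T @ Y_f) @ xi
```

In this noise-free setting `y_hat` matches the true future outputs `y_f`, with the past segment of the online trajectory playing the role of the initial condition.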

The interpolation-based method in Section IV characterizes the future output $Y_{\rm f}$ using kernelized offline data to make predictions regarding the online data. On the other hand, the realization-based method aims to capture the output $y_{t+L-1}$ as a linear function of the states $X_{\rm f}$ and the kernelized input vectors $\phi(u_{t}),\dots,\phi(u_{t+L-1})$. The preference between the two approaches depends on the memory length $L$ and the state dimension $n$. For example, in the regime $L\gg n$, the states serve as an efficient representation of the system's history.

VI Conclusion

In this paper, we have put forward a behavioral framework for modeling a class of nonlinear systems in a vector-valued RKHS. This formulation is rich enough to cover LTI systems as well as nonlinear systems modeled by Volterra series, autoregressive models based on Volterra series, and Hammerstein-type state-space models. Using this framework, we have analyzed two methods for data-driven modeling of such systems, minimum-norm interpolation and subspace identification. We have clarified the role of various structural assumptions on the system, the data (both offline and online), and the vector-valued RKHS that represents the nonlinear aspects of the system. More broadly, this work expands the scope of behavioral systems theory to nonlinear systems. In doing so, it reinforces the conceptual shift underlying data-driven control: rather than aiming to recover the state evolution, one can directly actualize the desired behaviors using observed trajectories.

Appendix A Proof of Lemma 2

Let $\|\cdot\|_{\cal V}$ and $\|\cdot\|_{\cal W}$ be the Hilbert-space norms on ${\cal V}$ and ${\cal W}$ induced by their respective inner products. Consider the evaluation functional $\delta_{v}:{\cal L}({\cal V},{\cal W})\to{\cal W}$ given by $\delta_{v}(A)=Av$. It is bounded since

$$\begin{aligned}\left\|\delta_{v}(A)\right\|_{\cal W}&=\left\|Av\right\|_{\cal W}\\ &\leq\left\|A\right\|_{{\cal L}({\cal V},{\cal W})}\left\|v\right\|_{\cal V}\\ &\leq\left\|A\right\|_{\cal H}\left\|v\right\|_{\cal V},\end{aligned}$$

where the last step follows from the relation

$$\begin{aligned}\left\|A\right\|_{{\cal L}({\cal V},{\cal W})}&=\sup_{v\in{\cal V},\,\|v\|_{\cal V}=1}\sqrt{\langle Av,Av\rangle_{\cal W}}\\ &=\sup_{v\in{\cal V},\,\|v\|_{\cal V}=1}\sqrt{\langle v,A^{*}Av\rangle_{\cal V}}\\ &\leq\|A\|_{\cal H}.\end{aligned}$$

Hence, ${\cal H}$ is an RKHS. We next show that $\kappa(v,v')=\langle v,v'\rangle_{\cal V}I_{\cal W}$ is the reproducing kernel. It is obvious that $\kappa$ is of positive type. To show that it satisfies the reproducing property, notice that for any $v\in{\cal V}$ and $w\in{\cal W}$ we have

$$\left\langle Av,w\right\rangle_{\cal W}={\rm Tr}\left(A^{*}(w\otimes v)\right)=\left\langle A,w\otimes v\right\rangle_{\cal H},$$

where $w\otimes v\in{\cal H}$ is the rank-one linear operator $(w\otimes v)v'=\langle v,v'\rangle_{\cal V}w$.

On the other hand, for $w\in{\cal W}$, we have

$$\kappa(v',v)w=\langle v',v\rangle_{\cal V}w=(w\otimes v)v',$$

that is, $\kappa(\cdot,v)w=w\otimes v$. Hence, we have

$$\left\langle A,\kappa(\cdot,v)w\right\rangle_{\cal H}=\left\langle Av,w\right\rangle_{\cal W}.$$

That is, $\kappa$ satisfies the reproducing property.
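The two facts used in this proof can be checked numerically in finite dimensions. The sketch below assumes ${\cal V}=\mathbb{R}^{4}$, ${\cal W}=\mathbb{R}^{3}$, and that ${\cal H}$ carries the Hilbert-Schmidt (Frobenius) inner product $\langle A_1,A_2\rangle_{\cal H}={\rm Tr}(A_1^{*}A_2)$, as the trace expression above suggests; the dimensions and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3                          # dim V = 4, dim W = 3 (illustrative)
A = rng.normal(size=(m, n))          # an operator A: V -> W
v, v2 = rng.normal(size=n), rng.normal(size=n)
w = rng.normal(size=m)

def hs_inner(A1, A2):
    """Hilbert-Schmidt inner product <A1, A2> = Tr(A1^* A2)."""
    return np.trace(A1.T @ A2)

rank_one = np.outer(w, v)            # w (x) v, acting as v' -> <v, v'> w

lhs = w @ (A @ v)                    # <A v, w>_W
rhs = hs_inner(A, rank_one)          # <A, w (x) v>_H

op_norm = np.linalg.norm(A, 2)       # operator (spectral) norm
hs_norm = np.sqrt(hs_inner(A, A))    # Hilbert-Schmidt norm

print(np.isclose(lhs, rhs), op_norm <= hs_norm)
```

The first comparison is the reproducing property with $\kappa(\cdot,v)w=w\otimes v$; the second is the norm inequality that makes point evaluation bounded.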

Appendix B Proofs For Results in Section IV

B-A Proof of Theorem 3

Let $c_{0},\cdots,c_{T-1}\in\mathbb{R}$ be given. For each $j=0,\cdots,T-1$, the relation

$$\langle y_{j^{+}}^{\rm d},v\rangle_{\cal Y}=\langle f,\kappa_{\cal Z}(\cdot,z_{j}^{\rm d})v\rangle_{{\cal H}_{\cal Z}}$$

holds for all $v\in{\cal Y}$. Multiplying both sides by $c_{j}$ and summing over $j=0,\cdots,T-1$, we have

$$\begin{aligned}\left\langle\sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm d},v\right\rangle_{\cal Y}&=\left\langle f,\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm d})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\sum_{j=0}^{T-1}c_{j}f(z_{j}^{\rm d}),v\right\rangle_{\cal Y},\end{aligned}$$

for all $v\in{\cal Y}$, where the last line follows from the reproducing property. Hence, the pair $\left(\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm d}),\ \sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm d}\right)$ satisfies Eq. (3) as claimed.

B-B Proof of Lemma 4

Define the following subspace ${\cal H}_{N}$ of ${\cal H}_{\cal Z}$:

$$\begin{aligned}{\cal H}_{N}&:={\rm span}\left\{\kappa_{\cal Z}(\cdot,z_{i})v,\ i=0,\cdots,N-1,\ v\in{\cal Y}\right\}\\ &={\rm range}(\Phi_{N}).\end{aligned}\tag{28}$$

Let $\Pi_{N}$ denote the orthogonal projection onto ${\cal H}_{N}$ and notice that $\Pi_{N}=\Phi_{N}\left(\Phi_{N}^{*}\Phi_{N}\right)^{\dagger}\Phi_{N}^{*}$. Using this in the definition of $\Sigma_{N}(z)$ in (18), we get

$$\begin{aligned}\Sigma_{N}(z)&=\kappa_{\cal Z}(z,z)-k_{N}(z)K_{N}^{\dagger}k_{N}(z)^{*}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\kappa_{\cal Z}(\cdot,z)-\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\left(\Phi_{N}^{*}\Phi_{N}\right)^{\dagger}\left(\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\right)^{*}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\big(I-\Phi_{N}(\Phi_{N}^{*}\Phi_{N})^{\dagger}\Phi_{N}^{*}\big)\kappa_{\cal Z}(\cdot,z)\\ &=\kappa_{\cal Z}(\cdot,z)^{*}(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z).\end{aligned}$$

Hence, for any $v\in\mathbb{R}^{p}$,

$$\begin{aligned}v^{\top}\Sigma_{N}(z)v&=v^{\top}\kappa_{\cal Z}(\cdot,z)^{*}(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v\\ &=\left\|(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v\right\|_{{\cal H}_{\cal Z}}^{2},\end{aligned}$$

where the last equality uses the fact that $I-\Pi_{N}$ is self-adjoint and idempotent. Hence, $\Sigma_{N}(z)$ is positive semi-definite. In particular, $\Sigma_{N}(z)=0$ iff $(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v=0$ for all $v\in\mathbb{R}^{p}$, that is, iff $\kappa_{\cal Z}(\cdot,z)v\in{\cal H}_{N}={\rm range}(\Phi_{N})$ for all $v\in\mathbb{R}^{p}$.
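To make the characterization of $\Sigma_{N}(z)$ concrete, here is a minimal sketch for the separable special case $\kappa_{\cal Z}(z,z')=k(z,z')I_{p}$, where $\Sigma_{N}(z)$ reduces to a scalar factor times $I_{p}$. The scalar kernel $k(z,z')=(1+zz')^{2}$ on $\mathbb{R}$, the data points, and the function names are assumptions chosen for illustration. This kernel has a three-dimensional feature space, so three distinct data points already span it.

```python
import numpy as np

def k(z, zp):
    """Scalar polynomial kernel (1 + z z')^2; its feature space has dimension 3."""
    return (1.0 + z * zp) ** 2

def sigma(points, z):
    """Scalar factor of Sigma_N(z) = k(z, z) - k_N(z) K_N^+ k_N(z)^T."""
    pts = np.asarray(points, dtype=float)
    K = k(pts[:, None], pts[None, :])      # Gram matrix K_N
    kz = k(pts, z)                         # cross-kernel vector k_N(z)
    return k(z, z) - kz @ np.linalg.pinv(K) @ kz

data = [-1.0, 0.5, 2.0]                    # illustrative data points
s2 = sigma(data[:2], 0.7)                  # two points: kernel section not yet in the span
s3 = sigma(data, 0.7)                      # three points: span is the whole feature space
print(s2, s3)
```

With two data points the value is strictly positive, while with three it vanishes up to rounding, matching the statement that $\Sigma_{N}(z)=0$ exactly when the kernel section at $z$ lies in ${\cal H}_{N}$.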

B-C Proof of Lemma 5

Since $f_{N+1}$ interpolates $\{(z_{i},y_{i})\}_{i=0}^{N-1}$, we have for all $v\in{\cal Y}$ and $i=0,\cdots,N-1$,

$$\left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}=\left\langle f_{N+1}(z_{i}),v\right\rangle_{\cal Y}=\left\langle y_{i},v\right\rangle_{\cal Y}.$$

On the other hand, for $i=0,\cdots,N-1$ and all $v\in{\cal Y}$,

$$\begin{aligned}\left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}&=\left\langle\Pi_{N}[f_{N+1}]+\Pi_{N}^{\perp}[f_{N+1}],\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\Pi_{N}[f_{N+1}],\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}},\end{aligned}$$

where $\Pi_{N}^{\perp}=I-\Pi_{N}$ and $\Pi_{N}$ is the orthogonal projection onto ${\cal H}_{N}$. Since $v$ is arbitrary, $\Pi_{N}[f_{N+1}](z_{i})=y_{i}$ for $i=0,\cdots,N-1$, i.e., both $f_{N}$ and $\Pi_{N}[f_{N+1}]$ interpolate $\{(z_{i},y_{i})\}_{i=0}^{N-1}$. Since $f_{N}$ is the unique minimum-norm solution in ${\cal H}_{N}$, it must be the case that $f_{N}=\Pi_{N}[f_{N+1}]$. Hence,

$$f_{N+1}-f_{N}=f_{N+1}-\Pi_{N}[f_{N+1}]\in{\cal H}_{N}^{\perp}.$$

Moreover, $f_{N+1}\in{\cal H}_{N+1}$ and $f_{N}\in{\cal H}_{N}\subseteq{\cal H}_{N+1}$, so the difference lies in ${\cal H}_{N+1}\cap{\cal H}_{N}^{\perp}$, which consists of elements of the form $\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi]$ with $\xi\in{\cal Y}$. That is, there exists some $\xi_{N}\in{\cal Y}$ such that

$$f_{N+1}-f_{N}=\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}].\tag{29}$$

We now determine $\xi_{N}$. By the reproducing property, we have

$$\begin{aligned}\left\langle f_{N+1}(z_{N}),v\right\rangle_{\cal Y}&=\left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle f_{N}+\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle f_{N},\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}+\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}.\end{aligned}\tag{30}$$

For the last line, applying the reproducing property again, we can write the first term as $\langle f_{N},\kappa_{\cal Z}(\cdot,z_{N})v\rangle_{{\cal H}_{\cal Z}}=\langle f_{N}(z_{N}),v\rangle_{\cal Y}$. For the second term, by orthogonality we have

$$\begin{aligned}&\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\xi_{N},\kappa_{\cal Z}(z_{N},z_{N})v\right\rangle_{\cal Y}-\left\langle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}},\end{aligned}$$

where we have used the fact that, since $\Pi_{N}$ is an orthogonal projection,

$$\langle\Pi_{N}h,\Pi_{N}h'\rangle_{{\cal H}_{\cal Z}}=\langle h,\Pi_{N}h'\rangle_{{\cal H}_{\cal Z}}=\langle\Pi_{N}h,h'\rangle_{{\cal H}_{\cal Z}}$$

for all $h,h'\in{\cal H}_{\cal Z}$. Since $\Pi_{N}$ is an orthogonal projection onto ${\cal H}_{N}$, we have

$$\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}]=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\alpha_{j},$$

and

$$\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\beta_{j},$$

where $\bar{\alpha}=(\alpha_{0},\cdots,\alpha_{N-1})\in{\cal Y}^{N}$ and $\bar{\beta}=(\beta_{0},\cdots,\beta_{N-1})\in{\cal Y}^{N}$ are the solutions of $K_{N}\bar{\alpha}=k_{N}(z_{N})\xi_{N}$ and $K_{N}\bar{\beta}=k_{N}(z_{N})v$. Therefore,

$$\begin{aligned}&\left\langle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\alpha_{j},\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\beta_{j}\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\bar{\alpha},K_{N}\bar{\beta}\right\rangle_{{\cal Y}^{\oplus N}}\\ &=\left\langle K_{N}^{\dagger}k_{N}(z_{N})\xi_{N},K_{N}K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{{\cal Y}^{\oplus N}}\\ &=\left\langle\xi_{N},k_{N}^{*}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{\cal Y},\end{aligned}$$

where the last line holds since $K_{N}^{\dagger}$ is self-adjoint and $K_{N}^{\dagger}K_{N}K_{N}^{\dagger}=K_{N}^{\dagger}$.

Putting everything together and using the definition of $\Sigma_{N}(z_{N})$, we arrive at

$$\begin{aligned}&\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\xi_{N},\kappa_{\cal Z}(z_{N},z_{N})v\right\rangle_{\cal Y}-\left\langle\xi_{N},k_{N}^{*}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{\cal Y}\\ &=\left\langle\xi_{N},\Sigma_{N}(z_{N})v\right\rangle_{\cal Y}.\end{aligned}$$

Plugging the above relation back into (30) and using the fact that $v\in{\cal Y}$ is arbitrary and that $\Sigma_{N}(z_{N})$ is self-adjoint, we see that $\xi_{N}$ is determined by the relation

$$f_{N+1}(z_{N})=f_{N}(z_{N})+\Sigma_{N}(z_{N})\xi_{N}.$$

As $f_{N+1}$ interpolates $\left(z_{N},y_{N^{+}}\right)$, i.e., $y_{N^{+}}=f_{N+1}(z_{N})$, we can further write

$$\Sigma_{N}(z_{N})\xi_{N}=y_{N^{+}}-f_{N}(z_{N}).\tag{31}$$

Hence, when $\Sigma_{N}(z_{N})=0$, we will have $f_{N}(z_{N})=f_{N+1}(z_{N})=y_{N^{+}}$.

Next, applying the Pythagorean theorem in (29), we have

$$\begin{aligned}\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}&=\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}+\left\|\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}]\right\|_{{\cal H}_{\cal Z}}^{2}\\ &=\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}+\left\langle\xi_{N},\Sigma_{N}(z_{N})\xi_{N}\right\rangle_{\cal Y},\end{aligned}$$

where the last line follows from the definition of $\Sigma_{N}$. When $\Sigma_{N}(z_{N})\succ 0$, from (31) we have

$$\begin{aligned}&\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\\ &=\left\langle\Sigma_{N}^{-1}(z_{N})\left(y_{N^{+}}-f_{N}(z_{N})\right),y_{N^{+}}-f_{N}(z_{N})\right\rangle_{\cal Y}\\ &=\left\|\Sigma_{N}^{-1/2}(z_{N})\left(y_{N^{+}}-f_{N}(z_{N})\right)\right\|_{\cal Y}^{2}.\end{aligned}$$

This concludes the proof.
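The norm-increment identity just derived can be sanity-checked numerically in the scalar-output case ($p=1$). The sketch below is illustrative only: the Gaussian kernel, the sample points, and the target values are assumptions, and the squared RKHS norm of the minimum-norm interpolant is computed through the standard expression $y^{\top}K^{-1}y$, valid when the Gram matrix is invertible.

```python
import numpy as np

def gauss(z, zp):
    """Gaussian kernel with unit bandwidth (an illustrative choice)."""
    return np.exp(-0.5 * (z - zp) ** 2)

z = np.array([-1.5, -0.4, 0.3, 1.1, 2.0])   # z_0, ..., z_N
y = np.array([0.2, -1.0, 0.5, 1.3, -0.7])   # corresponding targets
N = len(z) - 1

K_full = gauss(z[:, None], z[None, :])      # Gram matrix K_{N+1}
K_N, k_N = K_full[:N, :N], K_full[:N, N]    # K_N and k_N(z_N)

alpha_N = np.linalg.solve(K_N, y[:N])       # coefficients of f_N
alpha_full = np.linalg.solve(K_full, y)     # coefficients of f_{N+1}

norm_sq_N = y[:N] @ alpha_N                 # ||f_N||^2
norm_sq_full = y @ alpha_full               # ||f_{N+1}||^2

f_N_at_new = k_N @ alpha_N                  # f_N(z_N)
sigma_N = K_full[N, N] - k_N @ np.linalg.solve(K_N, k_N)   # Sigma_N(z_N)

increment = (y[N] - f_N_at_new) ** 2 / sigma_N
print(norm_sq_full - norm_sq_N, increment)
```

The two printed numbers should coincide up to rounding, since the identity is the Schur-complement formula for the inverse of the bordered Gram matrix.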

Appendix C Proofs For Results in Section V

C-A Proof of Theorem 9

In our model (22), we can view $v=\phi(u)$ as the $q$-dimensional input to the LTI system parametrized by $(A,B,C,D)$. Consequently, the argument in the proof of [22, Theorem 1] extends directly to our setting and guarantees that the matrix

$${\cal M}:=\begin{bmatrix}U_{\rm p}^{\phi}\\ U_{\rm f}^{\phi}\\ X_{0}\end{bmatrix}\in\mathbb{R}^{(2Lq+n)\times(T-2L+1)}$$

has full row rank.

($\Rightarrow$): Suppose that $\begin{bmatrix}\Phi_{0:2L-1}\\ y_{0:2L-1}\end{bmatrix}$ is a valid input-output trajectory of (22). Then there exists an initial state $\widehat{x}_{0}$ such that

$$y_{0:L-1}={\cal O}_{L}\widehat{x}_{0}+{\cal T}_{L}\Phi_{0:L-1}.$$

Since ${\cal M}$ has full row rank, for any $\begin{bmatrix}\Phi_{0:L-1}\\ \Phi_{L:2L-1}\\ \widehat{x}_{0}\end{bmatrix}\in\mathbb{R}^{2Lq}\oplus\mathbb{R}^{n}$, there exists some $\xi\in\mathbb{R}^{T-2L+1}$ such that

$$\begin{bmatrix}\Phi_{0:L-1}\\ \Phi_{L:2L-1}\\ \widehat{x}_{0}\end{bmatrix}={\cal M}\xi.$$

Thus, we have

$$\begin{aligned}\begin{bmatrix}\Phi_{0:L-1}\\ y_{0:L-1}\end{bmatrix}&=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}\Phi_{0:L-1}\\ \widehat{x}_{0}\end{bmatrix}\\ &=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}U_{\rm p}^{\phi}\\ X_{0}\end{bmatrix}\xi\\ &=\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix}\xi.\end{aligned}$$

Moreover, from the dynamics (22), we have

$$X_{\rm f}=A^{L}X_{0}+\Delta_{L}U_{\rm p}^{\phi}.\tag{32}$$

Hence, at time $t=L$,

$$\begin{aligned}\widehat{x}_{L}&=A^{L}\widehat{x}_{0}+\Delta_{L}\Phi_{0:L-1}\\ &=\left(A^{L}X_{0}+\Delta_{L}U_{\rm p}^{\phi}\right)\xi\\ &=X_{\rm f}\xi.\end{aligned}$$

Thus, starting from $\widehat{x}_{L}=X_{\rm f}\xi$ with input sequence $\Phi_{L:2L-1}=U_{\rm f}^{\phi}\xi$, we have

$$\begin{aligned}\begin{bmatrix}\Phi_{L:2L-1}\\ y_{L:2L-1}\end{bmatrix}&=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}\Phi_{L:2L-1}\\ \widehat{x}_{L}\end{bmatrix}\\ &=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}U_{\rm f}^{\phi}\\ X_{\rm f}\end{bmatrix}\xi\\ &=\begin{bmatrix}U_{\rm f}^{\phi}\\ Y_{\rm f}\end{bmatrix}\xi.\end{aligned}$$

($\Leftarrow$): Since $\begin{bmatrix}\Phi_{0:L-1}\\ y_{0:L-1}\end{bmatrix}=\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix}\xi$, and $\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix}$ is constrained by (22), we have

$$\begin{bmatrix}\Phi_{0:L-1}\\ y_{0:L-1}\end{bmatrix}=\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix}\xi=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}U_{\rm p}^{\phi}\\ X_{0}\end{bmatrix}\xi.$$

Hence,

$$y_{0:L-1}=Y_{\rm p}\xi={\cal O}_{L}X_{0}\xi+{\cal T}_{L}U_{\rm p}^{\phi}\xi.$$

Since $\Phi_{0:L-1}=U_{\rm p}^{\phi}\xi$, we conclude that $y_{0:L-1}$ is the output from $\widehat{x}_{0}:=X_{0}\xi$ with input $\Phi_{0:L-1}$. As $\left(\Phi_{0:L-1},y_{0:L-1}\right)$ is a valid input-output trajectory, we can use (22) and $X_{\rm f}$ defined in (32) to obtain the state vector at time $L$ as

$$\widehat{x}_{L}=A^{L}\widehat{x}_{0}+\Delta_{L}\Phi_{0:L-1}=A^{L}X_{0}\xi+\Delta_{L}U_{\rm p}^{\phi}\xi=X_{\rm f}\xi.\tag{33}$$

For the second segment, using the assumptions and the matrix input-output relation for $U_{\rm f}^{\phi}$, $Y_{\rm f}$, we have

$$\begin{bmatrix}\Phi_{L:2L-1}\\ y_{L:2L-1}\end{bmatrix}=\begin{bmatrix}U_{\rm f}^{\phi}\\ Y_{\rm f}\end{bmatrix}\xi=\begin{bmatrix}I&0\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}U_{\rm f}^{\phi}\\ X_{\rm f}\end{bmatrix}\xi.$$

Thus, we have

$$\begin{aligned}y_{L:2L-1}&=Y_{\rm f}\xi\\ &={\cal O}_{L}X_{\rm f}\xi+{\cal T}_{L}U_{\rm f}^{\phi}\xi\\ &={\cal O}_{L}\widehat{x}_{L}+{\cal T}_{L}\Phi_{L:2L-1},\end{aligned}$$

where the last equality follows from (33). Therefore, we conclude that $y_{L:2L-1}$ is the output from $\widehat{x}_{L}=X_{\rm f}\xi$ with input $\Phi_{L:2L-1}$.

Putting everything together, we have

$$\begin{bmatrix}\Phi_{0:L-1}\\ y_{0:L-1}\end{bmatrix}=\begin{bmatrix}U_{\rm p}^{\phi}\\ Y_{\rm p}\end{bmatrix}\xi,\quad\begin{bmatrix}\Phi_{L:2L-1}\\ y_{L:2L-1}\end{bmatrix}=\begin{bmatrix}U_{\rm f}^{\phi}\\ Y_{\rm f}\end{bmatrix}\xi.$$

Multiplying both sides of the first equation by $\begin{bmatrix}(U_{\rm p}^{\phi})^{*}&0\\ 0&Y_{\rm p}^{*}\end{bmatrix}$ and the second equation by $\begin{bmatrix}(U_{\rm f}^{\phi})^{*}&0\\ 0&Y_{\rm f}^{*}\end{bmatrix}$ proves the claim.

References

  • [1] M. Alsalti, J. Berberich, V. G. Lopez, F. Allgöwer, and M. A. Müller (2021) Data-based system analysis and control of flat nonlinear systems. In 2021 60th IEEE Conference on Decision and Control (CDC), pp. 1484–1489. Cited by: §I.
  • [2] N. Aronszajn (1950) Theory of reproducing kernels. Transactions of the American Mathematical Society 68 (3), pp. 337–404. Cited by: §II-C.
  • [3] J. Berberich and F. Allgöwer (2020) A trajectory-based framework for data-driven system analysis and control. In 2020 European Control Conference (ECC), pp. 1365–1370. Cited by: §I.
  • [4] A. Berlinet and C. Thomas-Agnan (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media. Cited by: §II-C.
  • [5] C. Carmeli, E. De Vito, and A. Toigo (2006) Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications 4 (4), pp. 377–408. Cited by: §II-C, §V-A.
  • [6] J. Coulson, J. Lygeros, and F. Dörfler (2019) Data-enabled predictive control: in the shallows of the DeePC. In 2019 18th European Control Conference (ECC), pp. 307–312. Cited by: §I.
  • [7] R. De Figueiredo and T. Dwyer (1980) A best approximation framework and implementation for simulation of large-scale nonlinear systems. IEEE Transactions on Circuits and Systems 27 (11), pp. 1005–1014. Cited by: §III-C2, §III-C2, §III-C2.
  • [8] F. Dörfler, J. Coulson, and I. Markovsky (2022) Bridging direct and indirect data-driven control formulations via regularizations and relaxations. IEEE Transactions on Automatic Control 68 (2), pp. 883–897. Cited by: §I.
  • [9] M. Green and J. B. Moore (1986) Persistence of excitation in linear systems. Systems & Control Letters 7 (5), pp. 351–360. Cited by: §III-A, §IV.
  • [10] L. Huang, J. Lygeros, and F. Dörfler (2023) Robust and kernelized data-enabled predictive control for nonlinear systems. IEEE Transactions on Control Systems Technology 32 (2), pp. 611–624. Cited by: §I.
  • [11] T. Liang and B. Recht (2023) Interpolating classifiers make few mistakes. Journal of Machine Learning Research 24 (20), pp. 1–27. Cited by: §IV, §IV.
  • [12] I. Markovsky and F. Dörfler (2022) Data-driven dynamic interpolation and approximation. Automatica 135, pp. 110008. Cited by: §I.
  • [13] I. Markovsky and P. Rapisarda (2008) Data-driven simulation and control. International Journal of Control 81 (12), pp. 1946–1959. Cited by: §I, §I, §V-B.
  • [14] I. Markovsky, J. C. Willems, S. Van Huffel, and B. De Moor (2006) Exact and approximate modeling of linear systems: a behavioral approach. SIAM. Cited by: §I.
  • [15] C. A. Micchelli and M. Pontil (2005) On learning vector-valued functions. Neural Computation 17 (1), pp. 177–204. Cited by: §IV, §IV.
  • [16] O. Molodchyk and T. Faulwasser (2024) Exploring the links between the fundamental lemma and kernel regression. IEEE Control Systems Letters 8, pp. 2045–2050. Cited by: §I, §I, §I.
  • [17] J. Moore (1983) Persistence of excitation in extended least squares. IEEE Transactions on Automatic Control 28 (1), pp. 60–68. Cited by: §III-A, §IV.
  • [18] J. G. Rueda-Escobedo and J. Schiffer (2020) Data-driven internal model control of second-order discrete Volterra systems. In 2020 59th IEEE Conference on Decision and Control (CDC), pp. 4572–4579. Cited by: §I.
  • [19] B. Schölkopf, R. Herbrich, and A. J. Smola (2001) A generalized representer theorem. In International Conference on Computational Learning Theory, pp. 416–426. Cited by: §I.
  • [20] X. Shang, J. Cortés, and Y. Zheng (2024) Willems’ fundamental lemma for nonlinear systems with Koopman linear embedding. IEEE Control Systems Letters. Cited by: §I.
  • [21] P. Van Overschee and B. De Moor (1996) Subspace identification for linear systems: theory—implementation—applications. Kluwer. Cited by: §II-A, §V-B, §V-B, §V.
  • [22] H. J. Van Waarde, C. De Persis, M. K. Camlibel, and P. Tesi (2020) Willems’ fundamental lemma for state-space systems and its extension to multiple datasets. IEEE Control Systems Letters 4 (3), pp. 602–607. Cited by: §C-A.
  • [23] H. J. Van Waarde, J. Eising, M. K. Camlibel, and H. L. Trentelman (2023) The informativity approach to data-driven analysis and control. IEEE Control Systems Magazine 43 (6), pp. 32–66. Cited by: §I.
  • [24] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor (2005) A note on persistency of excitation. Systems & Control Letters 54 (4), pp. 325–329. Cited by: §I, §II-B.
  • [25] J. C. Willems (1986) From time series to linear system—part i. finite dimensional linear time invariant systems. Automatica 22 (5), pp. 561–580. Cited by: §I, §II-B, §II-B.
  • [26] J. C. Willems (1991) Paradigms and puzzles in the theory of dynamical systems. IEEE Transactions on Automatic Control 36 (3), pp. 259–294. Cited by: §I, §II-B.
