[2605.07052] A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces

A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces

Boya Hou, Maxim Raginsky

This work was supported in part by the NSF under award CCF-2348624 (“Towards a control framework for neural generative modeling”). Boya Hou is with the Carl R. Woese Institute for Genomic Biology and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (boyahou2@illinois.edu). Maxim Raginsky is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA (maxim@illinois.edu).

Abstract

We generalize Jan Willems’ behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.

I Introduction

The mathematical study of dynamical systems seeks to describe the evolution of a given system over time under its governing laws, initial conditions, and inputs (if any). This evolution can be studied internally through state-space methods or externally via the system’s behavior. Pioneered by Jan Willems in a series of papers starting with [25] (see [26] for an overview), the behavioral approach identifies dynamical systems with sets of their trajectories. For linear time-invariant systems, the cornerstone of this framework is the so-called fundamental lemma [24], which asserts that the set of all trajectories of a controllable linear time-invariant system over a finite time horizon can be reconstructed from a single trajectory driven by a persistently exciting input. This result has become central to recent advances in data-driven control; see, e.g., [14, 13, 6, 8, 23] and references therein.

The philosophy underlying these data-driven approaches, as clearly articulated in the paper by Markovsky and Rapisarda [13], is that simulation or control should be carried out without an explicit system identification step. In the context of simulation of linear systems, for example, this principle is supported by the fact that, given an initial condition and a subsequent input of interest, one can combine it with previous measurements of the system behavior to predict the resulting output without identifying the system transfer function, impulse response, or state-space model. Rather, simulation can be viewed as a “missing data” problem, to be solved directly using the available trajectories.

While this viewpoint has led to a rich and mature literature for linear systems, extending it to nonlinear systems is substantially more challenging, as the trajectory sets no longer form linear subspaces. Several works have explored nonlinear analogues of the fundamental lemma for structured nonlinear systems. Examples include bilinear systems [12], second-order Volterra systems [18], flat nonlinear systems [1], Hammerstein and Wiener systems [3], and a Koopman-type linear embedding posited in [20]. A promising direction is to use a reproducing kernel Hilbert space (RKHS) to lift the trajectories of nonlinear systems into a (possibly infinite-dimensional) feature space. In this vein, Huang, Lygeros, and Dörfler [10] investigated kernelized data-enabled predictive control, where the resulting models are regression-based approximations rather than behavioral characterizations. More recently, Molodchyk and Faulwasser [16] established the link between kernel regression and Willems’ fundamental lemma, and showed that several existing nonlinear extensions of the fundamental lemma, such as those for Hammerstein and flat systems, can be interpreted as instances of kernel regression with specific choices of kernel when the induced RKHS is finite-dimensional.

In this paper, we extend the behavioral framework beyond the linear setting by characterizing the behavior of nonlinear systems in an RKHS via minimum-norm interpolation and subspace identification, both repurposed for nonlinear systems. We reinterpret both methods through the behavioral lens, and clarify the role of various structural assumptions on the system, the offline and online data, and the space of functions used to represent the nonlinear aspects of the system. A unifying theme across these two methods is that, as in modern data-driven schemes and in the spirit of the fundamental lemma, predictors and the desired “online” trajectories can be expressed in terms of the kernelized “offline” examples. When working in an RKHS, as noted by Molodchyk and Faulwasser [16], this structure further mirrors the representer theorem [19]: the solution can be written as a finite linear combination of kernel evaluations along the observed trajectories.

The remainder of this paper is organized as follows. In Section II, we recall the fundamental lemma of Willems for linear time-invariant systems and introduce basic concepts regarding scalar-valued and vector-valued RKHSs. In Section III, we introduce a class of nonlinear systems whose nonlinearity is encoded by an element in the vector-valued RKHS of functions on the set of finite-length sequences of inputs. In Section IV, we establish a Behavior Representer Theorem for nonlinear systems using minimum-norm interpolation and contrast it with the fundamental lemma. Compared with the work of Molodchyk and Faulwasser [16], we consider minimum-norm interpolation instead of (regularized) least-squares regression in a vector-valued RKHS. We further establish an error estimate that holds with equality, as in Lemma 4, rather than the inequality in [16, Lemma 2]. In Section V, we present a subspace identification framework for nonlinear systems in an RKHS and provide a fundamental lemma characterization of finite-length trajectories with kernelized input vectors and outputs.

II Preliminaries

II-A Notation and definitions

We will make use of the following notation and definitions throughout the paper. We will use $\mathbb{Z}_{+}$ to denote the set of nonnegative integers. The Moore–Penrose pseudoinverse of a matrix $A$ will be denoted by $A^{\dagger}$ . Given the matrices $A\in\mathbb{R}^{p\times k}$ , $B\in\mathbb{R}^{q\times k}$ , and $C\in\mathbb{R}^{r\times k}$ , the oblique projection of the rowspace of $A$ on the rowspace of $C$ along the rowspace of $B$ is defined as

\displaystyle A\underset{B}{/}C:=A\begin{bmatrix}C^{\hbox{\it\tiny T}}&B^{\hbox{\it\tiny T}}\end{bmatrix}\left(\begin{bmatrix}CC^{\hbox{\it\tiny T}}&CB^{\hbox{\it\tiny T}}\\ BC^{\hbox{\it\tiny T}}&BB^{\hbox{\it\tiny T}}\end{bmatrix}^{\dagger}\right)_{\text{first $r$ columns}}C

(see [21, Section 1.4.2]). Given a finite sequence of vectors $w_{0:T-1}=\{w_{t}\}^{T-1}_{t=0}$ in $\mathbb{R}^{q}$ , the Hankel matrix of depth $L$ is the $(Lq)\times(T-L+1)$ matrix given by

\displaystyle H_{L}(w_{0:T-1}):=\begin{bmatrix}w_{0}&w_{1}&\dots&w_{T-L}\\ w_{1}&w_{2}&\dots&w_{T-L+1}\\ \vdots&\vdots&\ddots&\vdots\\ w_{L-1}&w_{L}&\dots&w_{T-1}\end{bmatrix}.

We say that $w_{0:T-1}$ is persistently exciting (PE) of order $L$ if the Hankel matrix $H_{L}(w_{0:T-1})$ has full row rank. Given a signal (discrete-time vector-valued sequence) $\{w_{t}\}_{t\in\mathbb{Z}_{+}}$ , $\sigma$ is the backward shift operator defined by $(\sigma w)_{t}:=w_{t+1}$ . Additional notation will be introduced as needed.
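To make these definitions concrete, the following NumPy sketch builds $H_{L}(w_{0:T-1})$ and tests persistency of excitation numerically. It is an illustration only: the function names `hankel_matrix` and `is_persistently_exciting`, the storage of the signal as a $T\times q$ array, and the rank tolerance are assumptions made for this example.

```python
import numpy as np

def hankel_matrix(w, L):
    """Depth-L block Hankel matrix of a sequence w stored as a (T, q) array.

    Column j stacks w_j, w_{j+1}, ..., w_{j+L-1}, so the result has shape
    (L*q, T-L+1).
    """
    w = np.asarray(w, dtype=float)
    if w.ndim == 1:
        w = w[:, None]  # treat a 1-D array as a scalar-valued signal
    T, q = w.shape
    if not 1 <= L <= T:
        raise ValueError("need 1 <= L <= T")
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(T - L + 1)])

def is_persistently_exciting(w, L, tol=1e-9):
    """Numerical test: H_L(w) has full row rank (tol is an assumed threshold)."""
    H = hankel_matrix(w, L)
    return np.linalg.matrix_rank(H, tol=tol) == H.shape[0]

# Hypothetical data: a random 2-dimensional signal of length 20.
rng = np.random.default_rng(0)
u = rng.standard_normal((20, 2))
H = hankel_matrix(u, 3)
pe_random = is_persistently_exciting(u, 3)
pe_constant = is_persistently_exciting(np.ones((10, 1)), 2)
```

A generic random signal passes the test, whereas a constant scalar signal fails already at order 2, since all columns of its Hankel matrix coincide.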

II-B A review of the fundamental lemma

In the behavioral framework, a linear time-invariant system with $q$ variables is identified with a linear subspace ${\cal B}$ of the space of all one-sided sequences of $q$ -dimensional real vectors denoted as $(\mathbb{R}^{q})^{\mathbb{Z}_{+}}$ which is shift-invariant, i.e., $\sigma{\cal B}\subseteq{\cal B}$ . One can work with various equivalent representations of ${\cal B}$ , such as autoregressive, input/output, or input/state/output representations [25]. Notions like controllability or observability can be defined solely in terms of set-theoretic properties of ${\cal B}$ , without referring to a particular representation [26].

One of these structural properties pertains to the partition of the variables of ${\cal B}$ into inputs and outputs. Without getting too much into technical details, the idea is that, up to a permutation of coordinates, each trajectory $w\in{\cal B}$ can be partitioned into an input trajectory $u:\mathbb{Z}_{+}\to\mathbb{R}^{m}$ and an output trajectory $y:\mathbb{Z}_{+}\to\mathbb{R}^{p}$ as $w=\begin{bmatrix}u\\ y\end{bmatrix}$ , such that the input is free in the sense that

\displaystyle\begin{aligned} &\left\{u:\mathbb{Z}_{+}\to\mathbb{R}^{m}\,\middle|\,\begin{bmatrix}u\\ y\end{bmatrix}\in{\cal B}\text{ for some }y:\mathbb{Z}_{+}\to\mathbb{R}^{p}\right\}\\ &\qquad\qquad\simeq(\mathbb{R}^{m})^{\mathbb{Z}_{+}}\end{aligned},

and the output is determined by the input, subject to the system laws and the initial condition. It can be shown that the number of inputs $m$ , the number of outputs $p$ , and the dimension $n$ of any minimal state space realization of ${\cal B}$ are system invariants that depend only on ${\cal B}$ and not on the particular representation of ${\cal B}$ [25].

Let ${\cal B}$ be a controllable linear time-invariant system with $m$ inputs, $p$ outputs, and minimum state dimension $n$ . The main result of [24], now commonly referred to as the fundamental lemma, is as follows:

Lemma 1.

For each $t=1,2,\dots$ , let ${\cal B}|_{t}$ denote the restriction of ${\cal B}$ to times $s\in\{0,\dots,t-1\}$ . Let a trajectory $w^{\rm{d}}_{0:T-1}\in{\cal B}|_{T}$ be given, such that the input part of $w^{\rm{d}}_{0:T-1}$ is persistently exciting of order $L+n$ . Then ${\cal B}|_{L}$ is equal to the column space of the Hankel matrix $H_{L}(w^{\rm{d}}_{0:T-1})$ .

The main message of Lemma 1 is that the length- $L$ behavior ${\cal B}|_{L}$ can be reconstructed, exactly and in a representation-independent manner, from a single input/output trajectory of length $T\geq(m+1)(L+n)-1$ (the smallest $T$ for which the $m(L+n)\times(T-L-n+1)$ input Hankel matrix can have full row rank), provided the input is persistently exciting and the system is controllable. This makes the fundamental lemma a key ingredient in data-driven approaches to system simulation and control.
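The statement of Lemma 1 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the system matrices, horizon, and random seed are hypothetical choices, and the rank value $mL+n$ used as a sanity check is the standard dimension of ${\cal B}|_{L}$ when $L$ is at least the lag of the system. It simulates a controllable and observable LTI system, forms the Hankel matrix of the stacked input/output data, and verifies that a fresh length- $L$ trajectory lies in its column space.

```python
import numpy as np

def hankel_matrix(w, L):
    """Depth-L block Hankel matrix of a (T, q) array w."""
    w = np.asarray(w, dtype=float)
    T = w.shape[0]
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(T - L + 1)])

def simulate(A, B, C, D, x0, u):
    """Outputs of x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t."""
    x, ys = np.array(x0, dtype=float), []
    for ut in u:
        ys.append(C @ x + D @ ut)
        x = A @ x + B @ ut
    return np.array(ys)

# Hypothetical controllable and observable system with n = 2, m = p = 1.
A = np.array([[0.0, 1.0], [-0.5, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n, m, L, T = 2, 1, 4, 40

rng = np.random.default_rng(1)
u_d = rng.standard_normal((T, m))          # random input, generically PE
y_d = simulate(A, B, C, D, rng.standard_normal(n), u_d)
H = hankel_matrix(np.hstack([u_d, y_d]), L)
rank_H = np.linalg.matrix_rank(H)

# A new trajectory with a different initial state and input.
u_new = rng.standard_normal((L, m))
y_new = simulate(A, B, C, D, rng.standard_normal(n), u_new)
w_new = np.hstack([u_new, y_new]).reshape(-1)

# Membership in the column space: least-squares residual should vanish.
coeffs, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = float(np.linalg.norm(H @ coeffs - w_new))
```

In this example the Hankel matrix has rank $mL+n=6$ and the residual is zero up to floating-point error, consistent with the lemma.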

II-C Reproducing Kernel Hilbert Spaces

In this paper, we make extensive use of vector-valued reproducing kernel Hilbert spaces. We start by describing the more familiar definition of a scalar-valued RKHS; we refer interested readers to [4] for details. Unless indicated otherwise, all Hilbert spaces in this paper are assumed to be defined over the reals. Let $X$ be a set. A Hilbert space ${\cal H}$ with inner product $\langle\cdot,\cdot\rangle_{\cal H}$ is an RKHS on $X$ if its elements are functions from $X$ to $\mathbb{R}$ and if for each $x\in X$ there exists a positive constant $C_{x}$ , such that $|f(x)|\leq C_{x}\|f\|_{\cal H}$ for all $f\in{\cal H}$ . In other words, ${\cal H}$ is an RKHS on $X$ if the evaluation functional $\delta_{x}(f):=f(x)$ is bounded (hence continuous) for all $x\in X$ . To each RKHS, we can associate a reproducing kernel, i.e., a mapping $\kappa:X\times X\rightarrow\mathbb{R}$ that satisfies the positivity condition

\displaystyle\sum^{n}_{i,j=1}c_{i}c_{j}\kappa(x_{i},x_{j})\geq 0,

for all $n\in\mathbb{N}$ , $c_{1},\dots,c_{n}\in\mathbb{R}$ , and $x_{1},\dots,x_{n}\in X$ , such that $\kappa(\cdot,x)$ is an element of ${\cal H}$ for each $x\in X$ and the following property (called the reproducing kernel property) holds:

\displaystyle f(x)=\langle f,\kappa(\cdot,x)\rangle_{\cal H},\qquad\text{for all }f\in{\cal H},x\in X.

Moreover, the set $\{\kappa(\cdot,x):x\in X\}$ is total in ${\cal H}$ (i.e., its linear span is dense in ${\cal H}$ ). The map $\phi:X\to{\cal H}$ defined by $\phi(x):=\kappa(\cdot,x)$ is called the canonical feature map. By the Moore–Aronszajn theorem [2], the reproducing kernel $\kappa$ specifies ${\cal H}$ uniquely up to linear isomorphism.
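As a small numerical illustration of these definitions, the sketch below uses the Gaussian kernel (an assumed example, not one singled out in this paper) to check the positivity condition on a finite sample and the boundedness of point evaluation. For $f=\sum_{i}a_{i}\kappa(\cdot,x_{i})$ , the reproducing property gives $f(x_{j})=\sum_{i}a_{i}\kappa(x_{j},x_{i})$ and $\|f\|^{2}_{\cal H}=a^{\hbox{\it\tiny T}}Ga$ with $G$ the Gram matrix, and the Cauchy–Schwarz inequality yields the admissible constant $C_{x}=\sqrt{\kappa(x,x)}$ . The names `gaussian_kernel` and `gram` are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=0.5):
    """Scalar Gaussian kernel; gamma is an assumed bandwidth parameter."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def gram(kernel, X):
    """Gram matrix G[i, j] = kernel(X[i], X[j])."""
    return np.array([[kernel(a, b) for b in X] for a in X])

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 2))      # hypothetical sample points in R^2
G = gram(gaussian_kernel, X)

# Positivity condition: c^T G c >= 0 for every coefficient vector c.
c = rng.standard_normal(6)
quad_form = float(c @ G @ c)
min_eig = float(np.linalg.eigvalsh(G).min())

# f = sum_i a_i kappa(., x_i): values at the sample points and RKHS norm.
a = rng.standard_normal(6)
f_vals = G @ a
f_norm = float(np.sqrt(a @ G @ a))

# Bounded evaluation: |f(x_j)| <= sqrt(kappa(x_j, x_j)) * ||f||.
bound_holds = bool(np.all(np.abs(f_vals) <= np.sqrt(np.diag(G)) * f_norm + 1e-12))
```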

We now describe the vector-valued generalization of the above definition [5]. Let ${\cal K}$ be a Hilbert space, and let ${\cal L}({\cal K})$ denote the Hilbert space of bounded linear operators on ${\cal K}$ with the Hilbert–Schmidt norm. Then we say that ${\cal H}$ is a ${\cal K}$ -valued RKHS on $X$ if its elements are functions from $X$ to ${\cal K}$ and if the evaluation map $\delta_{x}(f):=f(x)$ from ${\cal H}$ to ${\cal K}$ is bounded (hence continuous) for all $x\in X$ , i.e., there exists a positive constant $C_{x}$ such that $\|f(x)\|_{\cal K}\leq C_{x}\|f\|_{\cal H}$ . The vector-valued analogue of the reproducing kernel is an operator-valued map $\kappa:X\times X\to{\cal L}({\cal K})$ of the positive type, i.e., such that the inequality

\displaystyle\sum^{n}_{i,j=1}c_{i}c_{j}\langle\kappa(x_{i},x_{j})v,v\rangle_{\cal K}\geq 0,

holds for all $n\in\mathbb{N}$ , $c_{1},\dots,c_{n}\in\mathbb{R}$ , $x_{1},\dots,x_{n}\in X$ , and $v\in{\cal K}$ . The reproducing kernel property then takes the form

\displaystyle\langle f(x),v\rangle_{\cal K}=\langle f,\kappa(\cdot,x)v\rangle_{\cal H},\text{ for all }f\in{\cal H},x\in X,v\in{\cal K}

where, for each $x\in X$ , $\kappa(\cdot,x)$ is an element of the linear space ${\cal L}({\cal K},{\cal H})$ of bounded operators from ${\cal K}$ to ${\cal H}$ . The map $x\mapsto\kappa(\cdot,x)$ from $X$ into ${\cal L}({\cal K},{\cal H})$ is the (operator-valued) canonical feature map.

A basic example of a vector-valued RKHS which will be used frequently in the sequel is ${\cal H}={\cal L}({\cal V},{\cal W})$ , where ${\cal V}$ and ${\cal W}$ are finite-dimensional inner-product spaces and where we equip ${\cal H}$ with the Hilbert–Schmidt inner product $\langle A,B\rangle_{\cal H}={\textrm{Tr}}(A^{*}B)$ . For the sake of completeness, we present the straightforward proof of the following lemma in Appendix A.

Lemma 2.

${\cal H}$ is a ${\cal W}$ -valued reproducing kernel Hilbert space on ${\cal V}$ with the operator-valued reproducing kernel $\kappa\left(v,v^{\prime}\right)=\langle v,v^{\prime}\rangle_{{\cal V}}I_{\cal W}$ , where $I_{\cal W}$ is the identity operator on ${\cal W}$ .
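The content of Lemma 2 is easy to verify numerically when ${\cal V}=\mathbb{R}^{d_{V}}$ and ${\cal W}=\mathbb{R}^{d_{W}}$ . In the sketch below (an illustration with hypothetical names and dimensions), the element $\kappa(\cdot,v)w\in{\cal H}$ is the rank-one matrix $wv^{\hbox{\it\tiny T}}$ , so the reproducing property reads $\langle Av,w\rangle_{\cal W}={\textrm{Tr}}(A^{\hbox{\it\tiny T}}wv^{\hbox{\it\tiny T}})$ .

```python
import numpy as np

def feature(v, w):
    """kappa(., v) w as an element of L(V, W): the rank-one matrix w v^T."""
    return np.outer(w, v)

def hs_inner(A, B):
    """Hilbert-Schmidt inner product Tr(A^T B)."""
    return float(np.trace(A.T @ B))

rng = np.random.default_rng(2)
dim_V, dim_W = 4, 3                         # hypothetical dimensions
A = rng.standard_normal((dim_W, dim_V))     # an element of H = L(V, W)
v = rng.standard_normal(dim_V)
v2 = rng.standard_normal(dim_V)
w = rng.standard_normal(dim_W)

# Reproducing property: <A v, w>_W = <A, kappa(., v) w>_H.
lhs = float((A @ v) @ w)
rhs = hs_inner(A, feature(v, w))

# Kernel: evaluating kappa(., v2) w at v gives <v, v2> w, i.e. kappa = <v, v2> I_W.
kernel_applied = feature(v2, w) @ v
kernel_expected = float(v @ v2) * w
```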

III Nonlinear Systems in a Vector-Valued RKHS

We now introduce a class of discrete-time nonlinear systems, where the nonlinearity is represented by an element of a given vector-valued RKHS of functions on the set of finite-length sequences of inputs. We will use the following notation throughout:

• ${\cal U}:=\mathbb{R}^{m}$ is the space of inputs;
• ${\cal Y}:=\mathbb{R}^{p}$ is the space of outputs;
• ${\sf{U}}:={\cal U}^{L+1}$ is the space of length- $(L+1)$ input sequences, where $L\in\mathbb{Z}_{+}$ is a fixed lag;
• ${\sf{Y}}:={\cal Y}^{L}$ is the space of length- $L$ output sequences;
• ${\cal Z}:={\sf{U}}\times{\sf{Y}}$ .

We introduce the system model in Section III-A and the associated behavioral constructs in Section III-B, and close by discussing several examples in Section III-C.

III-A System model

Let ${\cal H}_{\sf{U}}$ be a vector-valued RKHS of functions from $\sf{U}$ into ${\cal Y}$ , with the operator-valued kernel $\kappa_{\sf{U}}$ of positive type. (Here and elsewhere, all finite-dimensional vector spaces are automatically treated as Hilbert spaces with the usual Euclidean inner product.) We consider systems parametrized by $(L+1)$ -tuples $(A_{1},\dots,A_{L},g)$ , where $A_{1},\dots,A_{L}\in{\cal L}({\cal Y})$ and $g\in{\cal H}_{\sf{U}}$ . The input/output relation corresponding to $(A_{1},\dots,A_{L},g)$ is given by

\displaystyle y_{t+L}+\sum^{L}_{k=1}A_{k}y_{t+L-k}=g\left(u_{t:t+L}\right),\qquad t\in\mathbb{Z}_{+}

(1)

where $u_{t:t+L}$ is the restriction of the input sequence $u\in{\cal U}^{\mathbb{Z}_{+}}$ to the set $\{t,t+1,\dots,t+L\}$ . By the reproducing kernel property, we can also write

\displaystyle g(u_{t:t+L})=\kappa_{\sf{U}}(\cdot,u_{t:t+L})^{*}g,

where $\kappa_{\sf{U}}\left(\cdot,u_{t:t+L}\right)^{*}\in{\cal L}({\cal H}_{\sf{U}},{\cal Y})$ is the adjoint of the canonical feature map $\kappa_{\sf{U}}\left(\cdot,u_{t:t+L}\right)\in{\cal L}({\cal Y},{\cal H}_{\sf{U}})$ .

It is also convenient to cast (1) in an alternative nonlinear regression form. For each $t$ , let $y_{t^{+}}$ denote the output $y_{t+L}$ at time $t+L$ and define the regression vectors

\displaystyle z_{t}:=[u_{t}^{\hbox{\it\tiny T}},\cdots,u_{t+L}^{\hbox{\it\tiny T}},y_{t}^{\hbox{\it\tiny T}},\cdots,y_{t+L-1}^{\hbox{\it\tiny T}}]^{\hbox{\it\tiny T}}\in{\cal Z},

(2)

which is in analogy to the corresponding construct for linear systems [17, 9]. As we now show, we can introduce a ${\cal Y}$ -valued RKHS ${\cal H}_{{\cal Z}}$ of functions from ${\cal Z}$ into ${\cal Y}$ , such that the autoregressive nonlinear model of (1) can be represented as

\displaystyle y_{t^{+}}=f(z_{t}),

(3)

for some $f\in{\cal H}_{{\cal Z}}$ that depends on $(A_{1},\dots,A_{L},g)$ .

To that end, let us first express (1) as

\displaystyle\begin{aligned} y_{t+L}&=A^{-}y_{t:t+L-1}+g\left(u_{t:t+L}\right)\\ &=:f\left(u_{t:t+L},y_{t:t+L-1}\right),\end{aligned}

(4)

where $A^{-}\in{\cal L}({\sf{Y}},{\cal Y})$ is the linear operator defined by

\displaystyle A^{-}\overline{y}:=-\begin{bmatrix}A_{L}&A_{L-1}&\dots&A_{1}\end{bmatrix}\overline{y},\qquad\overline{y}\in{\sf{Y}}.

By Lemma 2, ${\cal H}_{\sf{Y}}={\cal L}({\sf{Y}},{\cal Y})$ is a ${\cal Y}$ -valued RKHS on ${\sf{Y}}$ with the reproducing kernel $\kappa_{\sf{Y}}\left(\overline{y},\overline{y}^{\prime}\right)=(\overline{y}^{\hbox{\it\tiny T}}\overline{y}^{\prime})I_{{\cal Y}}$ . Consider the direct-sum Hilbert space ${\cal H}_{{\cal Z}}={\cal H}_{\sf{U}}\oplus{\cal H}_{\sf{Y}}$ , whose elements $f:{{\cal Z}}\to{\cal Y}$ are pairs $(g,h)$ where $g\in{\cal H}_{\sf{U}}$ , $h\in{\cal H}_{\sf{Y}}$ , and for each $z=(\overline{u},\overline{y})\in{\cal Z}$ , $f(z)=g(\overline{u})+h(\overline{y})$ . The inner product in ${\cal H}_{{\cal Z}}$ is

\displaystyle\left\langle f_{1},f_{2}\right\rangle_{{\cal H}_{{\cal Z}}}=\left\langle g_{1},g_{2}\right\rangle_{{\cal H}_{\sf{U}}}+\left\langle h_{1},h_{2}\right\rangle_{{\cal H}_{\sf{Y}}},

for $f_{1}=(g_{1},h_{1})$ , $f_{2}=(g_{2},h_{2})$ . Since ${\cal H}_{\sf{U}}$ , ${\cal H}_{\sf{Y}}$ are ${\cal Y}$ -valued RKHSs with operator-valued kernels $\kappa_{\sf{U}}$ and $\kappa_{\sf{Y}}$ , ${\cal H}_{{\cal Z}}$ is also an RKHS with operator-valued kernel defined by

\displaystyle\begin{split}\kappa_{{\cal Z}}(z_{1},z_{2})&=\kappa_{\sf{U}}(\overline{u}_{1},\overline{u}_{2})+\kappa_{\sf{Y}}\left(\overline{y}_{1},\overline{y}_{2}\right),\\ &\qquad z_{1}=\left(\overline{u}_{1},\overline{y}_{1}\right),z_{2}=\left(\overline{u}_{2},\overline{y}_{2}\right)\in{\cal Z}.\end{split}

This can be proved as follows: for any $f=(g,h)\in{\cal H}_{{\cal Z}}$ , $z=(\overline{u},\overline{y})\in{\cal Z}$ , and $v\in{\cal Y}$ , we have

\displaystyle\begin{aligned} \left\langle f(z),v\right\rangle_{{\cal Y}}=&\left\langle g(\overline{u}),v\right\rangle_{{\cal Y}}+\left\langle h(\overline{y}),v\right\rangle_{{\cal Y}}\\ =&\left\langle g,\kappa_{{\sf{U}}}(\cdot,\overline{u})v\right\rangle_{{\cal H}_{\sf{U}}}+\left\langle h,\kappa_{{\sf{Y}}}(\cdot,\overline{y})v\right\rangle_{{\cal H}_{\sf{Y}}}\\ =&\left\langle f,\left(\kappa_{{\sf{U}}}(\cdot,\overline{u})+\kappa_{{\sf{Y}}}(\cdot,\overline{y})\right)v\right\rangle_{{\cal H}_{{\cal Z}}},\end{aligned}

where the second line follows from the reproducing kernel property. In particular, (3) holds with the function $f$ defined in (4).
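The construction of the regression vectors in (2) and of the sum kernel $\kappa_{\cal Z}$ can be sketched in a few lines of NumPy. This is an illustration under assumptions: the Gaussian choice for $\kappa_{\sf{U}}$ , the parameter `gamma`, and all function names are hypothetical, while $\kappa_{\sf{Y}}$ is the linear kernel supplied by Lemma 2. Regression vectors are stored as pairs $(\overline{u},\overline{y})$ .

```python
import numpy as np

def regression_pairs(u, y, L):
    """List of (u_{t:t+L}, y_{t:t+L-1}, y_{t+L}) for every valid t.

    u has shape (T+L, m) and y has shape (T+L, p); the list has T entries.
    """
    return [(u[t:t + L + 1].reshape(-1), y[t:t + L].reshape(-1), y[t + L])
            for t in range(len(u) - L)]

def kappa_U(ubar1, ubar2, p, gamma=0.1):
    """Assumed input kernel: scalar Gaussian kernel times the identity on Y."""
    return np.exp(-gamma * np.sum((ubar1 - ubar2) ** 2)) * np.eye(p)

def kappa_Y(ybar1, ybar2, p):
    """Linear kernel on past outputs, as in Lemma 2."""
    return float(ybar1 @ ybar2) * np.eye(p)

def kappa_Z(z1, z2, p):
    """Sum kernel on Z = U x Y; each z is a pair (ubar, ybar)."""
    (u1, y1), (u2, y2) = z1, z2
    return kappa_U(u1, u2, p) + kappa_Y(y1, y2, p)

# Hypothetical toy data with m = 2, p = 1, L = 2 and T + L = 7 samples.
u = np.arange(14.0).reshape(7, 2)
y = np.arange(7.0).reshape(7, 1)
pairs = regression_pairs(u, y, L=2)
z0, z1 = pairs[0][:2], pairs[1][:2]
K01 = kappa_Z(z0, z1, p=1)
```

With $T+L$ samples the list contains exactly $T$ regression vectors, matching the indexing $t=0,\dots,T-1$ .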

III-B Behaviors

In this section, we introduce several behavioral constructs related to the system model of Section III-A.

Given $(A_{1},\dots,A_{L},g)$ , define the operator $P(\sigma):=A_{0}\sigma^{L}+\sum^{L}_{k=1}A_{k}\sigma^{L-k}$ (with $A_{0}=I_{\cal Y}$ ), where $\sigma$ is the shift operator acting on output sequences. Then (1) is equivalent to

\displaystyle\left(P(\sigma)y\right)(t)=\kappa_{\sf{U}}\left(\cdot,u_{t:t+L}\right)^{*}g,\qquad t\in\mathbb{Z}_{+},

(5)

and we can define the behavior of $(A_{1},\dots,A_{L},g)$ as the following subset of the space of all input/output sequences:

\displaystyle\begin{aligned} &{\cal B}(A_{1},\dots,A_{L},g)\\ &\equiv{\cal B}(P,g)\\ &:=\left\{\begin{bmatrix}u\\ y\end{bmatrix}\in({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\,\middle|\,\text{(5) holds for all }t\in\mathbb{Z}_{+}\right\}.\end{aligned}

(6)

For $t\in\mathbb{Z}_{+}$ , the restriction of the behavior ${\cal B}(P,g)$ to the finite time interval $\{t,\dots,t+{L}\}$ is defined by

\displaystyle\begin{split}{\cal B}(P,g)|_{t:t+{L}}=\left\{\begin{bmatrix}u_{t:t+L}\\ y_{t:t+L}\end{bmatrix}\,\middle|\,\exists\begin{bmatrix}\bar{u}\\ \bar{y}\end{bmatrix}\in{\cal B}(P,g)\right.\\ \left.\text{ such that }\begin{bmatrix}\bar{u}_{t:t+{L}}\\ \bar{y}_{t:t+{L}}\end{bmatrix}=\begin{bmatrix}u_{t:t+L}\\ y_{t:t+L}\end{bmatrix}\right\}.\end{split}

When $t=0$ , we will use ${\cal B}(P,g)|_{L}$ as shorthand for ${\cal B}(P,g)|_{t:t+{L}}$ .

Next, we turn to the nonlinear regression form in (3). Recall that the identity

\displaystyle f(z)=\kappa_{{\cal Z}}(\cdot,z)^{*}f

holds by the reproducing kernel property. Define the operator $R:({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\to{\cal L}({\cal H}_{\cal Z},{\cal Y})\times{\cal Y}$ by

\displaystyle R\begin{bmatrix}u\\ y\end{bmatrix}:=(\kappa_{{\cal Z}}(\cdot,z_{0})^{*},y_{0^{+}}),

where

\displaystyle z_{0}=[u_{0}^{\hbox{\it\tiny T}},\cdots,u_{L}^{\hbox{\it\tiny T}},y_{0}^{\hbox{\it\tiny T}},\cdots,y_{L-1}^{\hbox{\it\tiny T}}]^{\hbox{\it\tiny T}},\quad y_{0^{+}}=y_{L},

(7)

is the regression vector at time $t=0$ [cf. Eq. (2)] and $y_{0^{+}}$ is the output at time $t=L$ . We say that $(H,y)\in{\cal L}\left({\cal H}_{\cal Z},{\cal Y}\right)\times{\cal Y}$ is an input-output (i/o) pair of the system (3) if $Hf=y$ . This leads naturally to the following behavioral description:

\displaystyle\begin{aligned} {\cal B}(f)&:=\Bigg\{\begin{bmatrix}u\\ y\end{bmatrix}\in({\cal U}\times{\cal Y})^{\mathbb{Z}_{+}}\Bigg|R\sigma^{t}\begin{bmatrix}u\\ y\end{bmatrix}\\ &\quad\text{ is an i/o pair of (3) for each $t\in\mathbb{Z}_{+}$}\Bigg\}.\end{aligned}

When $(A_{1},\dots,A_{L},g)$ and $f$ are related via (3), it is easily verified that

\displaystyle{\cal B}(A_{1},\dots,A_{L},g)|_{L}=\left\{\begin{bmatrix}u_{0:L}\\ y_{0:L}\end{bmatrix}\,\middle|\,\kappa_{\cal Z}(\cdot,z_{0})^{*}f=y_{0^{+}}\right\},

(8)

where $z_{0}$ and $y_{0^{+}}$ are defined in (7).

III-C Examples

III-C1 LTI Systems

Let ${\cal H}_{\sf{U}}$ consist of all linear operators from $\sf{U}$ into ${\cal Y}$ , i.e., ${\cal H}_{\sf{U}}={\cal L}(\sf{U},{\cal Y})$ . By Lemma 2, this is a ${\cal Y}$ -valued RKHS on $\sf{U}$ . Any $g\in{\cal H}_{\sf{U}}$ can be represented as $g(u_{0:L})=B_{L}u_{0}+\cdots+B_{0}u_{L}$ for some linear operators $B_{0},\dots,B_{L}\in{\cal L}({\cal U},{\cal Y})$ . Then Eq. (1) describes a linear autoregressive model of order ${L}$ :

\displaystyle y_{t+L}+\sum^{L}_{k=1}A_{k}y_{t+L-k}=\sum^{L}_{l=0}B_{l}u_{t+L-l}.

The operator-valued kernel map is given by

\displaystyle\kappa_{\sf{U}}(u_{0:L},v_{0:L})=\langle u_{0:L},v_{0:L}\rangle_{\sf{U}}I_{\cal Y},

where

\displaystyle\langle u_{0:L},v_{0:L}\rangle_{\sf{U}}=\sum^{L}_{l=0}u_{l}^{\hbox{\it\tiny T}}v_{l},

is the $l^{2}$ inner product on ${\sf{U}}$ viewed as a direct sum of $L+1$ Hilbert spaces ${\cal U}=\mathbb{R}^{m}$ with the Euclidean inner product $\langle u,v\rangle=u^{\hbox{\it\tiny T}}v$ .

III-C2 Volterra series

When $A_{k}=0$ for all $k=1,\cdots,{L}$ , the nonlinear autoregressive model in (1) reduces to

\displaystyle y_{t+L}=g(u_{t:t+L}),\quad\forall t\in\mathbb{Z}_{+}.

(9)

This system has finite memory of length $L+1$ , since the output at each time $t\geq L$ is determined by the inputs in the finite time window $\{t,t-1,\dots,t-L\}$ of length $L+1$ . Systems of this type are often represented using Volterra series. De Figueiredo and Dwyer [7] used the formalism of weighted Fock spaces to connect Volterra series representation to the RKHS framework.

Let us assume for simplicity that $p=1$ (i.e., the outputs $y_{t}$ are scalar). Let $\rho=(\rho_{k})_{k\geq 0}$ be a sequence of positive reals, such that the infinite series

\displaystyle q(\lambda)=\sum^{\infty}_{k=0}\frac{1}{k!\rho_{k}}\lambda^{k},

converges for all $\lambda\in\mathbb{R}$ , and let ${\cal H}_{\sf{U}}$ denote the RKHS of functions from $\sf{U}$ into $\mathbb{R}$ with the reproducing kernel

\displaystyle\kappa_{\sf{U}}(u_{0:L},v_{0:L}):=q(\langle u_{0:L},v_{0:L}\rangle_{{\sf{U}}}).

Let $h_{0}$ and $(h_{k}(i_{1},\dots,i_{k}):k\in\mathbb{N},i_{1},\dots,i_{k}\in\{0,\dots,L\})$ be a collection of real coefficients satisfying the condition

\displaystyle\sum^{\infty}_{k=0}\frac{\rho_{k}}{k!}\|h_{k}\|^{2}_{k}<\infty,

where $\|h_{0}\|_{0}:=|h_{0}|$ and

\displaystyle\|h_{k}\|_{k}:=\left(\sum^{L}_{i_{1}=0}\dots\sum^{L}_{i_{k}=0}|h_{k}(i_{1},\dots,i_{k})|^{2}\right)^{1/2}

for $k=1,2,\dots$ . Then any function $g:\sf{U}\to\mathbb{R}$ of the form

\displaystyle g(u_{0:L})=h_{0}+\sum^{\infty}_{k=1}\sum_{i_{1},\dots,i_{k}\in\{0,\dots,L\}}h_{k}(i_{1},\dots,i_{k})\prod^{k}_{j=1}u_{i_{j}}

is an element of ${\cal H}_{\sf{U}}$ [7].

Removing the restriction $A_{1}=\dots=A_{L}=0$ yields a broader class of nonlinear systems that subsumes the models studied in [7].
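To illustrate the kernel $\kappa_{\sf{U}}=q(\langle\cdot,\cdot\rangle_{\sf{U}})$ , the sketch below evaluates a truncated version of the series for $q$ . The truncation level `K` and the weight sequence passed as the callable `rho` are assumptions of the example; with the particular choice $\rho_{k}=1$ the series sums to $q(\lambda)=e^{\lambda}$ , which gives a convenient closed form to compare against.

```python
import math
import numpy as np

def q_series(lam, rho, K=60):
    """Truncated series sum_{k<K} lam^k / (k! * rho(k))."""
    return sum(lam ** k / (math.factorial(k) * rho(k)) for k in range(K))

def volterra_kernel(u, v, rho, K=60):
    """kappa_U(u, v) = q(<u, v>), with u and v flattened input windows."""
    inner = float(np.dot(np.ravel(u), np.ravel(v)))
    return q_series(inner, rho, K)

# Hypothetical weights rho_k = 1, for which q(lambda) = exp(lambda).
rho_one = lambda k: 1.0
u = np.array([0.3, -0.2, 0.5])
v = np.array([0.1, 0.4, -0.6])
k_uv = volterra_kernel(u, v, rho_one)
k_closed_form = math.exp(float(u @ v))

# Gram matrix on a few random windows: should be positive semidefinite.
rng = np.random.default_rng(6)
X = 0.5 * rng.standard_normal((5, 3))
G = np.array([[volterra_kernel(a, b, rho_one) for b in X] for a in X])
min_eig = float(np.linalg.eigvalsh(G).min())
```

Other admissible weight sequences change how strongly high-order Volterra terms are penalized in the RKHS norm, through the factor $\rho_{k}/k!$ in the summability condition above.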

III-C3 State-space models

Consider a Hammerstein state-space model of the form

\displaystyle\begin{aligned} x_{t+1}&=Ax_{t}+B\psi_{1}\left(u_{t}\right),\\ y_{t}&=Cx_{t}+D\psi_{2}\left(u_{t}\right),\end{aligned}

(10)

where $\psi_{1},\psi_{2}:{\cal U}\to\mathbb{R}^{q}$ are two (nonlinear) functions. We assume that the state $x_{t}$ takes values in $\mathbb{R}^{n}$ , so $A\in\mathbb{R}^{n\times n}$ , $B\in\mathbb{R}^{n\times q}$ , $C\in\mathbb{R}^{p\times n}$ , and $D\in\mathbb{R}^{p\times q}$ . We can obtain an input-output representation (1) from the state-space representation (10) as follows.

Introduce the Markov parameters $M_{j}:=CA^{j}B\in\mathbb{R}^{p\times q}$ for $j\geq 0$ . Define the $L$ -step controllability matrix ${\cal C}_{L}$ , the $L$ -step observability matrix ${\cal O}_{L}$ , the reversed $L$ -step controllability matrix $\Delta_{L}$ , and the modified block Toeplitz operator $\widetilde{{\cal T}}_{L}$ as

\displaystyle\begin{aligned} {\cal C}_{L}:=&\begin{bmatrix}B&AB&A^{2}B&\cdots&A^{{L-1}}B\end{bmatrix},\\ {\cal O}_{L}:=&\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{{L-1}}\end{bmatrix},\\ \Delta_{L}:=&\begin{bmatrix}A^{{L-1}}B&\cdots&AB&B\end{bmatrix},\\ \widetilde{{\cal T}}_{L}:=&\begin{bmatrix}{0}&{0}&\cdots&{0}&{0}\\ M_{0}&{0}&\cdots&{0}&{0}\\ M_{1}&M_{0}&\cdots&{0}&{0}\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ M_{L-2}&M_{L-3}&\cdots&M_{0}&{0}\end{bmatrix}.\end{aligned}

(11)

Let us assume that $L\geq n$ is such that ${\rm rank}({\cal O}_{L})=n$ . Then from (10), for $k=0,\cdots,{L}$ , we have

\displaystyle\begin{aligned} y_{t-L+k}=&CA^{k}x_{t-L}+\sum_{j=0}^{k-1}M_{j}\psi_{1}(u_{t-L+k-1-j})\\ &+D\psi_{2}(u_{t-L+k}).\end{aligned}

(12)

Define the vectors

\displaystyle\begin{aligned} Y_{t}&:=\begin{bmatrix}y_{t-L}\\ \vdots\\ y_{t-1}\end{bmatrix}\in\mathbb{R}^{pL},\\ E_{t}&:=\begin{bmatrix}\psi_{1}(u_{t-L})\\ \vdots\\ \psi_{1}(u_{t-1})\end{bmatrix}\in\mathbb{R}^{qL},\\ F_{t}&:=\begin{bmatrix}\psi_{2}(u_{t-L})\\ \vdots\\ \psi_{2}(u_{t-1})\end{bmatrix}\in\mathbb{R}^{qL}.\end{aligned}

Then, stacking the equations in (12) for $k=0$ to $k=L-1$ , we have

\displaystyle Y_{t}={\cal O}_{L}x_{t-L}+\widetilde{{\cal T}}_{L}E_{t}+\left(I_{L}\otimes D\right)F_{t}.

Since ${\rm rank}({\cal O}_{L})=n$ , we can solve for $x_{t-L}$ :

\displaystyle x_{t-L}={\cal O}_{L}^{\dagger}\left(Y_{t}-\widetilde{{\cal T}}_{L}E_{t}-\left(I_{L}\otimes D\right)F_{t}\right).

(13)

On the other hand, when $k={L}$ , we have

\displaystyle\begin{aligned} y_{t}&=CA^{L}x_{t-L}+\sum_{j=0}^{{L-1}}M_{j}\psi_{1}(u_{t-1-j})+D\psi_{2}(u_{t}).\end{aligned}

Plugging in the expression for $x_{t-L}$ from (13),

\displaystyle\begin{aligned} y_{t}&=CA^{L}{\cal O}_{L}^{\dagger}\left(Y_{t}-\widetilde{{\cal T}}_{L}E_{t}-\left(I_{L}\otimes D\right)F_{t}\right)\\ &\qquad+\sum_{j=0}^{{L-1}}M_{j}\psi_{1}(u_{t-1-j})+D\psi_{2}(u_{t}).\end{aligned}

The matrix $Q:=CA^{L}{\cal O}_{L}^{\dagger}\in\mathbb{R}^{p\times pL}$ can be written in block form as $Q=[Q_{L-1},\cdots,Q_{0}]$ with $Q_{i}\in\mathbb{R}^{p\times p}$ for $i=0,\cdots,{L-1}$ , so that $QY_{t}=\sum_{k=1}^{L}Q_{k-1}y_{t-k}$ . Setting $A_{k}:=-Q_{k-1}$ for $k=1,\cdots,L$ and moving the corresponding terms to the LHS of the above equation, we have

\displaystyle\begin{aligned} y_{t}+\sum_{k=1}^{L}A_{k}y_{t-k}=&\sum_{j=0}^{{L-1}}M_{j}\psi_{1}(u_{t-1-j})-Q\widetilde{{\cal T}}_{L}E_{t}\\ &-Q\left(I_{L}\otimes D\right)F_{t}+D\psi_{2}(u_{t}).\end{aligned}

With $M=[M_{{L-1}},\cdots,M_{0}]\in\mathbb{R}^{p\times qL}$ , we can write the RHS as a function of $u_{t-L:t}$ as

\displaystyle\begin{aligned} &y_{t}+\sum_{k=1}^{L}A_{k}y_{t-k}\\ &=\underbrace{\begin{bmatrix}M-Q\widetilde{{\cal T}}_{L}&-Q\left(I_{L}\otimes D\right)&D\end{bmatrix}}_{=:S}\underbrace{\begin{bmatrix}E_{t}\\ F_{t}\\ \psi_{2}(u_{t})\end{bmatrix}}_{=:\psi(u_{t-L:t})}\\ &=:g(u_{t-L:t}),\end{aligned}

where $S\in\mathbb{R}^{p\times q(2L+1)}$ , $\psi(u_{t-L:t})\in\mathbb{R}^{q(2L+1)}$ , and $g$ defined above is a mapping from $\sf{U}$ into ${\cal Y}=\mathbb{R}^{p}$ . Assume now that $\psi_{1},\psi_{2}$ are elements of some $\mathbb{R}^{q}$ -valued RKHS ${\cal H}$ on ${\cal U}$ . Thus, by definition, the evaluation maps $\psi_{1}\mapsto\psi_{1}(u)$ and $\psi_{2}\mapsto\psi_{2}(u)$ are bounded for each $u\in{\cal U}$ . Consider the direct-sum Hilbert space ${\cal K}:={\cal H}^{\oplus(2L+1)}$ with inner product $\langle h,h^{\prime}\rangle_{{\cal H}^{\oplus(2L+1)}}=\sum_{j=0}^{2L}\langle h_{j},h^{\prime}_{j}\rangle_{{\cal H}}$ for $h=h_{0}\oplus\cdots\oplus h_{2L}$ . The function $g$ defined above is an element of the linear space ${\cal F}$ of functions $f:\sf{U}\to{\cal Y}$ of the form

\displaystyle f(u_{0:L})=S_{0}h_{0}(u_{0})+\sum^{L}_{j=1}S_{j}h_{j}(u_{j})+\sum^{2L}_{j=L+1}S_{j}h_{j}(u_{j-L}),

where $h=h_{0}\oplus\dots\oplus h_{2L}$ ranges over ${\cal K}$ and where $S_{0},S_{1},\dots,S_{2L}$ are linear operators from $\mathbb{R}^{q}$ into ${\cal Y}=\mathbb{R}^{p}$ . For each $u_{0:L}$ , the evaluation map $f\mapsto f(u_{0:L})$ is continuous on ${\cal F}$ . Thus, $g$ belongs to some ${\cal Y}$ -valued RKHS on $\sf{U}$ .
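The derivation above can be checked numerically. The following sketch is an illustration only: the system matrices, the choices $\psi_{1}=\tanh$ and $\psi_{2}(u)=u^{2}$ , and all function names are hypothetical. It builds the coefficients of the input/output form from $(A,B,C,D)$ , with $A_{k}$ taken as minus the block of $Q$ that multiplies $y_{t-k}$ inside $Y_{t}$ , and compares both sides of the resulting relation along a simulated trajectory.

```python
import numpy as np

def hammerstein_to_io(A, B, C, D, L):
    """Return ([A_1, ..., A_L], S) for a Hammerstein model of the form (10)."""
    n, q = B.shape
    p = C.shape[0]
    Apow = [np.linalg.matrix_power(A, k) for k in range(L + 1)]
    markov = [C @ Apow[j] @ B for j in range(L)]          # M_0, ..., M_{L-1}
    obs = np.vstack([C @ Apow[k] for k in range(L)])      # O_L
    if np.linalg.matrix_rank(obs) != n:
        raise ValueError("rank(O_L) must equal n")
    # Modified block Toeplitz matrix: block (k, i) equals M_{k-1-i} for i < k.
    toep = np.zeros((p * L, q * L))
    for k in range(L):
        for i in range(k):
            toep[k * p:(k + 1) * p, i * q:(i + 1) * q] = markov[k - 1 - i]
    Q = C @ Apow[L] @ np.linalg.pinv(obs)
    M = np.hstack(markov[::-1])                           # [M_{L-1}, ..., M_0]
    S = np.hstack([M - Q @ toep, -Q @ np.kron(np.eye(L), D), D])
    # A_k is minus the block of Q that multiplies y_{t-k} inside Y_t.
    A_list = [-Q[:, (L - k) * p:(L - k + 1) * p] for k in range(1, L + 1)]
    return A_list, S

def simulate(A, B, C, D, psi1, psi2, x0, u):
    """Outputs of x_{t+1} = A x_t + B psi1(u_t), y_t = C x_t + D psi2(u_t)."""
    x, ys = np.array(x0, dtype=float), []
    for ut in u:
        ys.append(C @ x + D @ psi2(ut))
        x = A @ x + B @ psi1(ut)
    return np.array(ys)

# Hypothetical example with n = 2, m = q = p = 1 and lag L = 2.
A = np.array([[0.5, 0.2], [-0.1, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.3]])
psi1, psi2, L = np.tanh, np.square, 2

rng = np.random.default_rng(3)
u = rng.standard_normal((30, 1))
y = simulate(A, B, C, D, psi1, psi2, rng.standard_normal(2), u)
A_list, S = hammerstein_to_io(A, B, C, D, L)

max_err = 0.0
for t in range(L, len(u)):
    window = range(t - L, t)
    # psi(u_{t-L:t}) = [E_t; F_t; psi2(u_t)]
    psi_vec = np.concatenate([psi1(u[s]) for s in window]
                             + [psi2(u[s]) for s in window]
                             + [psi2(u[t])])
    lhs = y[t] + sum(A_list[k - 1] @ y[t - k] for k in range(1, L + 1))
    max_err = max(max_err, float(np.max(np.abs(lhs - S @ psi_vec))))
```

In this example the two sides agree to floating-point precision for every $t\geq L$ , independently of the initial state.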

IV Behavior Representer Theorem Via Minimum Norm Interpolation

Let $(u_{t}^{\rm{d}},y_{t}^{\rm{d}})_{t=0}^{T+L-1}$ be a finite input-output trajectory generated by an unknown system of the form (1). In other words, there exists an unknown $(L+1)$ -tuple $(A_{1},\dots,A_{L},g)$ , such that

\displaystyle\begin{bmatrix}u^{\rm{d}}_{0:T+L-1}\\ y^{\rm{d}}_{0:T+L-1}\end{bmatrix}\in{\cal B}(A_{1},\dots,A_{L},g)|_{T+L-1}.

The problem of data-driven behavioral modeling is to reconstruct the unknown length- $L$ behavior segment ${\cal B}(A_{1},\dots,A_{L},g)|_{L}$ from the data without explicitly estimating $A_{1},\dots,A_{L},g$ .

Let $f$ be the regression representation of $(A_{1},\dots,A_{L},g)$ as in (3). Using the definitions of $z_{t}$ in (2) and $y_{t^{+}}:=y_{t+L}$ , we can represent the data $(u^{\rm{d}}_{t},y^{\rm{d}}_{t})^{T+L-1}_{t=0}$ equivalently by $(z^{\rm{d}}_{t},y^{\rm{d}}_{t^{+}})^{T-1}_{t=0}$ , such that $f(z^{\rm{d}}_{t})=y^{\rm{d}}_{t^{+}}$ holds for all $t=0,\dots,T-1$ . In other words, for each $t=0,\dots,T-1$ , $(\kappa_{\cal Z}(\cdot,z^{\rm{d}}_{t})^{*},y^{\rm{d}}_{t^{+}})$ is a valid i/o pair for (3). In view of the equivalence (8), the question we would like to answer is whether we can reconstruct the set of all valid i/o pairs of the unknown $f$ from the given data.

The following result is easy to establish:

Theorem 3.

For any collection of real coefficients $c_{0},\dots,c_{T-1}$ , $\left(\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm{d}})^{*},\sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm{d}}\right)$ is a valid i/o pair for the model (3).

Theorem 3 shows that any linear combination of $\big(\kappa_{\cal Z}(\cdot,z^{\rm{d}}_{t})^{*},y^{\rm{d}}_{t^{+}}\big)^{T-1}_{t=0}$ is a valid i/o pair of (3). We now analyze the reverse direction by relating it to minimum-norm interpolation in a vector-valued RKHS [15]. Let finite input-output data $\{(z_{i},y_{i})\}^{N-1}_{i=0}\subset{\cal Z}\times{\cal Y}$ be given and consider the following problem:

\displaystyle\begin{aligned} \text{minimize }&\left\|f\right\|_{{\cal H}_{\cal Z}}^{2}\\ \text{subject to }&\ f\left(z_{j}\right)=y_{j},\quad j=0,\cdots,{N-1}.\end{aligned}

(14)

Define the sampling operator $S_{N}:{\cal H}_{\cal Z}\to{\cal Y}^{N}$ by $S_{N}f=\left(f(z_{0}),\cdots,f(z_{N-1})\right)$ . By [15, Theorem 3], if $(y_{0},\dots,y_{N-1})\in\text{range}(S_{N})$ , then the minimum norm interpolation problem in (14) has a unique solution given by

\displaystyle f_{N}=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})v_{j},

(15)

where $(v_{0},\dots,v_{N-1})\in{\cal Y}^{N}$ solves the system of equations

\displaystyle\sum_{l=0}^{N-1}\kappa_{\cal Z}(z_{j},z_{l})v_{l}=y_{j},\qquad j=0,\dots,N-1.

(16)

We can express this more succinctly as follows.

Since ${\cal Y}=\mathbb{R}^{p}$ , for each $z\in{\cal Z}$ we can view $\kappa_{\cal Z}(\cdot,z)$ as a mapping from $\mathbb{R}^{p}$ into ${\cal H}_{\cal Z}$ , i.e., for each $v\in\mathbb{R}^{p}$ , $\kappa_{\cal Z}(\cdot,z)v\in{\cal H}_{\cal Z}$ . Define the mapping $\Phi_{N}:\mathbb{R}^{pN}\to{\cal H}_{\cal Z}$ as

\displaystyle\Phi_{N}\bar{v}=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})v_{j},

where $\bar{v}=(v_{0},\cdots,v_{N-1})\in\mathbb{R}^{pN}$ . In particular, $f_{N}=\Phi_{N}\bar{v}$ , where $\bar{v}$ is the solution of (16). Next, define the block kernel matrix $K_{N}:=\Phi_{N}^{*}\Phi_{N}\in\mathbb{R}^{pN\times pN}$ whose blocks are given by $[K_{N}]_{ij}=\kappa_{\cal Z}(z_{i},z_{j})$ for $i,j=0,\cdots,N-1$ , as well as the block row vector $k_{N}(z):=\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\in\mathbb{R}^{p\times pN}$ with blocks $[k_{N}(z)]_{j}=\kappa_{\cal Z}(z,z_{j})$ for $j=0,\cdots,N-1$ . Using the reproducing property and the above definitions, we can write

\displaystyle\begin{aligned} f_{N}(z)&=\kappa_{\cal Z}(\cdot,z)^{*}f_{N}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}K^{\dagger}_{N}Y_{N}\\ &=k_{N}(z)K^{\dagger}_{N}Y_{N},\end{aligned}

(17)

where $Y_{N}:=[y^{\hbox{\it\tiny T}}_{0},\dots,y^{\hbox{\it\tiny T}}_{N-1}]^{\hbox{\it\tiny T}}$ . Finally, for each $z$ define $\Sigma_{N}(z)\in\mathbb{R}^{p\times p}$ as

\displaystyle\Sigma_{N}(z):=\kappa_{\cal Z}(z,z)-k_{N}(z)K_{N}^{\dagger}{k}_{N}(z)^{*}.

(18)
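As a numerical illustration of (17) and (18), the following sketch builds the block kernel matrix $K_{N}$ and the block row vector $k_{N}(z)$ for a small synthetic data set. The separable Gaussian kernel, the matrix `B`, the random data, and all function names are assumptions made here for illustration only; they are not prescribed by the framework.

```python
import numpy as np

def kappa(z1, z2, B):
    """Assumed separable kernel: Gaussian scalar kernel times a fixed PD matrix B."""
    return np.exp(-0.5 * np.sum((z1 - z2) ** 2)) * B

def block_gram(Z, B):
    """Block kernel matrix K_N whose (i, j) block is kappa(z_i, z_j)."""
    return np.block([[kappa(zi, zj, B) for zj in Z] for zi in Z])

def block_row(z, Z, B):
    """Block row vector k_N(z) whose j-th block is kappa(z, z_j)."""
    return np.hstack([kappa(z, zj, B) for zj in Z])

rng = np.random.default_rng(0)
N, d, p = 5, 3, 2
Z = rng.standard_normal((N, d))          # sample points z_0, ..., z_{N-1}
Y = rng.standard_normal((N, p))          # targets y_0, ..., y_{N-1}
B = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite output coupling

K_pinv = np.linalg.pinv(block_gram(Z, B))
Y_N = Y.reshape(-1)                      # stacked vector [y_0; ...; y_{N-1}]

def f_N(z):
    """Minimum-norm interpolant evaluated at z, as in (17)."""
    return block_row(z, Z, B) @ K_pinv @ Y_N

def Sigma_N(z):
    """The p x p matrix Sigma_N(z) from (18)."""
    k = block_row(z, Z, B)
    return kappa(z, z, B) - k @ K_pinv @ k.T

interp_err = max(np.linalg.norm(f_N(Z[i]) - Y[i]) for i in range(N))
Sigma_at_sample = Sigma_N(Z[0])               # vanishes at a sample point
Sigma_new = Sigma_N(rng.standard_normal(d))   # positive semi-definite elsewhere
```

In this sketch the interpolant reproduces the data, $\Sigma_{N}$ vanishes at a sample point, and it is positive semi-definite at a fresh point.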

We now present two key lemmas. The first one is a structural characterization of $\Sigma_{N}$ :

Lemma 4.

For each $z$ , the operator $\Sigma_{N}(z)$ is symmetric and positive semi-definite, and $\Sigma_{N}(z)=0$ iff ${\rm range}\left(\kappa_{\cal Z}(\cdot,z)\right)\subseteq{{\rm range}(\Phi_{N})}$ .

The second one is an extension of a result of Liang and Recht [11, Lemma 1] to the minimum-norm interpolation problem in the vector-valued RKHS ${\cal H}_{\cal Z}$ :

Lemma 5.

Suppose that the problem (14) admits unique solutions $f_{N}$ and $f_{N+1}$ given the respective data $\{(z_{i},y_{i})\}^{N-1}_{i=0}$ and $\{(z_{i},y_{i})\}^{N-1}_{i=0}\cup\{(z_{N},y_{N})\}$ . Then the following holds:

1.

If $\Sigma_{N}(z_{N})=0$ , then $y_{N}-f_{N}\left(z_{N}\right)=0$ .

2.

If $\Sigma_{N}(z_{N})\succ 0$ , we have

\displaystyle\begin{aligned} &\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\\ &\qquad=\left\|\Sigma_{N}(z_{N})^{-1/2}\left(y_{N}-f_{N}\left(z_{N}\right)\right)\right\|_{{\cal Y}}^{2}.\end{aligned}

(19)

Lemma 5 characterizes the error incurred when we use $\widehat{y}_{N}=f_{N}(z_{N})$ as a predictor of $y_{N}=f_{N+1}(z_{N})$ , where $f_{N}$ is the solution to the minimum norm interpolation problem on $\{(z_{i},y_{i})\}^{N-1}_{i=0}$ . In particular, the prediction is exact when $\Sigma_{N}(z_{N})=0$ . If instead $\Sigma_{N}(z_{N})$ is positive definite, the identity (19) expresses the squared norm of the weighted prediction error $\Sigma_{N}(z_{N})^{-1/2}(y_{N}-\widehat{y}_{N})$ as the difference of the squared norms of $f_{N+1}$ and $f_{N}$ .
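The identity (19) can be checked numerically. The sketch below assumes a separable Gaussian kernel $\kappa^{\rm s}(z,z')B$ with a positive definite matrix `B` and random data, all of which are illustrative choices made here. It uses the fact that, when $K_{n}$ is invertible, $\|f_{n}\|^{2}_{{\cal H}_{\cal Z}}=Y_{n}^{\hbox{\it\tiny T}}K_{n}^{-1}Y_{n}$ .

```python
import numpy as np

def gauss(z1, z2):
    """Assumed scalar Gaussian kernel (illustrative choice)."""
    return np.exp(-0.5 * np.sum((z1 - z2) ** 2))

def gram(Z, B):
    """Block kernel matrix K = K^s kron B for the separable kernel."""
    Ks = np.array([[gauss(a, b) for b in Z] for a in Z])
    return np.kron(Ks, B)

rng = np.random.default_rng(1)
N, d, p = 6, 3, 2
Z = rng.standard_normal((N + 1, d))      # z_0, ..., z_N
Y = rng.standard_normal((N + 1, p))      # y_0, ..., y_N
B = np.array([[1.5, 0.3], [0.3, 1.0]])

def sq_norm(n):
    """Squared RKHS norm of the interpolant of the first n samples."""
    K = gram(Z[:n], B)
    y = Y[:n].reshape(-1)
    return y @ np.linalg.solve(K, y)

K_N = gram(Z[:N], B)
k_N = np.kron(np.array([[gauss(Z[N], zj) for zj in Z[:N]]]), B)   # p x pN
Sigma = gauss(Z[N], Z[N]) * B - k_N @ np.linalg.solve(K_N, k_N.T)
resid = Y[N] - k_N @ np.linalg.solve(K_N, Y[:N].reshape(-1))      # y_N - f_N(z_N)

lhs = sq_norm(N + 1) - sq_norm(N)
rhs = resid @ np.linalg.solve(Sigma, resid)   # ||Sigma^{-1/2} resid||^2
```

Here `lhs` and `rhs` are the two sides of (19); they agree up to floating-point error.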

In a scalar-valued RKHS, $\Sigma_{N}(z)$ is equal to

\displaystyle\begin{split}&{\rm dist}^{2}(\kappa_{\cal Z}(\cdot,z),{\rm span}(\kappa_{\cal Z}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}(\cdot,z_{N-1})))\\ &=\min_{h\in{\rm span}(\kappa_{\cal Z}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}(\cdot,z_{N-1}))}\|\kappa_{\cal Z}(\cdot,z)-h\|_{{\cal H}_{\cal Z}}^{2},\end{split}

(20)

which is the squared distance from $\kappa_{\cal Z}(\cdot,z)$ to the linear span of kernel functions centered at the observed samples [11]. While Lemma 5 applies to any vector-valued RKHS of ${\cal Y}$ -valued functions on ${\cal Z}$ , a natural choice of the operator-valued kernel is $\kappa_{\cal Z}(z_{1},z_{2})=\kappa_{\cal Z}^{\rm{s}}(z_{1},z_{2})I_{{\cal Y}}$ , where $\kappa_{\cal Z}^{\rm{s}}$ is a scalar-valued kernel function. Let ${K}_{N}^{\rm{s}}=[\kappa_{\cal Z}^{\rm{s}}(z_{i},z_{j})]^{N-1}_{i,j=0}\in\mathbb{R}^{N\times N}$ be the Gram matrix defined by the scalar-valued kernel $\kappa_{\cal Z}^{\rm{s}}$ on the points $z_{0},\dots,z_{N-1}$ , and let $k^{\rm{s}}_{N}(z)\in\mathbb{R}^{1\times N}$ be the row vector with $[k_{N}^{\rm{s}}(z)]_{j}=\kappa_{\cal Z}^{\rm{s}}(z,z_{j})$ , so that ${K}_{N}={K}^{\rm{s}}_{N}\otimes I_{{\cal Y}}$ and $k_{N}(z)=k^{\rm{s}}_{N}(z)\otimes I_{{\cal Y}}$ . Finally, define

\displaystyle s_{N}:={\rm dist}\left\{{\rm span}\left(\kappa_{\cal Z}^{\rm{s}}(\cdot,z_{0}),\cdots,\kappa_{\cal Z}^{\rm{s}}(\cdot,z_{N-1})\right),\kappa_{\cal Z}^{\rm{s}}(\cdot,z_{N})\right\},

where the distance is computed in the scalar-valued RKHS induced by $\kappa_{\cal Z}^{\rm{s}}$ , cf. (20). Then, using the properties of the Kronecker product $\otimes$ , we can compute $\Sigma_{N}(z_{N})$ as follows:

\displaystyle\begin{aligned} \Sigma_{N}(z_{N})&=\kappa_{\cal Z}(z_{N},z_{N})-k_{N}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})^{*}\\ &=\kappa_{\cal Z}^{\rm{s}}(z_{N},z_{N})I_{{\cal Y}}-\left(k^{\rm{s}}_{N}(z_{N})\otimes I_{{\cal Y}}\right)\left({K}^{\rm{s}}_{N}\otimes I_{{\cal Y}}\right)^{\dagger}\left(k^{\rm{s}}_{N}(z_{N})\otimes I_{{\cal Y}}\right)^{*}\\ &=\left(\kappa_{\cal Z}^{\rm{s}}(z_{N},z_{N})-k^{\rm{s}}_{N}(z_{N})({K}^{\rm{s}}_{N})^{\dagger}k^{\rm{s}}_{N}(z_{N})^{\hbox{\it\tiny T}}\right)I_{{\cal Y}}\\ &=s_{N}^{2}I_{{\cal Y}}.\end{aligned}

Hence, the equality in Lemma 5 reduces to

\displaystyle\begin{aligned} s_{N}^{2}\left(\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\right)=\left\|y_{N}-f_{N}\left(z_{N}\right)\right\|_{{\cal Y}}^{2}.\end{aligned}

In particular, given a new pair $(z_{N},y_{N})$ with $s_{N}=0$ , we have $\left\|y_{N}-f_{N}(z_{N})\right\|_{{\cal Y}}=0$ , i.e., the prediction is exact. Given Lemmas 4 and 5, the following is immediate:

Theorem 6 (Behavior Representer Theorem).

Let $\left(z^{\rm{d}}_{t},y^{\rm{d}}_{t^{+}}\right)_{t=0}^{T-1}$ be a length- $T$ trajectory of regression and output vectors for an unknown $f_{\star}\in{\cal H}_{\cal Z}$ , i.e.,

\displaystyle y^{\rm{d}}_{t^{+}}=f_{\star}(z^{\rm{d}}_{t})\qquad\text{for each }t=0,\dots,T-1.

Let $[u_{0:L}^{\hbox{\it\tiny T}},y_{0:L}^{\hbox{\it\tiny T}}]^{\hbox{\it\tiny T}}$ be an element of ${\cal B}(f_{\star})|_{L}$ , and let $(z_{0},y_{0^{+}})$ be computed according to (7). Then the following holds:

1.

If $\Sigma_{T}(z_{0})=0$ , then $y_{0^{+}}=k_{T}(z_{0})K_{T}^{\dagger}Y_{T}$ , where $k_{T}$ and $K_{T}$ are computed from the data as in (17) and $Y_{T}:=[(y^{\rm{d}}_{0^{+}})^{\hbox{\it\tiny T}},\cdots,(y^{\rm{d}}_{(T-1)^{+}})^{\hbox{\it\tiny T}}]^{\hbox{\it\tiny T}}$ .

2.

If $\Sigma_{T}(z_{0})\succ 0$ , then

\displaystyle\begin{split}&\|\Sigma_{T}(z_{0})^{-1/2}(y_{0^{+}}-k_{T}(z_{0})K_{T}^{\dagger}Y_{T})\|^{2}_{\cal Y}\\ &\qquad\qquad=\|f_{T+1}\|^{2}_{{\cal H}_{\cal Z}}-\|f_{T}\|^{2}_{{\cal H}_{\cal Z}}\\ &\qquad\qquad\leq\|f_{\star}\|^{2}_{{\cal H}_{\cal Z}}-\|f_{T}\|^{2}_{{\cal H}_{\cal Z}},\end{split}

(21)

where $f_{T}$ (respectively, $f_{T+1}$ ) is the minimum-norm interpolator of $\{(z^{\rm{d}}_{t},y^{\rm{d}}_{t^{+}})\}^{T-1}_{t=0}$ (respectively, of $\{(z^{\rm{d}}_{t},y^{\rm{d}}_{t^{+}})\}^{T-1}_{t=0}\cup\{(z_{0},y_{0^{+}})\}$ ).

The condition $\Sigma_{T}(z_{0})=0$ describes the setting when exact reconstruction is possible. By Lemma 4, this will be the case for all pairs $(z,y_{+}=f_{\star}(z))$ for which ${\rm range}(\kappa_{\cal Z}(\cdot,z))\subseteq{\rm range}(\Phi_{T})$ . If $z_{0}$ does not satisfy this condition but $\Sigma_{T}(z_{0})\succ 0$ , we can quantify the reconstruction error using (21). For LTI systems, a sufficient condition for the existence of unique minimum-norm interpolating solutions is

\displaystyle\sum^{T-1}_{t=0}z^{\rm{d}}_{t}(z^{\rm{d}}_{t})^{\hbox{\it\tiny T}}\succ 0,

which is the classical persistence of excitation condition [17, 9]. Exact reconstruction is possible when $z_{0}\in{\rm span}(z^{\rm{d}}_{0},\dots,z^{\rm{d}}_{T-1})$ . In the nonlinear setting, the condition $\Sigma_{T}(z_{0})=0$ plays an analogous role via Lemma 4.
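To make the LTI remark concrete, the sketch below assumes the linear kernel $\kappa_{\cal Z}(z,z')=\langle z,z'\rangle I_{{\cal Y}}$ and an unknown linear map $f_{\star}(z)=Mz$ ; the dimensions, the random data, and the variable names are illustrative assumptions. With fewer samples than regressor dimensions, a query in the span of the data is predicted exactly, while a query with a unit component orthogonal to the data has $s^{2}=1$ .

```python
import numpy as np

rng = np.random.default_rng(2)
d, p, T = 4, 2, 3                        # T < d: the data do not span R^d
M = rng.standard_normal((p, d))          # unknown linear map f_star(z) = M z
Zd = rng.standard_normal((T, d))         # regression vectors z_t^d as rows
Yd = Zd @ M.T                            # outputs y_{t+}^d = M z_t^d

Ks = Zd @ Zd.T                           # scalar Gram matrix of the linear kernel
Ks_pinv = np.linalg.pinv(Ks)

def predict(z):
    """k_T(z) K_T^+ Y_T for the kernel <z, z'> I_p (Kronecker structure)."""
    return (Zd @ z) @ Ks_pinv @ Yd

def s2(z):
    """Squared distance from z to span(z_0^d, ..., z_{T-1}^d)."""
    ks = Zd @ z
    return z @ z - ks @ Ks_pinv @ ks

z_in = Zd.T @ rng.standard_normal(T)     # query inside the span of the data
n_perp = np.linalg.svd(Zd)[2][-1]        # unit vector orthogonal to all z_t^d
z_out = z_in + n_perp                    # query leaving the span

err_in = np.linalg.norm(predict(z_in) - M @ z_in)
```

For `z_in` the quantity `s2` vanishes and the prediction error `err_in` is zero up to rounding; for `z_out` the prediction only recovers the part of the query that lies in the span of the data.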

It is useful to compare Theorem 6 with the fundamental lemma for LTI systems (cf. Section II). The latter characterizes behaviors through the image of the Hankel matrix built from observed input-output trajectories. Theorem 6 of this section is conceptually analogous, but is phrased in terms of a different data representation based on regression vectors, which are formed from inputs and past outputs.

V Subspace Identification in the Vector-Valued RKHS Setting

We now revisit systems that arise from state-space representations, as in Section III-C3. We will focus on a particular class of such systems, namely ones that can be represented as

\displaystyle\begin{aligned} x_{t+1}&=Ax_{t}+B\phi\left(u_{t}\right),\\ y_{t}&=Cx_{t}+D\phi\left(u_{t}\right),\end{aligned}

(22)

where $\phi:{\cal U}\to\mathbb{R}^{q}$ is a mapping from the input space ${\cal U}=\mathbb{R}^{m}$ into $\mathbb{R}^{q}$ .

Given $(A,B,C,D)$ , we construct the $L$ -step controllability matrix ${\cal C}_{L}$ , the $L$ -step observability matrix ${\cal O}_{L}$ , the reversed $L$ -step controllability matrix $\Delta_{L}$ , and the $L$ -step modified Toeplitz matrix $\widetilde{{\cal T}}_{L}$ exactly as in (11). The $L$ -step Toeplitz matrix is given by

\displaystyle{\cal T}_{L}:=\widetilde{{\cal T}}_{L}+I_{L}\otimes D.

Let $\left\{(u^{\rm{d}}_{t},y^{\rm{d}}_{t})\right\}_{t=0}^{T-1}$ denote input/output data of length $T$ collected from measurements of (22) starting from some initial condition $x_{0}\in\mathbb{R}^{n}$ . In the LTI case (i.e., when $q=m$ and $\phi$ is the identity map), subspace identification methods [21] allow one to reconstruct, under certain regularity conditions, the state trajectory $x_{L},\dots,x_{T-L}$ and the observability matrix ${\cal O}_{L}$ directly from the input/output data without knowledge of the system matrices $A,B,C,D$ .

In this section, we show that these methods can be extended to the set-up of (22) when $\phi$ is an element of a suitable vector-valued RKHS on ${\cal U}$ . Moreover, we obtain a result in the spirit of the fundamental lemma for LTI systems, namely that the set of all valid length- $L$ input/output trajectories of (22) can be reconstructed directly from the data $\left\{(u^{\rm{d}}_{t},y^{\rm{d}}_{t})\right\}_{t=0}^{T-1}$ without an intermediate system identification step.

V-A The construction of a vector-valued RKHS

There are various ways of instantiating the state-space model (22) in the vector-valued RKHS framework presented in Section III-C, i.e., choosing a suitable $\mathbb{R}^{q}$ -valued RKHS ${\cal H}_{\kappa}$ on the input space ${\cal U}$ with reproducing kernel $\kappa$ so that $\phi\in{\cal H}_{\kappa}$ . Perhaps the simplest one is to define the operator-valued map $\kappa:{\cal U}\times{\cal U}\to{\cal L}(\mathbb{R}^{q})$ by

\displaystyle\kappa(u,u^{\prime}):=\phi(u)\otimes\phi(u^{\prime}),

which is readily seen to be of the positive type since, for any $n$ , any $\alpha_{1},\dots,\alpha_{n}\in\mathbb{R}$ , any $u_{1},\dots,u_{n}\in{\cal U}$ , and any $v\in\mathbb{R}^{q}$ ,

\displaystyle\begin{aligned} &\sum^{n}_{i,j=1}\alpha_{i}\alpha_{j}\langle\kappa(u_{i},u_{j})v,v\rangle_{\mathbb{R}^{q}}\\ &=\sum^{n}_{i,j=1}\alpha_{i}\alpha_{j}\langle\phi(u_{i}),v\rangle_{\mathbb{R}^{q}}\langle\phi(u_{j}),v\rangle_{\mathbb{R}^{q}}\\ &=\left(\sum^{n}_{i=1}\alpha_{i}\langle\phi(u_{i}),v\rangle_{\mathbb{R}^{q}}\right)^{2}\\ &\geq 0.\end{aligned}

By [5, Prop. 2.3], there is a unique $\mathbb{R}^{q}$ -valued RKHS ${\cal H}_{\kappa}$ of functions on ${\cal U}$ with reproducing kernel $\kappa$ . The kernel $\kappa$ has the convenient property

\displaystyle\left\langle\phi(u),\phi(u^{\prime})\right\rangle_{\mathbb{R}^{q}}={\textrm{Tr}}\left(\kappa(u,{u^{\prime}})\right).

(23)
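A small numerical check of the positive-type computation and of (23) is sketched below. The feature map `phi` is a hypothetical example chosen here; the tensor product is implemented as the rank-one matrix $\phi(u)\phi(u')^{\hbox{\it\tiny T}}$ .

```python
import numpy as np

def phi(u):
    """Hypothetical feature map phi: R -> R^3 (an illustrative assumption)."""
    return np.array([u, u ** 2, np.sin(u)])

def kappa(u, v):
    """kappa(u, v) = phi(u) (x) phi(v), realized as the matrix phi(u) phi(v)^T."""
    return np.outer(phi(u), phi(v))

rng = np.random.default_rng(3)
us = rng.standard_normal(4)              # points u_1, ..., u_n
alphas = rng.standard_normal(4)          # coefficients alpha_1, ..., alpha_n
v = rng.standard_normal(3)               # test vector in R^q

# Property (23): the trace of kappa recovers the feature inner product.
trace_val = np.trace(kappa(us[0], us[1]))
inner_val = phi(us[0]) @ phi(us[1])

# Positive type: the quadratic form collapses to a perfect square.
quad = sum(alphas[i] * alphas[j] * (kappa(us[i], us[j]) @ v) @ v
           for i in range(4) for j in range(4))
square = sum(alphas[i] * (phi(us[i]) @ v) for i in range(4)) ** 2
```

The trace identity is what later allows Gram matrices of lifted inputs to be formed from kernel evaluations alone.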

V-B Subspace identification using RKHS methods

We now have all the ingredients in place for putting together a subspace identification framework analogous to the one for LTI systems [21].

Let the input/output data $\left\{(u^{\rm{d}}_{t},y^{\rm{d}}_{t})\right\}_{t=0}^{T-1}$ be given. Introduce the Hankel-type block matrices

\displaystyle\begin{aligned} \begin{bmatrix}{Y_{{\rm{p}}}}\\ \hline\cr{Y_{{\rm{f}}}}\end{bmatrix}:=\begin{bmatrix}y^{\rm{d}}_{0}&\cdots&y^{\rm{d}}_{T-2L}\\ \vdots&&\vdots\\ y^{\rm{d}}_{L-1}&\cdots&y^{\rm{d}}_{T-L-1}\\ \hline\cr y^{\rm{d}}_{L}&\cdots&y^{\rm{d}}_{T-L}\\ \vdots&&\vdots\\ y^{\rm{d}}_{2L-1}&\cdots&y^{\rm{d}}_{T-1}\end{bmatrix}.\end{aligned}

(24)

and

\displaystyle\begin{aligned} \begin{bmatrix}U_{{\rm{p}}}^{\phi}\\ \hline\cr U_{{\rm{f}}}^{\phi}\end{bmatrix}:=\begin{bmatrix}\phi\left(u^{\rm{d}}_{0}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-2L}\right)\\ \vdots&&\vdots&\\ \phi\left(u^{\rm{d}}_{L-1}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-L-1}\right)\\ \hline\cr\phi\left(u^{\rm{d}}_{L}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-L}\right)\\ \vdots&&\vdots&\\ \phi\left(u^{\rm{d}}_{2L-1}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-1}\right)\end{bmatrix},\end{aligned}

(25)

where ${\rm{p}}$ and ${\rm{f}}$ designate a partition into “past” and “future” data. Let $H_{{\rm{p}}}$ and $H_{{\rm{f}}}$ denote, respectively, the concatenations of $U_{{\rm{p}}}^{\phi}$ and $Y_{{\rm{p}}}$ and of $U_{{\rm{f}}}^{\phi}$ and $Y_{{\rm{f}}}$ :

\displaystyle H_{{\rm{p}}}:=\begin{bmatrix}U_{{\rm{p}}}^{\phi}\\ Y_{{\rm{p}}}\end{bmatrix},\quad H_{{\rm{f}}}:=\begin{bmatrix}U_{{\rm{f}}}^{\phi}\\ Y_{{\rm{f}}}\end{bmatrix}.

Similar to the linear case, let $\Pi$ denote the oblique projection of the row space of $Y_{{\rm{f}}}$ onto the row space of $H_{{\rm{p}}}$ along the row space of $U_{{\rm{f}}}^{\phi}$ :

\displaystyle\Pi:={Y_{{\rm{f}}}}\underset{U_{{\rm{f}}}^{\phi}}{/}H_{{\rm{p}}},

which satisfies ${\rm rank}(\Pi)\leq T-2L+1$ . Let the SVD of $\Pi$ be given by

\displaystyle\Pi=\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}\begin{bmatrix}\Sigma_{1}&0\\ 0&0\end{bmatrix}\begin{bmatrix}V_{1}^{{\hbox{\it\tiny T}}}\\ V_{2}^{{\hbox{\it\tiny T}}}\end{bmatrix}=U_{1}\Sigma_{1}V_{1}^{{\hbox{\it\tiny T}}}.

Following [21], let $X_{\rm{f}}:=\begin{bmatrix}x_{L}&x_{L+1}&\dots&x_{T-L}\end{bmatrix}$ , where $x_{t}$ is the state trajectory of (22) determined by the initial state $x_{0}$ and the inputs $u^{\rm{d}}_{t}$ . In the LTI case, $X_{\rm{f}}$ can be recovered, up to a similarity transformation, via the formula $X_{{\rm{f}}}=\Sigma_{1}^{1/2}V_{1}^{\hbox{\it\tiny T}}$ (see, e.g., [21, Theorem 2, Chapter 2]). In fact, the same result holds in the present nonlinear case (the idea is to view $v_{k}:=\phi(u_{k})$ as an input to an LTI system given by $(A,B,C,D)$ ). Given the input/output data, let

\displaystyle U^{\phi}:=\begin{bmatrix}\phi\left(u^{\rm{d}}_{0}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-2L-n}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm{d}}_{2L-1}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-n-1}\right)\\ \hline\cr\phi\left(u^{\rm{d}}_{2L}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-n}\right)\\ \vdots&&\vdots\\ \phi\left(u^{\rm{d}}_{2L-1+n}\right)&\cdots&\phi\left(u^{\rm{d}}_{T-1}\right)\end{bmatrix},

(26)

and form the input Gram matrix of depth $2L+n$ as $K^{u}:=(U^{\phi})^{{\hbox{\it\tiny T}}}U^{\phi}$ , whose entries are, by (23), $[K^{u}]_{ij}=\sum_{k=0}^{2L+n-1}{\textrm{Tr}}\left(\kappa\left(u^{\rm{d}}_{i+k},u^{\rm{d}}_{j+k}\right)\right)$ . The following theorem is a straightforward adaptation of subspace identification results for LTI systems:

Theorem 7.

Let $\left\{u^{\rm{d}}_{i},y^{\rm{d}}_{i}\right\}_{i=0}^{T-1}$ be a sequence of length $T$ generated by the system (22), where $(A,B)$ is controllable and $(A,C)$ is observable. Suppose that the input sequence is such that ${\rm rank}(K^{u})=(2L+n)q$ . Then, $\Pi={\cal O}_{L}X_{{\rm{f}}}$ and $X_{{\rm{f}}}=\Sigma^{1/2}_{1}V_{1}^{\hbox{\it\tiny T}}$ .

The process of computing the oblique projection and constructing the state vector can be carried out using Gram matrices computed from pairwise kernel evaluations $\kappa$ , without explicitly using $\phi$ . Specifically, define the Gram matrices $K_{\rm{p}}^{u}=\left({U_{{\rm{p}}}^{\phi}}\right)^{{\hbox{\it\tiny T}}}{U_{{\rm{p}}}^{\phi}}$ , $K_{\rm{f}}^{u}=\left({U_{{\rm{f}}}^{\phi}}\right)^{{\hbox{\it\tiny T}}}{U_{{\rm{f}}}^{\phi}}$ , and $K_{\rm{p}}^{y}={Y_{{\rm{p}}}}^{{\hbox{\it\tiny T}}}{Y_{{\rm{p}}}}$ , whose entries are

\displaystyle\begin{aligned} [K_{\rm{p}}^{u}]_{ij}=&\sum_{k=0}^{L-1}\left\langle\phi(u^{\rm{d}}_{i+k}),\phi(u^{\rm{d}}_{j+k})\right\rangle_{\mathbb{R}^{q}}\\ \stackrel{{\scriptstyle(a)}}{{=}}&\sum_{k=0}^{L-1}{\textrm{Tr}}\left(\kappa\left(u^{\rm{d}}_{i+k},u^{\rm{d}}_{j+k}\right)\right),\\ [K_{\rm{f}}^{u}]_{ij}=&\sum_{k=0}^{L-1}\left\langle\phi(u^{\rm{d}}_{i+L+k}),\phi(u^{\rm{d}}_{j+L+k})\right\rangle_{\mathbb{R}^{q}}\\ \stackrel{{\scriptstyle(b)}}{{=}}&\sum_{k=0}^{L-1}{\textrm{Tr}}\left(\kappa\left(u^{\rm{d}}_{i+L+k},u^{\rm{d}}_{j+L+k}\right)\right),\\ [K_{\rm{p}}^{y}]_{ij}=&\sum_{k=0}^{L-1}{y^{\rm{d}}_{i+k}}^{\hbox{\it\tiny T}}y^{\rm{d}}_{j+k},\end{aligned}

respectively, where (a) and (b) follow from (23).

The oblique projection can be computed as

\displaystyle\begin{aligned} \Pi=&{Y_{{\rm{f}}}}\underset{{U_{{\rm{f}}}^{\phi}}}{/}H_{p}\\ =&{Y_{{\rm{f}}}}\begin{bmatrix}{H_{{\rm{p}}}}^{{\hbox{\it\tiny T}}}&({U_{{\rm{f}}}^{\phi}})^{{\hbox{\it\tiny T}}}\end{bmatrix}\left(\begin{bmatrix}{H_{{\rm{p}}}}\\ {{U_{{\rm{f}}}^{\phi}}}\end{bmatrix}\begin{bmatrix}{H_{{\rm{p}}}}^{{\hbox{\it\tiny T}}}&({U_{{\rm{f}}}^{\phi}})^{{\hbox{\it\tiny T}}}\end{bmatrix}\right)^{\dagger}\begin{bmatrix}{H_{{\rm{p}}}}\\ {0}\end{bmatrix}\\ =&{Y_{{\rm{f}}}}\left(\begin{bmatrix}{H_{{\rm{p}}}}^{{\hbox{\it\tiny T}}}&({U_{{\rm{f}}}^{\phi}})^{{\hbox{\it\tiny T}}}\end{bmatrix}\begin{bmatrix}{H_{{\rm{p}}}}\\ {{U_{{\rm{f}}}^{\phi}}}\end{bmatrix}\right)^{\dagger}\begin{bmatrix}{H_{{\rm{p}}}}^{{\hbox{\it\tiny T}}}&({U_{{\rm{f}}}^{\phi}})^{{\hbox{\it\tiny T}}}\end{bmatrix}\begin{bmatrix}{H_{{\rm{p}}}}\\ {0}\end{bmatrix}\\ =&{Y_{{\rm{f}}}}\left(K_{{\rm{p}}}^{u}+K_{{\rm{p}}}^{y}+K_{{\rm{f}}}^{u}\right)^{\dagger}\left(K_{{\rm{p}}}^{u}+K_{{\rm{p}}}^{y}\right),\\ \end{aligned}

(27)

where the second-to-last line follows from the identity $x(x^{{\hbox{\it\tiny T}}}x)^{\dagger}=(xx^{{\hbox{\it\tiny T}}})^{\dagger}x$ .
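The computation in (27) can be illustrated on a small simulated instance of (22). Everything specific in the sketch below is an assumption made for illustration: the system matrices, the feature map $\phi(u)=(u,u^{2})$ with scalar input, the horizon $L=2$ , and the pseudoinverse tolerance. The Gram matrices are formed from the scalar kernel evaluations ${\textrm{Tr}}(\kappa(u,u'))$ only.

```python
import numpy as np

def phi(u):
    return np.array([u, u ** 2])                 # assumed feature map, q = 2

def ktr(u, v):
    return u * v + (u * v) ** 2                  # Tr kappa(u, v) = <phi(u), phi(v)>

def hankel(cols, start, depth, N):
    """Stack `depth` block rows; cols[t] is the data vector at time t."""
    return np.vstack([np.column_stack([cols[start + k + j] for j in range(N)])
                      for k in range(depth)])

# Hypothetical system matrices for (22) with n = 2, p = 1.
A = np.array([[0.5, 0.2], [-0.1, 0.3]])
Bm = np.array([[1.0, 0.5], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2, 0.1]])

L, T = 2, 60
N = T - 2 * L + 1
rng = np.random.default_rng(4)
u = rng.standard_normal(T)
x = np.zeros((T + 1, 2))
x[0] = rng.standard_normal(2)
y = []
for t in range(T):
    y.append(C @ x[t] + D @ phi(u[t]))
    x[t + 1] = A @ x[t] + Bm @ phi(u[t])

feats = [phi(ut) for ut in u]
Up, Uf = hankel(feats, 0, L, N), hankel(feats, L, L, N)
Yp, Yf = hankel(y, 0, L, N), hankel(y, L, L, N)
Hp = np.vstack([Up, Yp])
W = np.vstack([Hp, Uf])

# Second line of (27): uses the lifted inputs explicitly.
Pi_direct = Yf @ W.T @ np.linalg.pinv(W @ W.T, 1e-10) @ np.vstack([Hp, np.zeros_like(Uf)])

# Last line of (27): uses kernel evaluations and output inner products only.
def gram_u(start):
    return np.array([[sum(ktr(u[i + start + k], u[j + start + k]) for k in range(L))
                      for j in range(N)] for i in range(N)])

Kpu, Kfu, Kpy = gram_u(0), gram_u(L), Yp.T @ Yp
Pi_gram = Yf @ np.linalg.pinv(Kpu + Kpy + Kfu, 1e-10) @ (Kpu + Kpy)

O_L = np.vstack([C, C @ A])                      # L-step observability matrix
X_f = x[L:L + N].T                               # states x_L, ..., x_{T-L}
```

In this noise-free example the two expressions for $\Pi$ coincide and both equal ${\cal O}_{L}X_{\rm{f}}$ computed from the simulated states. The explicit tolerance passed to `pinv` matters because the $N\times N$ Gram matrix is rank-deficient.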

To construct the state vector, instead of directly performing SVD on $\Pi$ , we construct $U$ , $\Sigma$ , and $V$ such that $\Pi=U\Sigma V^{{\hbox{\it\tiny T}}}$ via eigendecomposition of $\Pi^{{\hbox{\it\tiny T}}}\Pi$ and $\Pi\Pi^{{\hbox{\it\tiny T}}}$ . Denote $\overline{K}_{{\rm{p}},{\rm{f}}}:=K_{{\rm{p}}}^{u}+K_{{\rm{p}}}^{y}+K_{{\rm{f}}}^{u}$ and $\overline{K}_{{\rm{p}}}:=K_{{\rm{p}}}^{u}+K_{{\rm{p}}}^{y}$ and notice that we have

\displaystyle\begin{aligned} \Pi^{{\hbox{\it\tiny T}}}\Pi=&\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\left(\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\right)^{{\hbox{\it\tiny T}}}\left({{Y_{{\rm{f}}}}}^{{\hbox{\it\tiny T}}}{{Y_{{\rm{f}}}}}\right)\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}\\ =&\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\left(\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\right)^{{\hbox{\it\tiny T}}}K_{{\rm{f}}}^{y}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}\\ =&\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}K_{{\rm{f}}}^{y}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}.\end{aligned}

Then, we can obtain $\left(\sigma_{i}^{2},v_{i}\right)$ as the eigenvalues and eigenvectors of the symmetric matrix $\Pi^{{\hbox{\it\tiny T}}}\Pi$ , where $\sigma_{i}$ denotes the $i$ th singular value of $\Pi$ . Likewise, we have

\displaystyle\begin{aligned} \Pi\Pi^{{\hbox{\it\tiny T}}}=&{Y_{{\rm{f}}}}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}{{Y_{{\rm{f}}}}}^{{\hbox{\it\tiny T}}}.\end{aligned}

Denote $\Gamma:=\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}$ . Then, we have

\displaystyle{Y_{{\rm{f}}}}\Gamma{{Y_{{\rm{f}}}}}^{{\hbox{\it\tiny T}}}\left({Y_{{\rm{f}}}}\xi_{i}\right)=\sigma^{2}_{i}\left({Y_{{\rm{f}}}}\xi_{i}\right)\Leftrightarrow\Gamma\underbrace{{{Y_{{\rm{f}}}}}^{{\hbox{\it\tiny T}}}{Y_{{\rm{f}}}}}_{=:K_{{\rm{f}}}^{y}}\xi_{i}=\sigma_{i}^{2}\xi_{i}.

That is, $\Pi\Pi^{{\hbox{\it\tiny T}}}={{Y_{{\rm{f}}}}}\Gamma{{Y_{{\rm{f}}}}}^{{\hbox{\it\tiny T}}}$ has an eigenvector $u_{i}={Y_{{\rm{f}}}}\xi_{i}$ associated with eigenvalue $\sigma^{2}_{i}$ iff $\xi_{i}$ is an eigenvector of $\Gamma K_{{\rm{f}}}^{y}$ corresponding to the same eigenvalue. With $U=[u_{1},\cdots,u_{n}]$ , $\Sigma=\text{diag}(\sigma_{1},\cdots,\sigma_{n})$ , and $V=[v_{1},\cdots,v_{n}]$ (with the $u_{i}$ and $v_{i}$ normalized to unit norm), we can define the extended observability matrix ${\cal O}_{L}$ and states $X_{{\rm{f}}}$ (up to a similarity transform) as

\displaystyle{\cal O}_{L}=U\Sigma^{1/2},\quad X_{{\rm{f}}}=\Sigma^{1/2}V^{{\hbox{\it\tiny T}}}.

The following corollary is immediate from the above process.

Corollary 8 (Construction of state vectors).

Suppose that $(A,B)$ is controllable and $(A,C)$ is observable, i.e., ${\rm rank}({\cal O}_{L})={\rm rank}({\cal C}_{L})=n$ . Let input-output data $\{(u^{\rm{d}}_{t},y^{\rm{d}}_{t})\}_{t=0}^{T-1}$ of length $T$ be given. Suppose the input sequence is such that ${\rm rank}(K^{u})=(2L+n)q$ . Let $(\Sigma^{2},V)$ be the eigenpairs of $\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\left({\overline{K}_{{\rm{p}},{\rm{f}}}}^{\dagger}\right)^{{\hbox{\it\tiny T}}}K_{{\rm{f}}}^{y}{\overline{K}_{{\rm{p}},{\rm{f}}}}^{\dagger}\overline{K}_{{\rm{p}}}$ , and $(\Sigma^{2},\Xi)$ be the eigenpairs of ${\overline{K}_{{\rm{p}},{\rm{f}}}}^{\dagger}\overline{K}_{{\rm{p}}}\overline{K}_{{\rm{p}}}^{{\hbox{\it\tiny T}}}\left({\overline{K}_{{\rm{p}},{\rm{f}}}}^{\dagger}\right)^{{\hbox{\it\tiny T}}}K_{{\rm{f}}}^{y}$ . Define $U={Y_{{\rm{f}}}}\Xi$ . Then the extended observability matrix is ${{\cal O}}_{L}=U\Sigma^{1/2}$ , and the state sequence is $X_{{\rm{f}}}=\Sigma^{1/2}V^{{\hbox{\it\tiny T}}}$ (both up to a similarity transformation).
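The eigendecomposition route can be sanity-checked on generic matrices. In the sketch below, the random matrix `M` merely stands in for $\overline{K}_{{\rm{p}},{\rm{f}}}^{\dagger}\overline{K}_{{\rm{p}}}$ and `Yf` for $Y_{{\rm{f}}}$ ; both are illustrative placeholders rather than data from (22). The check relies on the standard fact that the nonzero eigenvalues of $\Pi^{{\hbox{\it\tiny T}}}\Pi$ and $\Pi\Pi^{{\hbox{\it\tiny T}}}$ are the squared singular values of $\Pi$ .

```python
import numpy as np

rng = np.random.default_rng(5)
r, N = 3, 8
Yf = rng.standard_normal((r, N))         # placeholder for Y_f
M = rng.standard_normal((N, N))          # placeholder for Kbar_pf^+ Kbar_p
Pi = Yf @ M

sing = np.linalg.svd(Pi, compute_uv=False)        # sigma_1 >= ... >= sigma_r

# Route 1: eigenvalues of the symmetric matrix Pi^T Pi.
ev_right = np.sort(np.linalg.eigvalsh(Pi.T @ Pi))[::-1][:r]

# Route 2: eigenpairs of Gamma K_f^y, mapped through Y_f.
Gamma = M @ M.T
Kfy = Yf.T @ Yf
lam, Xi = np.linalg.eig(Gamma @ Kfy)              # non-symmetric problem
top = np.argsort(-lam.real)[:r]                   # keep the r largest eigenvalues
lam_top, Xi_top = lam.real[top], Xi.real[:, top]

U = Yf @ Xi_top                                   # candidate eigenvectors of Pi Pi^T
resid = Pi @ Pi.T @ U - U * lam_top               # should vanish column by column
```

Both routes return the squared singular values of `Pi`, and the mapped vectors `U` satisfy the eigenvector equation for $\Pi\Pi^{{\hbox{\it\tiny T}}}$ up to rounding; the columns of `U` still need to be normalized before they are used as singular vectors.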

The next result explicitly connects subspace identification to an RKHS version of the fundamental lemma for nonlinear systems admitting state-space realization (22).

Theorem 9.

Consider the state-space model (22), where $(A,B)$ is controllable and $(A,C)$ is observable. Let input-output data $\{(u^{\rm{d}}_{t},y^{\rm{d}}_{t})\}_{t=0}^{T-1}$ of length $T$ be given, and suppose the input Gram matrix is such that ${\rm rank}(K^{u})=(2L+n)q$ and that $L$ is chosen so that ${\cal O}_{L}$ has full column rank. Then a length- $2L$ sequence $\begin{bmatrix}u_{0:2L-1}\\ y_{0:2L-1}\end{bmatrix}$ is a valid input/output trajectory of the system (22) if and only if there exists $\xi\in\mathbb{R}^{T-2L+1}$ such that

\displaystyle\begin{bmatrix}k_{{\rm{p}}}^{u}\\ k_{{\rm{p}}}^{y}\end{bmatrix}=\begin{bmatrix}K_{{\rm{p}}}^{u}\\ K_{{\rm{p}}}^{y}\end{bmatrix}\xi\text{ and }\begin{bmatrix}k_{{\rm{f}}}^{u}\\ k_{{\rm{f}}}^{y}\end{bmatrix}=\begin{bmatrix}K_{{\rm{f}}}^{u}\\ K_{{\rm{f}}}^{y}\end{bmatrix}\xi,

where, for $\Phi_{a:b}:=[\phi(u_{a})^{\hbox{\it\tiny T}},\dots,\phi(u_{b})^{\hbox{\it\tiny T}}]^{\hbox{\it\tiny T}}$ , the vectors $k_{{\rm{p}}}^{u}=(U_{{\rm{p}}}^{\phi})^{{\hbox{\it\tiny T}}}\Phi_{0:L-1}$ and $k_{{\rm{f}}}^{u}=(U_{{\rm{f}}}^{\phi})^{{\hbox{\it\tiny T}}}\Phi_{L:2L-1}$ have entries given by $[k_{{\rm{p}}}^{u}]_{j}=\sum_{r=0}^{L-1}{\textrm{Tr}}\left(\kappa(u^{\rm{d}}_{j+r},u_{r})\right)$ , $[k_{{\rm{f}}}^{u}]_{j}=\sum_{r=0}^{L-1}{\textrm{Tr}}\left(\kappa(u^{\rm{d}}_{j+L+r},u_{L+r})\right)$ ; and $k_{{\rm{p}}}^{y}=Y_{\rm{p}}^{{\hbox{\it\tiny T}}}{y}_{0:L-1}$ , $k_{{\rm{f}}}^{y}=Y_{\rm{f}}^{{\hbox{\it\tiny T}}}{y}_{L:2L-1}$ are the Euclidean inner products of outputs.

In the context of the above theorem, we can view $(K_{{\rm{p}}}^{u},K_{{\rm{p}}}^{y},K_{{\rm{f}}}^{u},K_{{\rm{f}}}^{y})$ as kernel matrices of the “offline” training data generated by (22). By splitting the kernel vector of the “online” testing sequence $\Phi_{0:2L-1}$ into two parts (past and future), we test whether one can interpolate the kernel matrices of past and future using the same coefficients, where the initial length- $L$ segment of the testing sequence specifies the initial condition for the subsequent length- $L$ segment. A similar observation for LTI systems on the specification of initial condition has been given in [13, Proposition 1]. In the proof of Theorem 9, we further show that the trajectory passes through a state $\widehat{x}_{L}$ at time $L$ that is a linear combination of states from the training data, i.e., $\widehat{x}_{L}=X_{\rm{f}}\xi$ for some $\xi$ .

In light of Theorem 9, we can compute the predictor $\widehat{y}_{L:2L-1}$ via

\displaystyle\widehat{y}_{L:2L-1}=(Y_{\rm{f}}^{{\hbox{\it\tiny T}}})^{\dagger}k_{{\rm{f}}}^{y}=(Y_{\rm{f}}^{{\hbox{\it\tiny T}}})^{\dagger}{K_{{\rm{f}}}^{y}}\xi,

where $\xi$ is the solution of the following equation,

\displaystyle\begin{bmatrix}k_{{\rm{p}}}^{u}\\ k_{{\rm{p}}}^{y}\\ k_{{\rm{f}}}^{u}\end{bmatrix}=\begin{bmatrix}{K_{{\rm{p}}}^{u}}\\ {K_{{\rm{p}}}^{y}}\\ {K_{{\rm{f}}}^{u}}\\ \end{bmatrix}\xi.
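The predictor can be exercised on simulated data. The sketch below is only an illustration under stated assumptions: a hypothetical instance of (22) with scalar input and output, the feature map $\phi(u)=(u,u^{2})$ , $L=2$ , noise-free data, and a least-squares solve with an explicit rank tolerance. The kernel quantities are computed from ${\textrm{Tr}}(\kappa(u,u'))$ and output products only.

```python
import numpy as np

def phi(u):
    return np.array([u, u ** 2])                 # assumed feature map, q = 2

def ktr(u, v):
    return u * v + (u * v) ** 2                  # Tr kappa(u, v) = <phi(u), phi(v)>

# Hypothetical system matrices for (22) with n = 2, p = 1.
A = np.array([[0.5, 0.2], [-0.1, 0.3]])
Bm = np.array([[1.0, 0.5], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2, 0.1]])

def simulate(u, x0):
    """Scalar output sequence of (22) driven by inputs u from state x0."""
    x, ys = x0.copy(), []
    for ut in u:
        ys.append((C @ x + D @ phi(ut))[0])
        x = A @ x + Bm @ phi(ut)
    return np.array(ys)

L, T = 2, 60
N = T - 2 * L + 1
rng = np.random.default_rng(6)
ud = rng.standard_normal(T)                      # offline inputs
yd = simulate(ud, rng.standard_normal(2))        # offline outputs
u = rng.standard_normal(2 * L)                   # online test inputs
y = simulate(u, rng.standard_normal(2))          # online outputs (y[L:] is held out)

def Ku(s):   # input Gram matrix of the block rows starting at offset s
    return np.array([[sum(ktr(ud[i + s + k], ud[j + s + k]) for k in range(L))
                      for j in range(N)] for i in range(N)])

def Ky(s):   # output Gram matrix of the block rows starting at offset s
    return np.array([[sum(yd[i + s + k] * yd[j + s + k] for k in range(L))
                      for j in range(N)] for i in range(N)])

def ku(s):   # kernel vector between offline inputs and the test inputs
    return np.array([sum(ktr(ud[j + s + r], u[s + r]) for r in range(L))
                     for j in range(N)])

kpy = np.array([sum(yd[j + r] * y[r] for r in range(L)) for j in range(N)])

lhs = np.vstack([Ku(0), Ky(0), Ku(L)])
rhs = np.concatenate([ku(0), kpy, ku(L)])
xi = np.linalg.lstsq(lhs, rhs, rcond=1e-10)[0]   # coefficients xi

Yf = np.array([[yd[j + L + k] for j in range(N)] for k in range(L)])
y_hat = np.linalg.pinv(Yf.T) @ (Ky(L) @ xi)      # predicted y_{L:2L-1}
```

In this noise-free setting `y_hat` matches the held-out outputs `y[L:]` up to numerical error; with noisy data some form of regularization would be needed, which is outside the scope of this sketch.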

The interpolation-based method in Section IV characterizes the future output ${Y_{{\rm{f}}}}$ using kernelized offline data to make predictions regarding the online data. On the other hand, the realization-based method aims to capture the output $y_{t+L-1}$ as linear functions of states $X_{{\rm{f}}}$ and kernelized input vectors $\phi(u_{t}),\dots,\phi(u_{t+L-1})$ . The preference between the two approaches depends on the memory length $L$ and the state dimension $n$ . E.g., in the regime $L\gg n$ , the states serve as an efficient representation of the system’s history.

VI Conclusion

In this paper, we have put forward a behavioral framework for modeling a class of nonlinear systems in a vector-valued RKHS. This formulation is rich enough to cover LTI systems as well as nonlinear systems modeled by Volterra series, autoregressive models based on Volterra series, and Hammerstein-type state-space models. Using this framework, we have analyzed two methods for data-driven modeling of such systems, minimum-norm interpolation and subspace identification. We have clarified the role of various structural assumptions on the system, the data (both offline and online), and the vector-valued RKHS that represents the nonlinear aspects of the system. More broadly, this work expands the scope of behavioral systems theory to nonlinear systems. In doing so, it reinforces the conceptual shift underlying data-driven control: rather than aiming to recover the state evolution, one can directly actualize the desired behaviors using observed trajectories.

Appendix A Proof of Lemma 2

Let $\|\cdot\|_{\cal V}$ and $\|\cdot\|_{\cal W}$ be the Hilbert-space norms on ${\cal V}$ and ${\cal W}$ induced by their respective inner products. Consider the evaluation functional $\delta_{v}:{\cal L}({\cal V},{\cal W})\to{\cal W}$ given by $\delta_{v}(A)=Av$ . It is bounded since

\displaystyle\begin{aligned} \left\|\delta_{v}(A)\right\|_{{\cal W}}=&\left\|Av\right\|_{{\cal W}}\\ \leq&\left\|A\right\|_{{\cal L}({\cal V},{\cal W})}\left\|v\right\|_{{\cal V}}\\ \leq&\left\|A\right\|_{{\cal H}}\left\|v\right\|_{{\cal V}},\end{aligned}

where the last step follows from the relation

\displaystyle\begin{aligned} \left\|A\right\|_{{\cal L}({\cal V},{\cal W})}&=\sup_{v\in{\cal V},\,\left\|v\right\|_{\cal V}=1}\sqrt{\langle Av,Av\rangle_{\cal W}}\\ &=\sup_{v\in{\cal V},\,\left\|v\right\|_{\cal V}=1}\sqrt{\langle v,A^{*}Av\rangle_{\cal V}}\\ &\leq\|A\|_{\cal H}.\end{aligned}

Hence, ${\cal H}$ is an RKHS. We next show that $\kappa\left(v,v^{\prime}\right)=\langle v,v^{\prime}\rangle_{{\cal V}}I_{{\cal W}}$ is the reproducing kernel. It is obvious that $\kappa$ is of the positive type. To show it satisfies the reproducing property, notice that for any $v\in{\cal V}$ and $w\in{\cal W}$ we have

\displaystyle\begin{aligned} \left\langle Av,w\right\rangle_{{\cal W}}&={\textrm{Tr}}\left(A^{*}(w\otimes v)\right)\\ &=\left\langle A,w\otimes v\right\rangle_{{\cal H}},\end{aligned}

where $w\otimes v\in{\cal H}$ is the rank-one linear operator $(w\otimes v)v^{\prime}=\langle v,v^{\prime}\rangle_{\cal V}w$ .

On the other hand, for $w\in{\cal W}$ , we have

\displaystyle\begin{aligned} \kappa\left(v^{\prime},v\right)w=\langle v^{\prime},v\rangle_{\cal V}w=(w\otimes v^{\prime})v.\end{aligned}

Hence, we have

\displaystyle\left\langle A,\kappa(\cdot,v)w\right\rangle_{{\cal H}}=\left\langle Av,w\right\rangle_{{\cal W}}.

That is, $\kappa$ satisfies the reproducing property.
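In finite dimensions the two facts used in this proof are easy to verify numerically. The sketch below takes ${\cal V}=\mathbb{R}^{4}$ and ${\cal W}=\mathbb{R}^{3}$ (dimensions chosen here for illustration) and identifies the norm on ${\cal H}$ with the Frobenius norm of the matrix of $A$ .

```python
import numpy as np

rng = np.random.default_rng(7)
dv, dw = 4, 3
A = rng.standard_normal((dw, dv))        # linear operator from V to W
v = rng.standard_normal(dv)
w = rng.standard_normal(dw)

rank_one = np.outer(w, v)                # (w (x) v) v' = <v, v'> w
hs_inner = np.trace(A.T @ rank_one)      # <A, w (x) v> = Tr(A^* (w (x) v))
direct = w @ (A @ v)                     # <A v, w>

op_norm = np.linalg.norm(A, 2)           # operator (spectral) norm
hs_norm = np.linalg.norm(A, 'fro')       # Hilbert-Schmidt (Frobenius) norm
```

The first pair of quantities agrees, which is the reproducing property, and the operator norm never exceeds the Hilbert-Schmidt norm, which is the bound used to show that evaluation is continuous.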

Appendix B Proofs For Results in Section IV

B-A Proof of Theorem 3

Let $c_{0},\cdots,c_{T-1}\in\mathbb{R}$ be given. For each $j=0,\cdots,{T-1}$ , the relation

\displaystyle\langle y_{j^{+}}^{\rm{d}},v\rangle_{{\cal Y}}=\langle f,\kappa_{\cal Z}(\cdot,z_{j}^{\rm{d}})v\rangle_{{\cal H}_{\cal Z}}

holds for all $v\in{\cal Y}$ . Multiplying both sides by $c_{j}$ and summing over $j=0,\cdots,{T-1}$ , we have

\displaystyle\begin{aligned} \left\langle\sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm{d}},v\right\rangle_{{\cal Y}}=&\left\langle f,\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm{d}})v\right\rangle_{{\cal H}_{\cal Z}}\\ =&\left\langle\sum_{j=0}^{T-1}c_{j}f(z_{j}^{\rm{d}}),v\right\rangle_{{\cal Y}},\end{aligned}

for all $v\in{\cal Y}$ , where the last line follows from the reproducing property. Hence, the pair $\left(\sum_{j=0}^{T-1}c_{j}\kappa_{\cal Z}(\cdot,z_{j}^{\rm{d}})^{*},\sum_{j=0}^{T-1}c_{j}y_{j^{+}}^{\rm{d}}\right)$ satisfies Eq. (3) as claimed.

B-B Proof of Lemma 4

Define the following subspace ${\cal H}_{N}$ of ${\cal H}_{\cal Z}$ :

\displaystyle\begin{aligned} {\cal H}_{N}&:={\rm span}\left\{\kappa_{\cal Z}(\cdot,z_{i})v,\ i=0,\cdots,N-1,\ v\in{\cal Y}\right\}\\ &={{\rm range}(\Phi_{N})}.\end{aligned}

(28)

Let $\Pi_{N}$ denote the orthogonal projection onto ${\cal H}_{N}$ and notice that $\Pi_{N}=\Phi_{N}\left(\Phi_{N}^{*}\Phi_{N}\right)^{\dagger}\Phi_{N}^{*}$ . Using this in the definition of $\Sigma_{N}(z)$ in (18), we get

\displaystyle\begin{aligned} \Sigma_{N}(z)&=\kappa_{\cal Z}(z,z)-k_{N}(z)K_{N}^{\dagger}{k}_{N}(z)^{*}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\kappa_{\cal Z}(\cdot,z)\\ &\quad-\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\left(\Phi_{N}^{*}\Phi_{N}\right)^{\dagger}\left(\kappa_{\cal Z}(\cdot,z)^{*}\Phi_{N}\right)^{*}\\ &=\kappa_{\cal Z}(\cdot,z)^{*}\big(I-\Phi_{N}(\Phi_{N}^{*}\Phi_{N})^{\dagger}\Phi_{N}^{*}\big)\kappa_{\cal Z}(\cdot,z)\\ &=\kappa_{\cal Z}(\cdot,z)^{*}(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z).\end{aligned}

Hence, for any $v\in\mathbb{R}^{p}$ ,

\displaystyle\begin{aligned} v^{\hbox{\it\tiny T}}\Sigma_{N}(z)v=&v^{\hbox{\it\tiny T}}\kappa_{\cal Z}(\cdot,z)^{*}(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v\\ =&\left\|(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v\right\|_{{\cal H}_{\cal Z}}.\end{aligned}

Therefore, $\Sigma_{N}(z)$ is positive semi-definite. In particular, $\Sigma_{N}(z)=0$ iff $(I-\Pi_{N})\kappa_{\cal Z}(\cdot,z)v=0$ for all $v\in\mathbb{R}^{p}$ , that is, iff $\kappa_{\cal Z}(\cdot,z)v\in{\cal H}_{N}={\rm range}(\Phi_{N})$ for all $v\in\mathbb{R}^{p}$ .
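The statement of Lemma 4 can be checked numerically in a simple special case. The following is a minimal sketch, assuming a scalar-valued Gaussian kernel (so $p=1$), which is not a choice made in the paper; the helper names `gauss_kernel` and `sigma_N`, the length scale, and the sample points are all hypothetical. It evaluates $\Sigma_{N}(z)=\kappa_{\cal Z}(z,z)-k_{N}(z)K_{N}^{\dagger}k_{N}(z)^{*}$ at the sample points, where $\kappa_{\cal Z}(\cdot,z)\in{\cal H}_{N}$ and the value should vanish up to round-off, and at a point far from the samples, where it should stay close to $\kappa_{\cal Z}(z,z)=1$ .

```python
import numpy as np

def gauss_kernel(a, b, ell=1.0):
    """Scalar Gaussian kernel matrix between point sets a (m, d) and b (n, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def sigma_N(z, Z, ell=1.0):
    """Sigma_N(z) = k(z, z) - k_N(z) K_N^+ k_N(z)^T for a scalar kernel (p = 1)."""
    z = np.atleast_2d(z)
    K = gauss_kernel(Z, Z, ell)        # Gram matrix K_N, shape (N, N)
    k = gauss_kernel(z, Z, ell)        # row vector k_N(z), shape (1, N)
    return (gauss_kernel(z, z, ell) - k @ np.linalg.pinv(K) @ k.T).item()

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 2))            # hypothetical samples z_0, ..., z_5

# At a sample point, kappa(., z_i) lies in H_N, so Sigma_N(z_i) should vanish.
s_data = [sigma_N(zi, Z) for zi in Z]

# Far from the samples, kappa(., z) is nearly orthogonal to H_N.
s_new = sigma_N(np.array([10.0, 10.0]), Z)
```

For a matrix-valued kernel the same computation applies with $K_{N}$ replaced by the block Gram matrix, and $\Sigma_{N}(z)$ becomes a $p\times p$ positive semi-definite matrix.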

B-C Proof of Lemma 5

Since $f_{N+1}$ interpolates $\{(z_{i},y_{i})\}^{N-1}_{i=0}$ , we have for all $v\in{\cal Y}$ and $i=0,\cdots,N-1$ ,

\displaystyle\begin{aligned} \left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}=\left\langle f_{N+1}(z_{i}),v\right\rangle_{{\cal Y}}=\left\langle y_{i},v\right\rangle_{{\cal Y}}.\end{aligned}

On the other hand, for $i=0,\cdots,N-1$ and all $v\in{\cal Y}$ ,

\displaystyle\begin{aligned} &\left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\Pi_{N}[f_{N+1}]+\Pi_{N}^{\perp}[f_{N+1}],\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\Pi_{N}[f_{N+1}],\kappa_{\cal Z}(\cdot,z_{i})v\right\rangle_{{\cal H}_{\cal Z}},\end{aligned}

where $\Pi^{\perp}_{N}=I-\Pi_{N}$ and $\Pi_{N}$ is the orthogonal projection onto ${\cal H}_{N}$ . Since $v$ is arbitrary, $\Pi_{N}[f_{N+1}](z_{i})=y_{i}$ for $i=0,\cdots,N-1$ , i.e., both $f_{N}$ and $\Pi_{N}[f_{N+1}]$ interpolate $\{(z_{i},y_{i})\}_{i=0}^{N-1}$ . Since $f_{N}$ is the unique minimum-norm solution in ${\cal H}_{N}$ , it must be the case that $f_{N}=\Pi_{N}[f_{N+1}]$ . Hence,

\displaystyle f_{N+1}-f_{N}=f_{N+1}-\Pi_{N}[f_{N+1}]\in{\cal H}_{N}^{\perp}.

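Moreover, $f_{N+1}-f_{N}$ lies in ${\cal H}_{N+1}={\cal H}_{N}+\{\kappa_{\cal Z}(\cdot,z_{N})v:\ v\in{\cal Y}\}$ , and every element of ${\cal H}_{N+1}\cap{\cal H}_{N}^{\perp}$ has the form $\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi]$ with $\xi\in{\cal Y}$ . That is, there exists some $\xi_{N}\in{\cal Y}$ such that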

\displaystyle\begin{aligned} f_{N+1}-f_{N}=\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}].\end{aligned}

(29)

We now determine $\xi_{N}$ . By the reproducing property, we have

\displaystyle\begin{aligned} &\left\langle f_{N+1}(z_{N}),v\right\rangle_{{\cal Y}}\\ &=\left\langle f_{N+1},\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle f_{N}+\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle f_{N},\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &\quad+\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}.\end{aligned}

(30)

For the last line, applying the reproducing property again, we can write the first term as $\langle f_{N},\kappa_{\cal Z}(\cdot,z_{N})v\rangle_{{\cal H}_{\cal Z}}=\langle f_{N}(z_{N}),v\rangle_{{\cal Y}}$ . For the second term, by orthogonality we have

\displaystyle\begin{aligned} &\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\kappa_{\cal Z}(\cdot,z_{N})v\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\langle\xi_{N},\kappa_{\cal Z}(z_{N},z_{N})v\rangle_{{\cal Y}}\\ &\quad-\left\langle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}},\end{aligned}

where we have used the fact that, since $\Pi_{N}$ is an orthogonal projection,

\displaystyle\langle\Pi_{N}h,\Pi_{N}h^{\prime}\rangle_{{\cal H}_{\cal Z}}=\langle h,\Pi_{N}h^{\prime}\rangle_{{\cal H}_{\cal Z}}=\langle\Pi_{N}h,h^{\prime}\rangle_{{\cal H}_{\cal Z}}

for all $h,h^{\prime}\in{\cal H}_{\cal Z}$ . Since $\Pi_{N}$ is an orthogonal projection onto ${\cal H}_{N}$ , we have

\displaystyle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}]=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\alpha_{j},

and

\displaystyle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]=\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\beta_{j},

where $\bar{\alpha}=(\alpha_{0},\dots,\alpha_{N-1})\in{\cal Y}^{N-1}$ and $\bar{\beta}=(\beta_{0},\dots,\beta_{N-1})\in{\cal Y}^{N-1}$ are the solutions of $K_{N}\bar{\alpha}=k_{N}(z_{N})\xi_{N}$ and $K_{N}\bar{\beta}=k_{N}(z_{N})v$ . Therefore,

\displaystyle\begin{aligned} &\left\langle\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\alpha_{j},\sum_{j=0}^{N-1}\kappa_{\cal Z}(\cdot,z_{j})\beta_{j}\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\bar{\alpha},K_{N}\bar{\beta}\right\rangle_{{\cal Y}^{\oplus(N-1)}}\\ &=\left\langle K_{N}^{\dagger}k_{N}(z_{N})\xi_{N},K_{N}K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{{\cal Y}^{\oplus(N-1)}}\\ &=\left\langle\xi_{N},k^{*}_{N}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{{\cal Y}},\end{aligned}

where the last line holds since $K_{N}^{\dagger}K_{N}K_{N}^{\dagger}=K_{N}^{\dagger}$ .

Putting everything together and using the definition of $\Sigma_{N}(z_{N})$ , we arrive at

\displaystyle\begin{aligned} &\left\langle\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}],\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})v]\right\rangle_{{\cal H}_{\cal Z}}\\ &=\left\langle\xi_{N},\kappa_{\cal Z}(z_{N},z_{N})v\right\rangle_{{\cal Y}}-\left\langle\xi_{N},k^{*}_{N}(z_{N})K_{N}^{\dagger}k_{N}(z_{N})v\right\rangle_{{\cal Y}}\\ &=\left\langle\xi_{N},\Sigma_{N}(z_{N})v\right\rangle_{{\cal Y}}.\end{aligned}

Plugging the above relation back into (30) and using the fact that $v\in{\cal Y}$ is arbitrary and that $\Sigma_{N}(z_{N})$ is self-adjoint, we see that $\xi_{N}$ is determined by the relation

\displaystyle f_{N+1}(z_{N})=f_{N}(z_{N})+\Sigma_{N}(z_{N})\xi_{N}.

As $f_{N+1}$ interpolates $\left(z_{N},y_{N^{+}}\right)$ , i.e., $y_{N^{+}}=f_{N+1}(z_{N})$ , we can further write

\displaystyle\Sigma_{N}(z_{N})\xi_{N}=y_{N^{+}}-f_{N}(z_{N}).

(31)

Hence, when $\Sigma_{N}(z_{N})=0$ , we will have $f_{N}(z_{N})=f_{N+1}(z_{N})=y_{N^{+}}$ .

Next, applying the Pythagorean theorem to (29), where $f_{N}\in{\cal H}_{N}$ and $f_{N+1}-f_{N}\in{\cal H}_{N}^{\perp}$ , we have

\displaystyle\begin{aligned} \left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}=&\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}+\left\|\Pi_{N}^{\perp}[\kappa_{\cal Z}(\cdot,z_{N})\xi_{N}]\right\|_{{\cal H}_{\cal Z}}^{2}\\ =&\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}+\left\langle\xi_{N},\Sigma_{N}(z_{N})\xi_{N}\right\rangle_{{\cal Y}},\end{aligned}

where the last line follows from the definition of $\Sigma_{N}$ . When $\Sigma_{N}(z_{N})\succ 0$ , from (31) we have

\displaystyle\begin{aligned} &\left\|f_{N+1}\right\|_{{\cal H}_{\cal Z}}^{2}-\left\|f_{N}\right\|_{{\cal H}_{\cal Z}}^{2}\\ &=\left\langle\Sigma_{N}^{-1}(z_{N})\left(y_{N^{+}}-f_{N}(z_{N})\right),y_{N^{+}}-f_{N}(z_{N})\right\rangle_{{\cal Y}}\\ &=\left\|\Sigma_{N}^{-1/2}(z_{N})\left(y_{N^{+}}-f_{N}(z_{N})\right)\right\|^{2}_{{\cal Y}}.\end{aligned}

This concludes the proof.
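The norm recursion just established can be sanity-checked numerically. The sketch below again assumes a scalar Gaussian kernel with ${\cal Y}=\mathbb{R}$ , an illustrative choice not taken from the paper; the sample points, the target values, and the helper names `gauss_kernel` and `min_norm_interpolant` are hypothetical. It compares $\|f_{N+1}\|^{2}-\|f_{N}\|^{2}$ , computed directly from the two minimum-norm interpolants, with the predicted increment $\Sigma_{N}(z_{N})^{-1}\left(y_{N^{+}}-f_{N}(z_{N})\right)^{2}$ .

```python
import numpy as np

def gauss_kernel(a, b, ell=1.0):
    """Scalar Gaussian kernel matrix between point sets a (m, d) and b (n, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def min_norm_interpolant(Z, y):
    """Coefficients c of f = sum_j c_j k(., z_j) and the squared RKHS norm of f."""
    K = gauss_kernel(Z, Z)
    c = np.linalg.pinv(K) @ y
    return c, float(c @ K @ c)

# Hypothetical data: five well-separated points, the last one plays the role of z_N.
Z = np.array([[-2.0], [-0.7], [0.4], [1.5], [2.6]])
y = np.sin(Z[:, 0])
N = 4

c_N, norm2_N = min_norm_interpolant(Z[:N], y[:N])    # f_N
c_N1, norm2_N1 = min_norm_interpolant(Z, y)          # f_{N+1}

K_N = gauss_kernel(Z[:N], Z[:N])
k = gauss_kernel(Z[N:N + 1], Z[:N])                  # k_N(z_N), shape (1, N)
Sigma = (1.0 - k @ np.linalg.pinv(K_N) @ k.T).item() # Sigma_N(z_N), k(z, z) = 1
resid = y[N] - (k @ c_N).item()                      # y_{N^+} - f_N(z_N)

increment = resid ** 2 / Sigma                       # predicted norm increase
direct = norm2_N1 - norm2_N                          # measured norm increase
```

In this scalar setting the identity is the Schur-complement formula for the quadratic form $y^{\hbox{\it\tiny T}}K^{-1}y$ , so `increment` and `direct` should agree up to round-off whenever $\Sigma_{N}(z_{N})>0$ .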

Appendix C Proofs For Results in Section V

C-A Proof of Theorem 9

In our model (22), we can view $v=\phi(u)$ as the $q$ -dimensional input to the LTI system parametrized by $(A,B,C,D)$ . Consequently, the argument in the proof of [22, Theorem 1] extends directly to our setting and guarantees that the matrix

\displaystyle{\cal M}:=\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ {U_{\rm{f}}^{\phi}}\\ X_{0}\end{bmatrix}\in\mathbb{R}^{(2Lq+n)\times(T-2L+1)}

has full row rank.

$(\Rightarrow)$ : Suppose that $\begin{bmatrix}{\Phi}_{0:2L-1}\\ {y}_{0:2L-1}\end{bmatrix}$ is a valid input-output trajectory of (22). Then there exists an initial state $\widehat{x}_{0}$ such that

\displaystyle{y}_{0:L-1}={\cal O}_{L}\widehat{x}_{0}+{\cal T}_{L}{\Phi}_{0:L-1}.

Since ${\cal M}$ has full row rank, for any $\begin{bmatrix}{\Phi}_{0:L-1}\\ {\Phi}_{L:2L-1}\\ \widehat{x}_{0}\end{bmatrix}\in\mathbb{R}^{2Lq}\oplus\mathbb{R}^{n}$ , there exists some $\xi\in\mathbb{R}^{T-2L+1}$ such that

\displaystyle\begin{bmatrix}{\Phi}_{0:L-1}\\ {\Phi}_{L:2L-1}\\ \widehat{x}_{0}\end{bmatrix}={\cal M}\xi.

Thus, we have

\displaystyle\begin{aligned} \begin{bmatrix}{\Phi}_{0:L-1}\\ {y}_{0:L-1}\end{bmatrix}=&\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{\Phi}_{0:L-1}\\ \widehat{x}_{0}\end{bmatrix}\\ =&\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ X_{0}\end{bmatrix}\xi\\ =&\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ {Y_{\rm{p}}}\end{bmatrix}\xi.\end{aligned}

Moreover, from the dynamics (22), we have

\displaystyle X_{\rm{f}}=A^{L}X_{0}+\Delta_{L}{U_{\rm{p}}^{\phi}}.

(32)

Hence, at time $t=L$ ,

\displaystyle\begin{aligned} \widehat{x}_{L}=&A^{L}\widehat{x}_{0}+\Delta_{L}{\Phi}_{0:L-1}\\ =&\left(A^{L}X_{0}+\Delta_{L}{U_{\rm{p}}^{\phi}}\right)\xi\\ =&X_{\rm{f}}\xi.\end{aligned}

Thus, starting from $\widehat{x}_{L}=X_{\rm{f}}\xi$ with input sequence ${\Phi}_{L:2L-1}={U_{\rm{f}}^{\phi}}\xi$ , we have

\displaystyle\begin{aligned} \begin{bmatrix}{\Phi}_{L:2L-1}\\ {y}_{L:2L-1}\end{bmatrix}=&\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{\Phi}_{L:2L-1}\\ \widehat{x}_{L}\end{bmatrix}\\ =&\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{U_{\rm{f}}^{\phi}}\\ X_{\rm{f}}\end{bmatrix}\xi\\ =&\begin{bmatrix}{U_{\rm{f}}^{\phi}}\\ {Y_{\rm{f}}}\end{bmatrix}\xi.\end{aligned}

$(\Leftarrow)$ : Since $\begin{bmatrix}{\Phi}_{0:L-1}\\ {y}_{0:L-1}\end{bmatrix}=\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ {Y_{\rm{p}}}\end{bmatrix}\xi$ , and $\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ {Y_{\rm{p}}}\end{bmatrix}$ is constrained by (22), we have

\displaystyle\begin{bmatrix}{\Phi}_{0:L-1}\\ {y}_{0:L-1}\end{bmatrix}=\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ {Y_{\rm{p}}}\end{bmatrix}\xi=\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{U_{\rm{p}}^{\phi}}\\ X_{0}\end{bmatrix}\xi.

Hence,

\displaystyle{y}_{0:L-1}={Y_{\rm{p}}}\xi={\cal O}_{L}X_{0}\xi+{\cal T}_{L}{U_{\rm{p}}^{\phi}}\xi.

Since ${\Phi}_{0:L-1}={U_{\rm{p}}^{\phi}}\xi$ , we conclude that ${y}_{0:L-1}$ is the output from $\widehat{x}_{0}:=X_{0}\xi$ with input ${\Phi}_{0:L-1}$ . As $\left({\Phi}_{0:L-1},{y}_{0:L-1}\right)$ is a valid input-output trajectory, we can use (22) and $X_{\rm{f}}$ defined in (32) to obtain the state vector at time $L$ as

\displaystyle\widehat{x}_{L}=A^{L}\widehat{x}_{0}+\Delta_{L}{\Phi}_{0:L-1}=A^{L}X_{0}\xi+\Delta_{L}{U_{\rm{p}}^{\phi}}\xi=X_{\rm{f}}\xi.

(33)

For the second segment, using the assumptions and the matrix input-output relation for ${U_{\rm{f}}^{\phi}}$ , ${Y_{\rm{f}}}$ , we have

\displaystyle\begin{bmatrix}{\Phi}_{L:2L-1}\\ {y}_{L:2L-1}\end{bmatrix}=\begin{bmatrix}{U_{\rm{f}}^{\phi}}\\ {Y_{\rm{f}}}\end{bmatrix}\xi=\begin{bmatrix}I&{0}\\ {\cal T}_{L}&{\cal O}_{L}\end{bmatrix}\begin{bmatrix}{U_{\rm{f}}^{\phi}}\\ X_{\rm{f}}\end{bmatrix}\xi.

Thus, we have

\displaystyle\begin{aligned} {y}_{L:2L-1}&={Y_{\rm{f}}}\xi\\ &={\cal O}_{L}X_{\rm{f}}\xi+{\cal T}_{L}{U_{\rm{f}}^{\phi}}\xi\\ &={\cal O}_{L}\widehat{x}_{L}+{\cal T}_{L}{\Phi}_{L:2L-1},\end{aligned}

where the last equality follows from (33). Therefore, we conclude that ${y}_{L:2L-1}$ is the output from $\widehat{x}_{L}=X_{\rm{f}}\xi$ with input ${\Phi}_{L:2L-1}$ .

Putting everything together, we have

\displaystyle\begin{aligned} \begin{bmatrix}{\Phi}_{0:L-1}\\ {y}_{0:L-1}\end{bmatrix}=\begin{bmatrix}{U_{{\rm{p}}}^{\phi}}\\ {Y_{{\rm{p}}}}\end{bmatrix}\xi,\quad\begin{bmatrix}{\Phi}_{L:2L-1}\\ {y}_{L:2L-1}\end{bmatrix}=\begin{bmatrix}{U_{{\rm{f}}}^{\phi}}\\ {Y_{{\rm{f}}}}\end{bmatrix}\xi.\end{aligned}

Multiplying both sides of the first equation by $\begin{bmatrix}{U_{{\rm{p}}}^{\phi}}^{*}&0\\ 0&Y_{{\rm{p}}}^{*}\end{bmatrix}$ and the second equation by $\begin{bmatrix}{U_{{\rm{f}}}^{\phi}}^{*}&0\\ 0&Y_{{\rm{f}}}^{*}\end{bmatrix}$ proves the claim.
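The $(\Rightarrow)$ direction can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the system matrices, the lifting $\phi(u)=(u,u^{2})$ with $q=2$ , the horizon $L=2$ , and the data length $T=60$ are all hypothetical, and the rows are stacked as all lifted inputs followed by all outputs rather than in the past/future partition used above. It simulates a Hammerstein-type model of the form (22), builds a depth- $2L$ Hankel matrix from one data trajectory, and checks that a freshly simulated trajectory of length $2L$ lies in its column span.

```python
import numpy as np

def simulate(A, B, C, D, x0, u, phi):
    """Simulate x+ = A x + B phi(u), y = C x + D phi(u); return lifted inputs and outputs."""
    x, Phi, Y = x0, [], []
    for ut in u:
        v = phi(ut)
        Phi.append(v)
        Y.append(C @ x + D @ v)
        x = A @ x + B @ v
    return np.array(Phi), np.array(Y)

def hankel(W, depth):
    """Block-Hankel matrix whose column j stacks W[j], ..., W[j + depth - 1]."""
    T = W.shape[0]
    return np.column_stack([W[j:j + depth].ravel() for j in range(T - depth + 1)])

phi = lambda u: np.array([u, u ** 2])     # hypothetical lifting, q = 2
A = np.array([[0.5, 0.2], [0.0, 0.3]])    # hypothetical stable, controllable, observable system
B = np.array([[1.0, 0.0], [0.5, 1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1, 0.0]])
n, q, L, T = 2, 2, 2, 60

rng = np.random.default_rng(0)
Phi_d, Y_d = simulate(A, B, C, D, rng.normal(size=n), rng.normal(size=T), phi)
H = np.vstack([hankel(Phi_d, 2 * L), hankel(Y_d, 2 * L)])   # shape (12, T - 2L + 1)

# A fresh trajectory of length 2L from a new initial state and new inputs.
Phi_new, Y_new = simulate(A, B, C, D, rng.normal(size=n), rng.normal(size=2 * L), phi)
w = np.concatenate([Phi_new.ravel(), Y_new.ravel()])

xi, *_ = np.linalg.lstsq(H, w, rcond=None)   # coefficient vector playing the role of xi
residual = np.linalg.norm(H @ xi - w)
rank_H = np.linalg.matrix_rank(H)
```

With random inputs the lifted sequence is generically persistently exciting, so the residual should be at round-off level and the rank of `H` should equal $2Lq+n$ , the number of free lifted inputs plus the state dimension.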

References

[1] M. Alsalti, J. Berberich, V. G. Lopez, F. Allgöwer, and M. A. Müller (2021) Data-based system analysis and control of flat nonlinear systems. In 2021 60th IEEE Conference on Decision and Control (CDC), pp. 1484–1489. Cited by: §I.
[2] N. Aronszajn (1950) Theory of reproducing kernels. Transactions of the American Mathematical Society 68 (3), pp. 337–404. Cited by: §II-C.
[3] J. Berberich and F. Allgöwer (2020) A trajectory-based framework for data-driven system analysis and control. In 2020 European Control Conference (ECC), pp. 1365–1370. Cited by: §I.
[4] A. Berlinet and C. Thomas-Agnan (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media. Cited by: §II-C.
[5] C. Carmeli, E. De Vito, and A. Toigo (2006) Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications 4 (4), pp. 377–408. Cited by: §II-C, §V-A.
[6] J. Coulson, J. Lygeros, and F. Dörfler (2019) Data-enabled predictive control: in the shallows of the DeePC. In 2019 18th European Control Conference (ECC), pp. 307–312. Cited by: §I.
[7] R. De Figueiredo and T. Dwyer (1980) A best approximation framework and implementation for simulation of large-scale nonlinear systems. IEEE Transactions on Circuits and Systems 27 (11), pp. 1005–1014. Cited by: §III-C2, §III-C2, §III-C2.
[8] F. Dörfler, J. Coulson, and I. Markovsky (2022) Bridging direct and indirect data-driven control formulations via regularizations and relaxations. IEEE Transactions on Automatic Control 68 (2), pp. 883–897. Cited by: §I.
[9] M. Green and J. B. Moore (1986) Persistence of excitation in linear systems. Systems & Control Letters 7 (5), pp. 351–360. Cited by: §III-A, §IV.
[10] L. Huang, J. Lygeros, and F. Dörfler (2023) Robust and kernelized data-enabled predictive control for nonlinear systems. IEEE Transactions on Control Systems Technology 32 (2), pp. 611–624. Cited by: §I.
[11] T. Liang and B. Recht (2023) Interpolating classifiers make few mistakes. Journal of Machine Learning Research 24 (20), pp. 1–27. Cited by: §IV, §IV.
[12] I. Markovsky and F. Dörfler (2022) Data-driven dynamic interpolation and approximation. Automatica 135, pp. 110008. Cited by: §I.
[13] I. Markovsky and P. Rapisarda (2008) Data-driven simulation and control. International Journal of Control 81 (12), pp. 1946–1959. Cited by: §I, §I, §V-B.
[14] I. Markovsky, J. C. Willems, S. Van Huffel, and B. De Moor (2006) Exact and approximate modeling of linear systems: a behavioral approach. SIAM. Cited by: §I.
[15] C. A. Micchelli and M. Pontil (2005) On learning vector-valued functions. Neural Computation 17 (1), pp. 177–204. Cited by: §IV, §IV.
[16] O. Molodchyk and T. Faulwasser (2024) Exploring the links between the fundamental lemma and kernel regression. IEEE Control Systems Letters 8, pp. 2045–2050. Cited by: §I, §I, §I.
[17] J. Moore (1983) Persistence of excitation in extended least squares. IEEE Transactions on Automatic Control 28 (1), pp. 60–68. Cited by: §III-A, §IV.
[18] J. G. Rueda-Escobedo and J. Schiffer (2020) Data-driven internal model control of second-order discrete Volterra systems. In 2020 59th IEEE Conference on Decision and Control (CDC), pp. 4572–4579. Cited by: §I.
[19] B. Schölkopf, R. Herbrich, and A. J. Smola (2001) A generalized representer theorem. In International Conference on Computational Learning Theory, pp. 416–426. Cited by: §I.
[20] X. Shang, J. Cortés, and Y. Zheng (2024) Willems’ fundamental lemma for nonlinear systems with Koopman linear embedding. IEEE Control Systems Letters. Cited by: §I.
[21] P. Van Overschee and B. De Moor (1996) Subspace identification for linear systems: theory—implementation—applications. Kluwer. Cited by: §II-A, §V-B, §V-B, §V.
[22] H. J. Van Waarde, C. De Persis, M. K. Camlibel, and P. Tesi (2020) Willems’ fundamental lemma for state-space systems and its extension to multiple datasets. IEEE Control Systems Letters 4 (3), pp. 602–607. Cited by: §C-A.
[23] H. J. Van Waarde, J. Eising, M. K. Camlibel, and H. L. Trentelman (2023) The informativity approach to data-driven analysis and control. IEEE Control Systems Magazine 43 (6), pp. 32–66. Cited by: §I.
[24] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor (2005) A note on persistency of excitation. Systems & Control Letters 54 (4), pp. 325–329. Cited by: §I, §II-B.
[25] J. C. Willems (1986) From time series to linear system—Part I. Finite dimensional linear time invariant systems. Automatica 22 (5), pp. 561–580. Cited by: §I, §II-B, §II-B.
[26] J. C. Willems (1991) Paradigms and puzzles in the theory of dynamical systems. IEEE Transactions on Automatic Control 36 (3), pp. 259–294. Cited by: §I, §II-B.

\displaystyle\begin{aligned} \left\|A\right\|_{{\cal L}({\cal V},{\cal W})}&=\sup_{v\in{\cal V},\,\left\|v\right\|_{\cal V}=1}\sqrt{\langle Av,Av\rangle_{\cal W}}\\ &=\sup_{v\in{\cal V},\,\left\|v\right\|_{\cal V}=1}\sqrt{\langle v,A^{*}Av\rangle_{\cal V}}\\ &\leq\|A\|_{\cal H}.\end{aligned}