April 02, 2024

*Artificial neural networks* (ANN) have proven to be effective for the identification of nonlinear models of highly complex systems. To still make use of the prior information available from baseline models derived from, e.g.,
*first-principles* (FP) modelling, a variety of methods has been developed that integrate this prior knowledge into the identification algorithm for the ANN. These methods have shown improved estimation speed and/or accuracy on unseen data. Among
these methods are model augmentation structures. A variety of such structures has been considered in the literature; however, there is no unifying theory for them. In this paper, we propose a flexible *linear-fractional-representation* (LFR) based
model augmentation structure. This model structure is able to represent many common model augmentation structures, thus unifying them under the proposed model structure. Furthermore, we introduce an identification algorithm capable of estimating the
proposed model augmentation structure. The performance and generalization capabilities of the identification algorithm and the augmentation structure are demonstrated on a hardening mass-spring-damper simulation example.

With the increasing complexity and performance requirements of systems used in control, the need for accurate nonlinear models that capture the often complicated behavior of systems in practice is rapidly growing. Obtaining these models through
*first-principles* (FP) based methods is becoming too costly and often infeasible due to the surge in system complexity and difficult-to-model phenomena for which no reliable FP descriptions exist. Data-driven techniques, in terms of system identification, have appeared to address
this modelling challenge. In particular, *artificial neural networks* (ANN) have recently been shown to capture complicated nonlinear dynamic relationships and to cope with high system complexity. For
control applications, ANN *state-space* (SS) models [1], [2] are
particularly attractive, and with the recent introduction of encoder-based methods such as [3]–[5], their estimation can be accomplished reliably and computationally efficiently in practice.

To still make use of the prior information available from baseline models derived from, e.g., FP modelling, a variety of methods has been developed that integrate this prior knowledge into the identification algorithm for the ANN [6]. These methods have shown improved estimation speed and/or accuracy on unseen data when a baseline model is used in the identification procedure. This can be achieved through, e.g., modifying the cost function or using model augmentation structures [7]–[9].

In the model augmentation approach, the baseline model is augmented with additional parameterized functions. Unlike other baseline-model inclusion methods, this avoids having to relearn the relationships already captured by the baseline model. It also means that some dynamics of the identified model are known beforehand, which can make the identified models more interpretable, in addition to the aforementioned benefits of model augmentation. This method has strong similarities to the grey-box identification approach [10], which has been used with success in system identification.

There is a variety of model augmentation structures presented in the literature, such as series [11]–[13], parallel (also known as residual) [12], [13], and mixed [14] augmentation structures. These can be combined
with, e.g., cost function modifications [7]. The series and parallel augmentation structures are well-established methods; they are, however, not very flexible
in the way the augmentation functions can interact with the baseline model, nor do they allow for two-way augmentation between the baseline model and the augmentation functions. Mixed model structures do allow for two-way interaction between the baseline model
and the augmentation functions via the state of the baseline model. These implementations are, however, often ad hoc. An interesting implementation of the mixed augmentation structure is the *linear-fractional-representation* (LFR) [15], [16]. This model structure has been shown to be useful for robust
control [17] and linear parameter-varying control [18], [19].

We propose an LFR-based model augmentation structure that combines a baseline model with augmentation functions in a flexible manner. In the proposed model augmentation structure, an interconnection matrix governs the signals between the baseline model and the augmentation functions. By shaping this interconnection matrix, we are able to represent a wide range of model augmentation structures proposed in the literature [7], [11]–[13], [15], [20], resulting in a unifying model structure represented in LFR form. Furthermore, an identification algorithm is developed that is capable of estimating ANN implementations of the proposed model augmentation structure. Our contributions can be summarized as follows:

An LFR-based model augmentation structure that unifies existing model augmentation structures;

A system identification algorithm capable of estimating ANN implementations of the proposed LFR-based model augmentation structure.

The remainder of the paper first introduces the system class and baseline model in Section [sec:ModelAugmentation], followed by the introduction of the
LFR-based model augmentation structure in Section [sec:Method]. Next, we show how the proposed model augmentation structure can be used to describe existing augmentation
structures in Section [sec:Structures], followed by an identification algorithm for the proposed augmentation structure in Section [sec:Model]. A hardening *mass-spring-damper* (MSD) simulation example is used to demonstrate the performance of the proposed identification method and to compare it to a standard ANN-SS method in
Section [sec:Result]. Conclusions are given in Section [sec:Conclusion].

We consider the data generating *discrete-time* (DT) nonlinear system of the form \[\label{eq:nl_dyn} \begin{align} x_{k+1}&=f(x_k,u_k),\\ y_k&=h(x_k)
+e_k, \end{align}\tag{1}\] where \(x_k \in \mathbb{R}^{n_\mathrm{x}}\) is the state, \(u_k \in \mathbb{R}^{n_\mathrm{u}}\) is the input, \(y_k \in
\mathbb{R}^{n_\mathrm{y}}\) is the output signal of the system at time moment \(k\in\mathbb{Z}\), and \(e_k\) is a zero-mean, possibly colored, noise process with finite variance
representing measurement noise. The state-transition function \(f: \mathbb{R}^{n_\mathrm{x}} \times \mathbb{R}^{n_\mathrm{u}} \rightarrow \mathbb{R}^{n_\mathrm{x}}\) and output function \(h:
\mathbb{R}^{n_\mathrm{x}}\rightarrow \mathbb{R}^{n_\mathrm{y}}\) are bounded functions. A data sequence generated by 1 is denoted by \(\mathcal{D}_N=\left\{\left(y_k,
u_k\right)\right\}_{k=1}^N\).
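For intuition, generating a data sequence from a system of the form 1 amounts to iterating the state-transition and output maps. The sketch below is purely illustrative: the helper name `simulate` and the scalar first-order system with a saturating input nonlinearity are our own assumptions, not part of the considered system class.

```python
import numpy as np

def simulate(f, h, x0, u, noise_std=0.0, rng=None):
    """Simulate x_{k+1} = f(x_k, u_k), y_k = h(x_k) + e_k for an input sequence u."""
    rng = rng or np.random.default_rng(0)
    x, ys = np.asarray(x0, dtype=float), []
    for uk in u:
        e = noise_std * rng.standard_normal(np.shape(h(x)))
        ys.append(h(x) + e)          # measure before the state update
        x = f(x, uk)                 # advance the state one step
    return np.array(ys)

# toy example: first-order system with a saturating input nonlinearity
f = lambda x, u: 0.8 * x + 0.2 * np.tanh(u)
h = lambda x: np.atleast_1d(x)
y = simulate(f, h, x0=np.zeros(1), u=np.ones(200))
```

For a constant input, the toy system settles at the fixed point \(x^\ast = \tanh(1)\), which the simulated output approaches.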

A baseline model for this DT nonlinear system can for example be obtained by FP methods. The baseline model is assumed to be a fixed model, given as \[\label{eq:FP_model} \begin{align} \tilde{x}_{k+1} &=f_\text{base}\left(\tilde{x}_k, u_k\right), \\ \tilde{y}_k &=h_\text{base}\left(\tilde{x}_k, u_k\right), \end{align}\tag{2}\] where \(\tilde{x}_k\in \mathbb{R}^{n_\mathrm{\tilde{x}}}\) is the baseline model state, \(\tilde{y}_k \in \mathbb{R}^{n_\mathrm{y}}\) the baseline model output, and \(f_\text{base}\) and \(h_\text{base}\) are the baseline model functions for state-transition and output respectively. These are jointly denoted as \(\phi_\text{base}\).

We augment the baseline model 2 with parameterized augmentation functions, jointly denoted by \(\phi_\text{aug}\). The parameters of these functions are given by the parameter vector \(\theta\). The proposed LFR-based augmentation structure that combines the \(\phi_\text{base}\) and \(\phi_\text{aug}\) is shown in Fig. 1 and can be written as \[\label{eq:interconnection} \begin{align} \left[\begin{array}{c} \hat{x}^+ \\ \hat{y} \\ \hdashline[5pt/2pt] z_1 \\ z_2 \\ \end{array}\right] & = \underbrace{\left[\begin{array}{cc;{5pt/2pt}cc} \mathit{S_{x x}} & \mathit{S_{x u}} & \mathit{S_{x w_1}} & \mathit{S_{x w_2}}\\ \mathit{S_{y x}} & \mathit{S_{y u}} & \mathit{S_{y w_1}} & \mathit{S_{y w_2}}\\ \hdashline[5pt/2pt] \mathit{S_{z_1 x}} & \mathit{S_{z_1 u}} & \mathit{S_{z_1 w_1}} & \mathit{S_{z_1 w_2}} \\ \mathit{S_{z_2 x}} & \mathit{S_{z_2 u}} & \mathit{S_{z_2 w_1}} & \mathit{S_{z_2 w_2}} \\ \end{array}\right]}_\mathbf{S} \left[\begin{array}{c} \hat{x} \\ u \\ \hdashline[5pt/2pt] w_1 \\ w_2 \\ \end{array}\right] \\ w_1 & = \phi_\text{base}\left(z_1\right) = \begin{bmatrix} f_\text{base}\left(z_1\right)\\ h_\text{base}\left(z_1\right) \end{bmatrix}\\ w_2 & = \phi_\text{aug}\left(z_2\right), \end{align}\tag{3}\] where \(\hat{x}\in \mathbb{R}^{n_{\hat{x}}}\) is the model state, \(\hat{y} \in \mathbb{R}^{n_\mathrm{y}}\) is the model output, \(\mathbf{S}\in \mathbb{R}^{n \times m}\) is the interconnection matrix, and \(\mathit{S}\) are selection matrices of appropriate size. We use the \(+\) symbol to denote time \(k+1\) and drop the subscript \(k\) for notational convenience.

The model state \(\hat{x}\) is given as \(\left[\begin{array}{cc} \tilde{x}^\top & \bar x^\top \end{array}\right]^\top\), where \(\bar x\) is an additional state added for dynamic augmentation structures. Not all model augmentation structures have such a state \(\bar x\): if the model structure does have an augmentation state, it is considered a dynamic augmentation; otherwise, it is a static augmentation.

By choosing different structures for \(\phi_\text{aug}\) and shaping the selection matrices \(\mathit{S}\), a large variety of augmentation structures can be represented. In Sec. [sec:Structures] we show the procedure to obtain these selection matrices \(\mathit{S}\) for a variety of model augmentation structures commonly used in the literature.
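As a concrete illustration, one forward pass through the interconnection 3 can be sketched as below, assuming the feedthrough selection matrices \(\mathit{S_{z_1 w_1}},\ldots,\mathit{S_{z_2 w_2}}\) are zero so that \(z_1\) and \(z_2\) depend on \((\hat{x}, u)\) only and no iteration is needed. The function name `lfr_step` and the dimension bookkeeping are our own; the paper prescribes only the structure of \(\mathbf{S}\).

```python
import numpy as np

def lfr_step(S, phi_base, phi_aug, x, u, dims):
    """One forward pass of the LFR interconnection (3), assuming the
    feedthrough blocks S_{z1 w1}, ..., S_{z2 w2} are zero, so z1 and z2
    depend on (x, u) only."""
    nx, ny, nz1, nz2 = dims
    v = np.concatenate([x, u])                  # stacked [x; u]
    z1 = S[nx + ny:nx + ny + nz1, :len(v)] @ v  # input to the baseline model
    z2 = S[nx + ny + nz1:, :len(v)] @ v         # input to the augmentation
    w = np.concatenate([phi_base(z1), phi_aug(z2)])
    top = S[:nx + ny, :len(v)] @ v + S[:nx + ny, len(v):] @ w
    return top[:nx], top[nx:]                   # (x_next, y_hat)
```

A trained implementation would replace `phi_base` and `phi_aug` by the baseline functions and the parameterized networks, respectively.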

Care needs to be taken with the formulation of the LFR-based model augmentation structure to avoid algebraic loops that lead to well-posedness issues. For a given LFR-based model augmentation structure 3 , the presence of algebraic loops can be checked via the lower-right submatrix of the interconnection matrix \(\mathbf{S}\), consisting of the selection matrices \(\mathit{S_{z_1 w_1}}\), \(\mathit{S_{z_1 w_2}}\), \(\mathit{S_{z_2 w_1}}\) and \(\mathit{S_{z_2 w_2}}\). Consider this submatrix as the adjacency matrix of a directed graph. If this graph is acyclic, then no algebraic loop is present in the model augmentation structure. The acyclic property is checked by examining the diagonal elements of the transitive closure of the adjacency matrix: if these elements are all zero, then the graph is acyclic.
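This check is straightforward to implement. The sketch below, a helper of our own, applies Warshall's algorithm to the boolean pattern indicating which block directly feeds which (here at the level of the two function blocks \(\phi_\text{base}\) and \(\phi_\text{aug}\)): a nonzero diagonal in the transitive closure signals a cycle, i.e., an algebraic loop.

```python
import numpy as np

def has_algebraic_loop(S_zw):
    """Treat the boolean pattern of the feedthrough block [S_z1w1 S_z1w2;
    S_z2w1 S_z2w2] as the adjacency matrix of a directed graph and test
    for cycles via the transitive closure (Warshall's algorithm)."""
    A = np.abs(np.asarray(S_zw)) > 0
    T = A.copy()
    for k in range(A.shape[0]):
        # add paths that route through node k
        T = T | (T[:, [k]] & T[[k], :])
    return bool(T.diagonal().any())  # nonzero diagonal <=> a cycle exists

# series-like pattern: w1 feeds z2 only, hence acyclic
assert not has_algebraic_loop(np.array([[0, 0], [1, 0]]))
# mutual direct feedthrough between both blocks: algebraic loop
assert has_algebraic_loop(np.array([[0, 1], [1, 0]]))
```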

For training of neural networks, normalization of the input and output data to zero mean and unit standard deviation has been shown to improve model estimation [21]. Therefore, the LFR-based model augmentation structure is normalized as in [3], while the augmentation functions are initialized as in [15]. This means that \(\hat{x}\), \(u\), and \(\hat{y}\) in 3 are normalized. The to-be-augmented baseline model thus needs to have normalized input, state, and output as well. This is accomplished by the following transformation \[\begin{align} \bar{f}_{\text{base}} & = T_x f_{\text{base}}\left(T_x ^{-1} \tilde{x}, T_u ^{-1} u \right) \\ \bar{h}_{\text{base}} & = T_y h_{\text{base}}\left(T_x ^{-1} \tilde{x}, T_u ^{-1} u \right), \end{align}\] where \(T_u \in \mathbb{R}^{n_u \times n_u}\) is a diagonal matrix with the inverse of the standard deviation (taken over the samples) of the denormalized input \(u\) on its diagonal. \(T_y\) is similarly defined with measurements of the output \(y\). For \(T_x\), the standard deviation is determined over the baseline model states \(\tilde{x}\). These states can be obtained by, e.g., simulation of the baseline model with nominal input and initial conditions.
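A minimal sketch of this transformation, assuming sample matrices with one row per time step are available; the helper name `normalize_baseline` is ours.

```python
import numpy as np

def normalize_baseline(f_base, h_base, x_samples, u_samples, y_samples):
    """Wrap the baseline functions so that they map normalized state and
    input to normalized next-state / output, using diagonal scalings by
    the inverse standard deviations taken over the samples."""
    Tx = np.diag(1.0 / x_samples.std(axis=0))
    Tu = np.diag(1.0 / u_samples.std(axis=0))
    Ty = np.diag(1.0 / y_samples.std(axis=0))
    Tx_inv, Tu_inv = np.linalg.inv(Tx), np.linalg.inv(Tu)
    f_bar = lambda x, u: Tx @ f_base(Tx_inv @ x, Tu_inv @ u)
    h_bar = lambda x, u: Ty @ h_base(Tx_inv @ x, Tu_inv @ u)
    return f_bar, h_bar
```

Note that for identity dynamics the scalings cancel, which gives a quick sanity check of the wrapping.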

To demonstrate the flexibility of the proposed LFR-based model augmentation structure 3 , we show how it can be used to represent six common model augmentation structures found in the literature [6], [7], [11]–[13], [15], [16], [20]. These are the parallel, series, and mixed augmentation structures, in both the static and the dynamic case. The dynamic cases are shown in Fig. 2. The static cases can be retrieved from the dynamic cases by removing the augmentation state \(\bar x\). We proceed by rewriting both the static and dynamic cases into an equation form for the progressed baseline state \(\tilde{x}\), the progressed augmentation state \(\bar x\), and the model output \(\hat{y}\). The resulting equations are shown in Table I. In this table, \(f_0\) and \(h_0\) are used instead of \(f_\text{base}\) and \(h_\text{base}\), and the remaining functions are part of the parameterized augmentation functions \(\phi_\text{aug}\). We proceed by showing how the selection matrices \(\mathit{S}\) in 3 can be obtained for the dynamic cases in Table I.

| | static parallel | dynamic parallel | static series | dynamic series | static mixed | dynamic mixed |
|---|---|---|---|---|---|---|
| \(\tilde{x}^+\) | \(f_\text{0}\left(\tilde{x}, u\right)\) | \(f_\text{0}\left(\tilde{x}, u\right)\) | \(f_\text{0}\left(\tilde{x}, u\right)\) | \(f_\text{0}\left(\tilde{x}, u\right)\) | \(f_\text{0}\left(\tilde{x}, u\right) + g_\theta\left(\tilde{x}, u\right)\) | \(f_\text{0}\left(\tilde{x}, u\right) + g_\theta\left(\tilde{x}, \bar x, u\right)\) |
| \(\bar x^+\) | - | \(f_\theta\left(\bar x, u\right)\) | - | \(f_\theta\left(h_\text{0}\left(\tilde{x}, u\right), \bar x, u\right)\) | - | \(f_\theta\left(\tilde{x}, \bar x, u\right)\) |
| \(\hat{y}\) | \(h_\text{0}\left(\tilde{x}, u\right) + h_\theta\left(\tilde{x}, u\right)\) | \(h_\text{0}\left(\tilde{x}, u\right) + h_\theta\left(\bar x, u\right)\) | \(h_\theta\left(h_\text{0}\left(\tilde{x}, u\right), u\right)\) | \(h_\theta\left(\bar x, u\right)\) | \(h_\text{0}\left(\tilde{x}, u\right)\) | \(h_\text{0}\left(\tilde{x}, u\right)\) |

The first model augmentation structure we consider is the parallel output model structure shown in Fig. 2.a. The dynamic parallel model augmentation structure can be formulated into the signals in 3 as \[\begin{align} z_1 &= \mathrm{col}\left[\begin{array}{cc} \tilde{x} & u \end{array}\right],w_1 = \mathrm{col}\left[\begin{array}{cc}\tilde{x}^+ & \tilde{y}\end{array}\right] = \phi_\text{base}\left(z_1\right),\\ z_2 &= \mathrm{col}\left[\begin{array}{cc}\bar x & u \end{array}\right],w_2 = \mathrm{col}\left[\begin{array}{cc}\bar x^+ & \bar y \end{array}\right] = \phi_\text{aug}\left(z_2\right),\\ \tilde{x}^+ &= \left[\begin{array}{cc}\mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_y}\end{array}\right]w_1,\\ \bar x^+ &= \left[\begin{array}{cc}\mathrm{I}_{n_{\bar x}} & 0^{n_{\bar x} \times n_y}\end{array}\right]w_2,\\ \hat{y} &= \left[\begin{array}{cc}0^{n_y \times n_{\tilde{x}}} & \mathrm{I}_{n_y}\end{array}\right]w_1 + \left[\begin{array}{cc} 0^{n_y \times n_{\bar x}} & \mathrm{I}_{n_y}\end{array}\right]w_2, \end{align}\] where \(\bar y\) is the augmentation model output, \(\mathrm{col}\) denotes the column vector of the concatenated values, \(\mathrm{I}_{n}\) is the identity matrix of size \(n\), and \(0^{m \times n}\) is the zero matrix of size \(m\) by \(n\).
With these signals, the interconnection matrix in 3 can then be formulated with the following matrices \[\begin{align} \mathit{S}_{z_1x} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{\bar x}} \\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{\bar x}} \end{bmatrix}, & \mathit{S}_{z_1u} & = \begin{bmatrix} 0^{n_{\tilde{x}} \times n_u} \\ \mathrm{I}_{n_u} \end{bmatrix},\\ \mathit{S}_{z_2x} & = \begin{bmatrix} 0^{n_{\bar x} \times n_{\tilde{x}}} & \mathrm{I}_{n_{\bar x}} \\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{\bar x}} \end{bmatrix}, & \mathit{S}_{z_2u} & = \begin{bmatrix} 0^{n_{\bar x} \times n_u} \\ \mathrm{I}_{n_u} \end{bmatrix},\\ \mathit{S}_{x w_1} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{y}} \\ 0^{n_{\bar x} \times n_{\tilde{x}}} & 0^{n_{\bar x} \times n_{y}} \end{bmatrix}, & \mathit{S}_{x w_2} & = \begin{bmatrix} 0^{n_{\tilde{x}} \times n_{\bar x}} & 0^{n_{\tilde{x}} \times n_{y}}\\ \mathrm{I}_{n_{\bar x}} & 0^{n_{\bar x} \times n_{y}} \end{bmatrix},\\ \mathit{S}_{y w_1} & = \begin{bmatrix} 0^{n_y \times n_{\tilde{x}}} & \mathrm{I}_{n_y} \end{bmatrix}, & \mathit{S}_{y w_2} & = \begin{bmatrix} 0^{n_y \times n_{\bar x}} & \mathrm{I}_{n_y} \end{bmatrix},\\ \end{align}\] with all other matrices in 3 set to zero matrices of appropriate sizes. With these selection matrices \(\mathit{S}\), the interconnection matrix \(\mathbf{S}\) can now be constructed. This, in combination with the baseline model \(\phi_\text{base}\) and the augmentation functions \(\phi_\text{aug}\), gives the LFR-based augmentation structure.
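These selection matrices can be assembled programmatically. The sketch below, with the helper name `S_parallel` and the dimension arguments being our own, builds the full interconnection matrix \(\mathbf{S}\) for the dynamic parallel structure; its feedthrough blocks are zero, so the structure is free of algebraic loops by construction.

```python
import numpy as np

def S_parallel(nxt, nxb, nu, ny):
    """Assemble the interconnection matrix S for the dynamic parallel
    structure; nxt/nxb are the baseline/augmentation state dimensions."""
    I, Z = np.eye, np.zeros
    Sz1x = np.block([[I(nxt), Z((nxt, nxb))], [Z((nu, nxt)), Z((nu, nxb))]])
    Sz1u = np.vstack([Z((nxt, nu)), I(nu)])
    Sz2x = np.block([[Z((nxb, nxt)), I(nxb)], [Z((nu, nxt)), Z((nu, nxb))]])
    Sz2u = np.vstack([Z((nxb, nu)), I(nu)])
    Sxw1 = np.block([[I(nxt), Z((nxt, ny))], [Z((nxb, nxt)), Z((nxb, ny))]])
    Sxw2 = np.block([[Z((nxt, nxb)), Z((nxt, ny))], [I(nxb), Z((nxb, ny))]])
    Syw1 = np.hstack([Z((ny, nxt)), I(ny)])
    Syw2 = np.hstack([Z((ny, nxb)), I(ny)])
    nx, nw1, nw2 = nxt + nxb, nxt + ny, nxb + ny
    nz1, nz2 = nxt + nu, nxb + nu
    # rows: [x^+; y; z1; z2], columns: [x; u; w1; w2]
    return np.block([
        [Z((nx, nx)), Z((nx, nu)), Sxw1,          Sxw2],
        [Z((ny, nx)), Z((ny, nu)), Syw1,          Syw2],
        [Sz1x,        Sz1u,        Z((nz1, nw1)), Z((nz1, nw2))],
        [Sz2x,        Sz2u,        Z((nz2, nw1)), Z((nz2, nw2))],
    ])
```

For example, with two baseline states, one augmentation state, and scalar input and output, the resulting \(\mathbf{S}\) is a 9-by-9 matrix whose lower-right feedthrough block is identically zero.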

The second model augmentation structure we consider is the dynamic series model structure shown in Fig. 2.b. This model structure can be formulated into the signals in 3 as \[\begin{align} z_1 &= \mathrm{col}\left[\begin{array}{cc} \tilde{x} & u \end{array}\right],w_1 = \mathrm{col}\left[\begin{array}{cc}\tilde{x}^+ & \tilde{y}\end{array}\right] = \phi_\text{base}\left(z_1\right),\\ z_2 &= \mathrm{col}\left[\begin{array}{ccc}\bar x & u & \tilde{y} \end{array}\right],w_2 = \mathrm{col}\left[\begin{array}{cc}\bar x^+ & \bar y \end{array}\right] = \phi_\text{aug}\left(z_2\right),\\ \tilde{x}^+ &= \left[\begin{array}{cc}\mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_y}\end{array}\right]w_1, \\ \bar x^+ &= \left[ \begin{array}{cc}\mathrm{I}_{n_{\bar x}} & 0^{n_{\bar x} \times n_y} \end{array}\right]w_2, \\ \hat{y} &= \left[\begin{array}{cc} 0^{n_y \times n_{\bar x}} & \mathrm{I}_{n_y} \end{array}\right]w_2. \end{align}\] The interconnection matrix can then be formulated with the following matrices \[\begin{align} \mathit{S}_{z_1x} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{\bar x}} \\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{\bar x}} \end{bmatrix}, & \mathit{S}_{z_1u} & = \begin{bmatrix} 0^{n_{\tilde{x}} \times n_u} \\ \mathrm{I}_{n_u} \end{bmatrix},\\ \mathit{S}_{z_2x} & = \begin{bmatrix} 0^{n_{\bar x} \times n_{\tilde{x}}} & \mathrm{I}_{n_{\bar x}} \\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{\bar x}} \\ 0^{n_y \times n_{\tilde{x}}} & 0^{n_y \times n_{\bar x}} \end{bmatrix}, & \mathit{S}_{z_2u} & = \begin{bmatrix} 0^{n_{\bar x} \times n_u} \\ \mathrm{I}_{n_u} \\ 0^{n_{y} \times n_u} \end{bmatrix}, \\ \mathit{S}_{z_2w_1} & = \begin{bmatrix} 0^{n_{\bar x} \times n_{\tilde{x}}} & 0^{n_{\bar x} \times n_{y}}\\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{y}}\\ 0^{n_{y} \times n_{\tilde{x}}} & \mathrm{I}_{n_{y}} \end{bmatrix},\\ \mathit{S}_{x w_1} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{y}} \\ 0^{n_{\bar x} \times n_{\tilde{x}}} & 0^{n_{\bar x} \times n_{y}} \end{bmatrix}, & \mathit{S}_{x w_2} & = \begin{bmatrix} 0^{n_{\tilde{x}} \times n_{\bar x}} & 0^{n_{\tilde{x}} \times n_{y}}\\ \mathrm{I}_{n_{\bar x}} & 0^{n_{\bar x} \times n_{y}} \end{bmatrix},\\ \mathit{S}_{y w_2} & = \begin{bmatrix} 0^{n_y \times n_{\bar x}} & \mathrm{I}_{n_y} \end{bmatrix}.\\ \end{align}\] This allows for construction of 3 .

The third model augmentation structure we consider is the dynamic mixed model structure shown in Fig. 2.c. Compared to the series and parallel augmentations, this state augmentation allows for a two-way dependency between the baseline model and the augmentation model. The dynamic mixed model structure can be formulated into the signals in 3 as

\[\begin{align} z_1 &= \mathrm{col}\left[\begin{array}{cc} \tilde{x} & u \end{array}\right],w_1 = \mathrm{col}\left[\begin{array}{cc}\tilde{x}^+ & \tilde{y}\end{array}\right] = \phi_\text{base}\left(z_1\right),\\ z_2 &= \mathrm{col}\left[\begin{array}{ccc}\tilde{x} & \bar x & u \end{array}\right],w_2 = \mathrm{col}\left[\begin{array}{cc}\tilde{x}_g^+ & \bar x^+ \end{array}\right] = \phi_\text{aug}\left(z_2\right),\\ \tilde{x}^+ &= \left[\begin{array}{cc}\mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_y}\end{array}\right]w_1 + \left[\begin{array}{cc}\mathrm{I}_{n_{\tilde{x}_g}} & 0^{n_{\tilde{x}_g} \times n_{\bar x}}\end{array}\right]w_2, \\ \bar x^+ &= \left[\begin{array}{cc}0^{n_{\bar x} \times n_{\tilde{x}_g}} & \mathrm{I}_{n_{\bar x}}\end{array}\right]w_2, \\ \hat{y} &= \left[\begin{array}{cc} 0^{n_y \times n_{\tilde{x}}} & \mathrm{I}_{n_y}\end{array}\right]w_1, \end{align}\] where \(\tilde{x}_g^+\) is the part of the augmentation model output that acts on the progressed baseline state \(\tilde{x}^+\). The interconnection matrix can then be formulated with the following matrices \[\begin{align} \mathit{S}_{z_1x} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{\bar x}} \\ 0^{n_u \times n_{\tilde{x}}} & 0^{n_u \times n_{\bar x}} \end{bmatrix}, & \mathit{S}_{z_1u} & = \begin{bmatrix} 0^{n_{\tilde{x}} \times n_u} \\ \mathrm{I}_{n_u} \end{bmatrix},\\ \mathit{S}_{z_2x} & = \begin{bmatrix} \mathrm{I}_{n_{\hat{x}}} \\ 0^{n_u \times n_{\hat{x}}} \end{bmatrix}, & \mathit{S}_{z_2u} & = \begin{bmatrix} 0^{n_{\hat{x}} \times n_u} \\ \mathrm{I}_{n_u} \end{bmatrix},\\ \mathit{S}_{x w_1} & = \begin{bmatrix} \mathrm{I}_{n_{\tilde{x}}} & 0^{n_{\tilde{x}} \times n_{y}} \\ 0^{n_{\bar x} \times n_{\tilde{x}}} & 0^{n_{\bar x} \times n_{y}} \end{bmatrix}, & \mathit{S}_{x w_2} & = \mathrm{I}_{n_{\hat{x}}},\\ \mathit{S}_{y w_1} & = \begin{bmatrix} 0^{n_y \times n_{\tilde{x}}} & \mathrm{I}_{n_y} \end{bmatrix}.\\ \end{align}\] This allows for construction of 3 .

Consider again the DT nonlinear system 1 . Using a data sequence \(\mathcal{D}_N\) generated by 1 , we aim to identify the augmented model 3 , denoted by \(\phi_\theta\), in the form \[\label{eq:model_structure} \begin{bmatrix} \hat{x}_{k+1} \\ \hat{y}_k
\end{bmatrix} =\phi_\theta\left(\hat{x}_k, u_k\right),\tag{4}\] where the functions in \(\phi_\text{aug}\) are feedforward multi-layer artificial neural networks^{1} with \(\theta\in\mathbb{R}^{n_\theta}\) collecting the activation weights as model parameters. Under these considerations, the model structure 4
represents a recurrent neural network, also called an ANN-SS model [1].

The literature on identifying ANN-SS models is rapidly developing. There are many approaches to identify such models, e.g., [4], [22]. A recent approach that estimates 4 under statistical consistency guarantees is the SUBNET approach [3]. This method has shown state-of-the-art results on multiple identification benchmarks and can be used to estimate models with high-dimensional inputs and outputs [23]. To achieve this estimation in a computationally efficient manner, SUBNET uses a truncated simulation error objective function, as shown in Fig. 3, which combines multiple subsections of length \(T\) of the available data \(\mathcal{D}_N\). This objective function is given as \[\label{eq:loss_function} \begin{align} V(\theta) = & \frac{1}{2 N(T+1)} \sum_{i=1}^N \sum_{\ell=0}^{T-1}\left\|\hat{y}_{k_i \vert k_i+\ell}-y_{k_i+\ell}\right\|_2^2 \\ \begin{bmatrix} \hat{x}_{k_i \vert k_i+\ell+1} \\ \hat{y}_{k_i \vert k_i+\ell} \end{bmatrix} &:=\phi_\theta\left(\hat{x}_{k_i \vert k_i+\ell}, u_{k_i+\ell}\right) \\ \hat{x}_{k_i \vert k_i} & :=\psi_{\theta}\left(u_{k_i-n_b}^{k_i-1},y_{k_i-n_a}^{k_i}\right), \end{align}\tag{5}\] where \(k_i \vert k_i + \ell\) indicates the state \(\hat{x}\) or output \(\hat{y}\) at time \(k_i + \ell\), simulated from the initial state \(\hat{x}_{k_i \vert k_i}\) at time \(k_i\). The subsections start from randomly selected times \(k_i\in \{n+1,\ldots,N-T\}\), with \(n\) large enough that the required past data is available. The initial state of each subsection is estimated by an encoder function \(\psi_\theta\) based on past input and output data, i.e., \(\hat{x}_{k_i \vert k_i}=\psi_\theta(u_{k_i-n_b}^{k_i-1},y_{k_i-n_a}^{k_i})\), where \(u_{k}^{k+\tau}= [\ u_k^\top \ \cdots \ u_{k+\tau}^\top ]^\top\) for \(\tau\geq 0\) and \(y_{k}^{k+\tau}\) is defined similarly. This encoder is co-estimated with \(\phi_\theta\).
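A stripped-down version of the truncated simulation error can be written as below. Here `phi` and `encoder` stand in for \(\phi_\theta\) and \(\psi_\theta\) (plain callables rather than trained networks), and the averaging constant is simplified to the number of summed terms; both simplifications are our own.

```python
import numpy as np

def truncated_sim_loss(phi, encoder, u, y, starts, T, na, nb):
    """Truncated simulation error: for each section starting at k_i, the
    encoder maps past input/output data to an initial state, after which
    phi is simulated forward for T steps and the squared output error is
    accumulated."""
    V, count = 0.0, 0
    for ki in starts:
        x = encoder(u[ki - nb:ki], y[ki - na:ki + 1])   # initial state estimate
        for ell in range(T):
            x_next, y_hat = phi(x, u[ki + ell])
            V += np.sum((y_hat - y[ki + ell]) ** 2)
            x = x_next
            count += 1
    return V / (2 * count)
```

In an actual SUBNET implementation this quantity would be minimized over the network parameters with a gradient-based optimizer.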

For simulation^{2}, a 3 *degrees-of-freedom* (DOF) hardening MSD is considered as shown in Fig. 4 with the physical parameters as given
in Table 1. The states of the system representation are the positions \(p_i\) and velocities \(\dot{p}_i\) of the masses \(m_1\), \(m_2\), \(m_3\). The hardening spring with parameter \(a_1\) is connected to mass \(m_1\). This is a cubic spring, and thus the system is nonlinear.

| \(i\) | Mass \(m_i\) | Spring \(k_i\) | Damper \(c_i\) | Hardening \(a_i\) |
|---|---|---|---|---|
| 1 | \(0.5 \mathrm{~kg}\) | \(100 \mathrm{~\frac{N}{m}}\) | \(0.5 \mathrm{~\frac{N s}{m}}\) | \(100 \mathrm{~\frac{N}{m^3}}\) |
| 2 | \(0.4 \mathrm{~kg}\) | \(100 \mathrm{~\frac{N}{m}}\) | \(0.5 \mathrm{~\frac{N s}{m}}\) | - |
| 3 | \(0.1 \mathrm{~kg}\) | \(100 \mathrm{~\frac{N}{m}}\) | \(0.5 \mathrm{~\frac{N s}{m}}\) | - |

We obtain the data for model estimation by applying RK4-based numerical integration on the 3 DOF MSD system with \(T_\mathrm{s}=0.02\mathrm{s}\). The system is simulated for a *zero-order hold* (ZOH) input signal
\(F_u\) generated by a DT multisine \(u_{k}\) with 1666 frequency components in the range \([0, 25]\) Hz and uniformly distributed phases in \([0, 2\pi)\). The sampled output measurements \(y_k\) are perturbed by additive noise \(e_k\), described by a discrete-time white noise process with \(e_k\sim\mathcal{N}(0,\sigma_\mathrm{e}^2)\). Here \(\sigma_\text{e}\) is chosen such that the *signal-to-noise ratio* (SNR) equals 60 dB. After removing the transient due to initial
conditions, the sampled generated output \(y_k\) for the input \(u_{k}\) is collected in the data sequence \(\mathcal{D}_N\). A separate data sequence is
generated for estimation, validation, and testing with size \(N_{\text{est}} = 2\cdot10^4\), \(N_{\text{val}} = 10^4\), \(N_{\text{test}} = 10^4\),
respectively.
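The excitation can be reproduced along the following lines. The helper name `multisine` and the flat, unit-amplitude spectrum are assumptions (the text specifies only the number of lines, the band, and the uniform phases), and fewer components are used here for brevity.

```python
import numpy as np

def multisine(N, ts, f_max, n_comp, rng=None):
    """Random-phase multisine: n_comp equally spaced frequency lines in
    (0, f_max] Hz with phases drawn uniformly from [0, 2*pi)."""
    rng = rng or np.random.default_rng(0)
    t = np.arange(N) * ts
    freqs = np.linspace(f_max / n_comp, f_max, n_comp)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)
    u = np.sin(2.0 * np.pi * np.outer(freqs, t) + phases[:, None]).sum(axis=0)
    return u / u.std()  # scale to unit standard deviation

u_k = multisine(N=2000, ts=0.02, f_max=25.0, n_comp=100)
```

Choosing the record length as an integer number of periods of every line (as above) keeps the signal zero-mean and leakage-free.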

The baseline model is taken as the discretization of the system with \(a_1=0\) and with \(k_3=0\), \(c_3=0\), \(m_3=0\).
Thus the baseline model is linear with 2 DOF. The simulation error of the baseline model on the test data is shown in orange in Fig. 5. As a performance measure for our models, we use the *normalized root mean square error* (NRMS), given as \[\mathrm{NRMS}=\frac{\sqrt{1 / N \sum_{k=1}^N\left\|y_k-\hat{y}_k\right\|_2^2}}{\sigma_{\mathrm{y}}} \times 100,\] where \(\sigma_\mathrm{y}\) is the standard deviation of the output measurements \(y\). The NRMS of the baseline model is shown in Table 2.
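For completeness, the NRMS measure is a one-liner; the helper name `nrms` is ours.

```python
import numpy as np

def nrms(y, y_hat):
    """Normalized root-mean-square error in percent: RMS of the
    simulation error over the standard deviation of the measured output."""
    rms = np.sqrt(np.mean(np.sum((y - y_hat) ** 2, axis=-1)))
    return 100.0 * rms / y.std()
```

A perfect model gives 0%, while predicting the output mean gives 100%.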

| Model | NRMS |
|---|---|
| baseline | 35.02% |
| dynamic series | 8.47% |
| dynamic parallel | 17.63% |
| linear dynamic mixed | 5.39% |
| nonlinear static mixed | 0.38% |
| nonlinear dynamic mixed | 0.15% |
| ANN-SS | 1.26% |

Next, we estimate the model augmentation structures shown in Fig. 2, where the augmentation functions are chosen as feedforward neural networks with tanh activation functions. The number of hidden layers and neurons are as in Table 3. For the dynamic series and parallel case we used separate baseline model and augmentation model states, with 6 states for the augmentation model. For the mixed augmentation structure, we consider three different implementations. First, a dynamic mixed augmentation with 2 additional states and linear activation functions. With this augmented model, we expect to be able to capture the dynamics of the third mass missing from the baseline model. Second, a static mixed augmentation with nonlinear activation functions. With this augmented model, we expect to be able to capture the dynamics of the nonlinear spring missing from the baseline model. Third, a dynamic mixed augmentation with 2 additional states and nonlinear activation functions. With this augmented model, we expect to be able to capture all dynamics missing from the baseline model.

The augmented models are estimated as described in Section [sec:Model] with the aforementioned estimation data. The hyperparameters for these estimations are shown in Table 3. As a black-box comparison to the augmented models, an ANN-SS model [3] is estimated with the same estimation procedure and hyperparameters. The NRMS for simulation of the test data using these estimated models is shown in Table 2. The nonlinear dynamic mixed augmentation results in the lowest NRMS. The nonlinear static mixed augmentation also results in a low NRMS score, but slightly higher than the nonlinear dynamic mixed augmentation. This is expected for augmentation of a system with missing dynamics that can be represented with additional states. The ANN-SS model results in a higher NRMS than the nonlinear mixed augmentations. We expect, however, that estimation of the ANN-SS model with a large number of epochs, barring local minima in the estimation process, will result in a model as accurate as the nonlinear dynamic mixed augmentation. This is due to the fact that the ANN-SS model is a general function approximator. It is, however, also desirable that the model estimation results in a good estimate within a small number of epochs, since it is not always feasible to train for a large number of epochs due to, e.g., time or computational limitations.

| hidden layers | nodes | \(n_u\), \(n_y\) | \(n_a\), \(n_b\) | \(T\) | epochs | batch size |
|---|---|---|---|---|---|---|
| 2 | 64 | 1 | 7 | 200 | 5000 | 2000 |

Next, Fig. 5 shows the test data (blue) together with the simulation errors of the nonlinear dynamic mixed augmented model (green) and of the baseline model (orange). This figure shows that the augmented model performs consistently well across the entire test data.

Finally, Fig. 6 compares the states \(\hat{x}\) of the nonlinear dynamic mixed model (blue) with the part of each state contributed by the augmentation functions (orange). The figure shows that the contribution of the augmentation functions is relatively small for the first four states, which comprise the baseline model states \(\tilde{x}\). The last two states are the additional states \(\bar x\) added by the augmentation functions and are thus equal to the corresponding contributions. From this, we conclude that, in terms of magnitude, the baseline model is the dominant factor in the baseline model states. These states are also the ones used by the output function, and thus the baseline model remains dominant in the output. Therefore, the augmentation functions indeed augment the baseline model into an accurate model, rather than replacing the baseline model with their own dynamics.

We have proposed an LFR-based model augmentation structure that flexibly augments baseline models with parameterized augmentation functions. This model augmentation structure unifies a wide range of existing model augmentation structures present in the literature. Furthermore, an identification algorithm has been implemented that is capable of estimating ANN implementations of the proposed model augmentation structure. This identification algorithm has been shown to be able to estimate representations of many common model augmentation structures present in the literature. For the nonlinear dynamic mixed model structure, convergence to an accurate estimated model is faster than with the ANN-SS estimation method, without replacing the dynamics of the baseline model.

[1]

J. A. Suykens, B. L. De Moor, and J. Vandewalle, “Nonlinear system identification using neural state space models, applicable to robust control design,” *International
Journal of Control*, vol. 62, no. 1, pp. 129–152, 1995.

[2]

M. Schoukens, “Improved initialization of state-space artificial neural networks,” in *Proc. of the 2021 European Control Conference*, 2021, pp. 1913–1918.

[3]

G. I. Beintema, M. Schoukens, and R. Tóth, “Deep subspace encoders for nonlinear system identification,” *Automatica*, vol. 156, p. 111210, 2023.

[4]

M. Forgione and D. Piga, “Continuous-time system identification with neural networks: Model structures and fitting criteria,” *European Journal of Control*, vol. 59,
pp. 69–81, 2021.

[5]

C. Verhoek, G. I. Beintema, S. Haesaert, M. Schoukens, and R. Tóth, “Deep-learning-based identification of LPV models for nonlinear systems,” in *Proc. of the
2022 IEEE 61st Conference on Decision and Control*. IEEE, 2022, pp. 3274–3280.

[6]

J. Willard, X. Jia, S. Xu, M. Steinbach, and V. Kumar, “Integrating scientific knowledge with machine learning for engineering and environmental systems,” *ACM Computing
Surveys*, vol. 55, no. 4, pp. 1–37, 2022.

[7]

A. Daw, A. Karpatne, W. D. Watkins, J. S. Read, and V. Kumar, “Physics-guided neural networks (pgnn): An application in lake temperature modeling,” in *Knowledge Guided
Machine Learning*.1em plus 0.5em minus 0.4emChapman and Hall/CRC, 2022, pp. 353–372.

[8]

D. C. Psichogios and L. H. Ungar, “A hybrid neural network-first principles approach to process modeling,” *AIChE Journal*, vol. 38, no. 10, pp. 1499–1511, 1992.

[9]

J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, “Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a
knowledge-based model,” *Chaos: An Interdisciplinary Journal of Nonlinear Science*, vol. 28, no. 4, 2018.

[10]

T. P. Bohlin, *Practical grey-box process identification: theory and applications*.1em plus 0.5em minus 0.4emSpringer Science & Business Media, 2006.

[11]

R.-S. Götte and J. Timmermann, “Composed physics-and data-driven system identification for non-autonomous systems in control engineering,” in *Proc. of the
3rd International Conference on Artificial Intelligence, Robotics and Control*, 2022, pp. 67–76.

[12]

B. Sohlberg and E. W. Jacobsen, “Grey box modelling–branches and experiences,” *IFAC Proceedings Volumes*, vol. 41, no. 2, pp. 11 415–11 420, 2008.

[13]

M. L. Thompson and M. A. Kramer, “Modeling chemical processes using prior knowledge and neural networks,” *AIChE Journal*, vol. 40, no. 8, pp. 1328–1340, 1994.

[14]

B. Sun, C. Yang, Y. Wang, W. Gui, I. Craig, and L. Olivier, “A comprehensive hybrid first principles/machine learning modeling framework for complex industrial processes,”
*Journal of Process Control*, vol. 86, pp. 30–43, 2020.

[15]

M. Schoukens and R. Tóth, “On the initialization of nonlinear LFR model identification with the best linear approximation,”
*IFAC-PapersOnLine*, vol. 53, no. 2, pp. 310–315, 2020.

[16]

L. Hewing, D. Gramlich, C. Verhoek, J. Veenman, R. Polonio, C. Ardura, R. Tóth, C. Ebenbauer, C. W. Scherer, and V. Preda, “Enhancing the guidance, navigation and
control of autonomous parafoils using machine learning methods,” in *Proc. of the 12th International Conference on Guidance, Navigation & Control Systems*, 2023, pp. 1–15.

[17]

K. Zhou, J. Doyle, and K. Glover, *Robust and Optimal Control*, ser. Feher/Prentice Hall Digital.1em plus 0.5em minus 0.4emPrentice Hall, 1996.

[18]

R. Tóth, *Modeling and identification of linear parameter-varying systems*.1em plus 0.5em minus 0.4emSpringer, 2010, vol. 403.

[19]

M. Schoukens and R. Tóth, “Linear parameter varying representation of a class of MIMO nonlinear systems,” *IFAC-PapersOnLine*, vol. 51,
no. 26, pp. 94–99, 2018.

[20]

M. Bolderman, H. Butler, S. Koekebakker, E. van Horssen, R. Kamidi, T. Spaan-Burke, N. Strijbosch, and M. Lazar, “Physics-guided neural networks for feedforward control with
input-to-state-stability guarantees,” *Control Engineering Practice*, vol. 145, p. 105851, 2024.

[21] C. M. Bishop, *Neural Networks for Pattern Recognition*. Clarendon Press, 1995.

[22] D. Masti and A. Bemporad, “Learning nonlinear state-space models using autoencoders,” *Automatica*, vol. 129, p. 109666, 2021.

[23] G. I. Beintema, R. Tóth, and M. Schoukens, “Non-linear state-space model identification from video data using deep encoders,” *IFAC-PapersOnLine*, vol. 54, no. 7, pp. 697–701, 2021.

Consider that each hidden layer \(i\) is composed of \(m_i\) activation functions \(\rho:\mathbb{R} \rightarrow \mathbb{R}\) in the form of \({z}_{i,j} = \rho(\sum_{l=1}^{m_{i-1}}\theta_{\mathrm{w},i,j,l} {z}_{i-1,l}+ \theta_{\mathrm{b},i,j})\), where \({z}_i=\mathrm{col}(z_{i,1},\ldots,z_{i,{m_i}})\) is the latent variable representing the output of layer \(1\leq i\leq q\). Here, \(\mathrm{col}(\centerdot)\) denotes the composition of a column vector. For an \(f_{\theta}\) with \(q\) hidden layers and linear input and output layers, this means \(f_{\theta}(\hat{{x}}_{k}, {u}_{k})= \theta_{\mathrm{w},q+1} {z}_q(k) + \theta_{\mathrm{b},q+1}\) with \({z}_{0}(k)=\mathrm{col}(\hat{x}_{k}, {u}_{k})\).↩︎
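The layer equations in this footnote can be sketched as a small NumPy implementation. The `tanh` activation and the weight shapes below are illustrative assumptions, not the configuration used in the paper:

```python
import numpy as np

def f_theta(x_hat, u, weights, biases, rho=np.tanh):
    """Feedforward ANN with q hidden layers and linear input/output layers.

    weights/biases hold (theta_w,i, theta_b,i) for i = 1..q+1; the last pair
    is the linear output layer.
    """
    z = np.concatenate([x_hat, u])           # z_0(k) = col(x_hat_k, u_k)
    for W, b in zip(weights[:-1], biases[:-1]):
        z = rho(W @ z + b)                   # z_i = rho(W_i z_{i-1} + b_i)
    return weights[-1] @ z + biases[-1]      # theta_{w,q+1} z_q + theta_{b,q+1}

# usage: q = 1 hidden layer with m_1 = 4 neurons, 2 states, 1 input, 2 outputs
rng = np.random.default_rng(1)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(2)]
y = f_theta(np.zeros(2), np.zeros(1), weights, biases)
print(y.shape)
```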

`https://github.com/Mixxxxx358/Model-Augmentation-Public`

↩︎