Water wave reconstruction of full hydrodynamic models via data assimilation


Abstract

A strategy for reconstructing the water wave field using a data assimilation method is proposed in the present study. Special treatments are introduced to address the ensemble diversity and the discontinuous free surface with hydrodynamic constraints when implementing the EnKF approach. Additionally, the POD method is employed for dimensionality reduction, but from an ensemble point of view. The main purpose of this study is to achieve satisfactory consistency between the water waves computed by the numerical solver, particularly by the VOF method, and those observed in the laboratory wave flume within the test section of interest. To validate the proposed framework, three representative conditions are tested: regular waves, irregular waves, and plunging waves. The effects of observation noise, modal truncation, and other factors are also examined. From a practical perspective, this work provides a promising way to realize the coupling between experiments and numerical simulations, and establishes a prototype of a “digital twin wave tank”.

1 Introduction↩︎

Wave flumes serve as essential platforms for the study of wave dynamics. By generating specific wave conditions in a controlled environment, researchers are able to conduct repeatable experiments and gain insights into complex phenomena such as wave propagation, breaking, and reflection [1][3]. However, despite the high degree of control in laboratory settings, the wave generation process is inevitably subject to various sources of error. For instance, key parameters such as wave height and period may deviate from their design values during propagation due to limitations in wavemaker precision [4], [5], boundary condition configuration, and wave tank reflections [6]. These deviations are often difficult to observe directly or quantify accurately, resulting in discrepancies between experimental data and numerically simulated expectations. Such inconsistencies further limit the usefulness of experimental results for validating numerical models. Consequently, accurately reconstructing the actual wave conditions that occurred during experiments becomes a crucial challenge in improving the coupling between experiments and simulations.

Wave reconstruction is essentially an inverse problem with physical constraints. The core of this problem lies in inferring the entire flow state and evolution process, given partial observation data. Currently, methods used to solve the inverse problem mainly include data assimilation (DA), machine learning, and other approaches. Among machine learning methods, some studies attempt to extract high-resolution wave states or propagation processes from observations. For example, [7] developed a neural network that reconstructs high-resolution velocity fields from low-resolution computational fluid dynamics (CFD) data for wave-structure interaction; [8] proposed a method for estimating wave elevations using stereo cameras, training a deep learning model with datasets and testing its accuracy and robustness. These studies have achieved varying degrees of wave field reconstruction but are unable to obtain the entire wave state. Furthermore, these methods often rely on large amounts of training samples, leading to weak generalization ability and poor physical consistency, making them challenging for complex tasks. In particular, the physics-informed neural network (PINN) [9] has been developed in recent years. Unlike conventional neural networks that only fit the training data, PINN incorporates physical constraints to ensure that the predictions comply with governing equations, thereby greatly enhancing physical consistency. However, to the best of our knowledge, the application of PINN to wave problems has not yet been reported.

Data assimilation is a method that does not require pre-training and has strong physical consistency. It has been widely applied in fields such as meteorology and oceanography [10]. Its application in wave problems is still primarily combined with simplified wave models, such as potential theory, which ignore viscous effects. Most existing research on wave reconstruction is based on methods such as shallow water equations [11], [12] or high-order spectral (HOS) method [13]. For example, [11] combined a shallow water finite element model (FEM) with variational methods to invert the initial state of the model; [13] developed a coupled approach of the HOS method with the ensemble Kalman filter (EnKF), through which the measurement data can be incorporated into the simulation to improve the forecast performance. However, these methods have significant limitations when it comes to accurately capturing complex dynamic behaviors. For example, local phenomena such as wave plunging and wave crest formation, which are widely studied in experimental wave tanks, often fall outside the assumptions of shallow water equations [14], [15] or the HOS method [16].

In the case of strong nonlinearity, waves exhibit significant features such as viscous dissipation and sharp variations in the free surface, which must be governed using Navier-Stokes equations. Previous research based on the incompressible Navier-Stokes equations use DA methods such as EnKF [17] and 4D-Var [18] to reconstruct flow fields, applied to classical problems such as flow around a cylinder [19], [20], flow around an airfoil [21], and pipe flow problems [22]. However, few previous studies have addressed the problem of free boundary conditions. Some studies use FEM to solve wave reconstruction problems. For example, [23] employed the free surface barotropic Navier-Stokes equations as the flow model, using velocity and elevation as observation data to determine boundary conditions. [24] incorporated an adaptive mesh method into the adjoint model to reconstruct the free surface boundary of two-dimensional waves using elevation as the observation. It should be noted that FEM faces challenges in enforcing local mass conservation and capturing strong discontinuities at the interface, and it requires complex mesh deformation and remeshing when the interface undergoes large deformations. However, to date, there has been little research on wave reconstruction based on the finite volume method (FVM). In contrast, the FVM inherently guarantees conservation and is more convenient to couple with interface capturing methods such as volume of fluid (VOF) and level set. Therefore, FVM is generally more robust when dealing with large interface deformations and strict mass preservation requirements. In this work, FVM is adopted as the numerical method, combined with interface capturing methods to track the free surface. Besides, these existing studies only focus on reconstructing the free surface, without achieving full-field state reconstruction. This gap severely limits the verification and coupling between experimental and numerical models.

To achieve full-field wave state reconstruction, this study establishes a wave reconstruction framework that integrates the EnKF with FVM-based full hydrodynamic model. By assimilating local interface observations, the framework reconstructs the true flow field, including key physical quantities such as velocity and free-surface. Using the reconstructed flow as the initial condition, the simulation is further advanced to recover the propagation process. This reconstruction framework can be regarded as a prototype of a “digital twin wave tank", which not only corrects discrepancies between numerical models and experiments but also supports wave dynamics analysis, thereby enhancing both the interpretability and extensibility of experimental research.

However, this work faces several challenges. First, employing FVM to describe wave motion requires spatial and temporal discretization. Directly assimilating the high-dimensional flow field would increase the dimension of the state vector, making it necessary to introduce dimensionality-reduction techniques. In particular, the free surface is a strongly discontinuous, dynamically evolving interface, and determining an effective order reduction method of the free surface remains one of the key challenges.

To reduce the dimensionality of the state vector and improve the efficiency of the assimilation procedure, the present study employs Proper Orthogonal Decomposition (POD) to perform state reduction of the flow field. Such reduced-order methods have been widely applied in data assimilation problems for single-phase flows [25], [26], as they effectively extract dominant modes while preserving the essential features of the flow field. However, the conventional construction of the snapshot matrix is typically based on time-series data, which is not suitable for wave problems characterized by strong discontinuities and non-periodic variations. To address this issue, we adopt a different dataset construction strategy. This strategy enables a more accurate capture of diverse wave-field characteristics, as will be detailed in the next section.

In addition, the choice of method for representing the free surface, so as to ensure the highest reconstruction accuracy, remains a question worth investigation. In CFD, common approaches for handling free-surface problems include the VOF method [27], the level set method [28], and hybrid techniques combining both [29]. In this study, the free surface evolution in the solver is advanced using the VOF method. For the assimilation of the free surface, we employed three different representations: VOF, level set, and curve-based descriptions, where both the level set and curve representations are derived from the VOF data. We conduct a comparative analysis to evaluate their performance in wave field reconstruction.

In Sec 2, we provide a detailed description of the proposed framework, including its workflow, methods, and governing equations. Sec 3 presents reconstruction results for three representative cases: regular wave, irregular wave, and plunging wave. Finally, Sec 4 summarizes the main conclusions.

2 METHODOLOGY↩︎

Figure 1: Sketch of the numerical wave flume where the test section is the focus of this study.

We consider an experimental water tank analogous to the numerical tank shown in Fig. 1, where the test section accommodates the marine structure (if applicable). To build a reliable digital twin system, the numerical tank should generate waves that closely match those observed in the laboratory, particularly within the test section. This alignment is the major objective of this study. However, due to inevitable experimental uncertainties at boundaries or the initial state, the wave propagation in the numerical tank remains challenging to determine. To address this, we employ a data assimilation approach to simultaneously correct the free surface and flow field by solely monitoring the part of the free surface before it travels into the test section, ensuring wave characteristics align with experimental state within the test section. Notably, in the absence of physical experiments, the true wave conditions are simulated using the same numerical wave tank, from which interface observations are extracted.

The numerical solver employed in this study is interFoam, a multiphase flow solver provided by the OpenFOAM framework[30]. The solver is based on the finite volume method and is designed to simulate two incompressible, isothermal, and immiscible fluids, using a VOF method to capture the free surface. The governing equations are \[\begin{align} \nabla \cdot \boldsymbol{u}& = 0, \\\ \frac{\partial (\rho \boldsymbol{u})}{\partial t} + \nabla \cdot (\rho \boldsymbol{u} \otimes \boldsymbol{u}) &= -\nabla p + \nabla \cdot \left[ \mu \left( \nabla \boldsymbol{u} + (\nabla \boldsymbol{u})^T \right) \right] + \rho \boldsymbol{g} + \boldsymbol{F}_\sigma,\\\ \frac{\partial \alpha}{\partial t} + \nabla \cdot (\alpha \boldsymbol{u}) &= 0. \end{align}\] Here, \(\boldsymbol{u}\) is the velocity vector, \(\rho\) is the fluid density, \(p\) is the pressure, \(\mu\) is the dynamic viscosity, \(\boldsymbol{g}\) is the gravitational acceleration, and \(\boldsymbol{F}_\sigma\) is the surface tension force. The two phases considered in this study are air and water, with \(\alpha\) denoting the volume fraction of water. The density and viscosity are computed as \[\begin{align} \rho = \alpha \rho_w + (1-\alpha) \rho_a,\\ \quad \mu = \alpha \mu_w + (1-\alpha) \mu_a. \end{align}\] \(\rho_w\) and \(\rho_a\) denote the densities of water and air, respectively, while \(\mu_w\) and \(\mu_a\) represent their corresponding dynamic viscosities. The surface tension force \(\boldsymbol{F}_\sigma\) is modeled using the continuum surface force (CSF) formulation, expressed as \[\boldsymbol{F}_\sigma = \sigma \kappa \nabla \alpha,\] where \(\sigma\) is the surface tension coefficient and \(\kappa\) is the curvature, given by \[\kappa = \nabla \cdot \left( \frac{\nabla \alpha}{|\nabla \alpha|} \right).\] The gradient \(\nabla \alpha\) is nonzero only in the vicinity of the interface, ensuring that the surface tension is applied solely at the phase boundary.

In the simulations, we examine three typical wave conditions: regular waves, irregular waves, and wave plunging. We adopt fifth-order Stokes waves, JONSWAP-spectrum waves, and cosine waves as the concrete waves corresponding to the three wave conditions, respectively. The generation of JONSWAP waves is facilitated using the waves2Foam toolbox. There are some differences depending on the wave characteristics. The regular and irregular waves are modeled using laminar approach. While the wave plunging case involves strong turbulence effects, which are modeled using the Reynolds-Averaged Navier–Stokes (RANS) approach. The SST \(k\)\(\omega\) turbulence model is adopted. In wave plunging cases, the wave run-up process is influenced by the bottom boundary, and particular attention is paid to the effect of the boundary layer developing in this region. Hence, We restrict \(y^+\approx 1\) along the bottom wall, \(y^+\) as the dimensionless distance from the wall to the first grid node in the boundary layer may be expressed as \(y^+ = {\Delta y u_{\tau}}/{\nu}\), where \(\Delta y\) is the height of the first grid, \(\nu\) denotes the local kinematic viscosity of the fluid, and \(u_{\tau}\) represents the friction velocity at the nearest wall.

After establishing the numerical wave tank, we adopt the EnKF as the method to solve the reconstruction problem. The EnKF is a class of ensemble-based data assimilation methods, which requires the construction of an ensemble of samples to statistically represent the uncertainty in the initial state. Unlike conventional approaches[31] in flow field reconstruction that generate ensemble members by perturbing either the baseline flow field directly or the low-dimensional modal coefficients derived from a reduced-order representation, we use a physics-constrained ensemble generation strategy. Specifically, we introduce perturbations to key wave parameters (e.g., wave height or period) and use these perturbed wave conditions as inputs to the inlet boundary condition to generate different waves. The ensemble of wave parameters is generated as \[\beta^{(n)} = \beta^b+\eta, \label{eq:wave32parameters}\tag{1}\] where \(\beta\) represents one of the wave parameters, such as wave height or wave period. \(\beta^{(n)}\) is the special wave parameter of the \(n\)-th sample. \(\beta^b\) is the corresponding parameter of the baseline wave, and \(\eta~ \sim \mathcal{N}(0, \sigma^2)\) represents a normally distributed perturbation with a user-defined standard deviation \(\sigma\). For example, the wave height of the \(n\)-th sample in the ensemble is computed as \(H^{(n)} = H^b + \eta\). Wave periods can be generated in a similar fashion, represented as \(T^{(n)}\). Then the parameters \(H^{(n)}\) and \(T^{(n)}\) are used to generate the wave in the \(n\)-th ensemble member.

The solver forwards to the first assimilation instant \(t_0\), yielding physics-constrained flow fields as ensemble members as shown in Fig 2. The first assimilation instant, \(t_0\), is defined as the moment when the wave has propagated through the entire tank, corresponding to the requirement in wave flume experiments that the wave train be stably generated. This approach ensures that the resulting ensemble members respect the underlying physical constraints of wave dynamics, and facilitates the construction of a representative dataset from which modes can be extracted. The diversity of wave conditions embedded in the ensemble allows the dataset to capture more flow features, which is particularly important for non-periodic and strongly discontinuous flows.

Figure 2: Flow chart of the model advancing and DA strategy. The ensemble is used throughout the whole simulation process to keep the diversity of members, and DA is carried out only when the reconstruction is needed. The reconstructed wave field will not be used to update the original simulation.

When the wave ensemble propagates to \(t_0\), the assimilation process is initiated at this instant. At this moment, the target wave train has not yet entered the test section. Therefore, performing assimilation at this moment ensures that the target wave train is already corrected before entering the test section, making it consistent with the true state in the test section. The procedure for single assimilation is illustrated in Fig 3. Notably, the challenge of high dimensionality persists, leading to computational efficiency issues. The dimensionality of discretized flow fields is determined by the mesh resolution, typically leading to high-dimensional state variables. A widely used solution is applying the POD method to reduce the order of the field while preserving the inherent key flow structures. Therefore, at this instant, the POD method can be carried out, as shown in Fig 3. The instantaneous fields of all ensemble members are collected into separate datasets, each corresponding to a different variable. Singular Value Decomposition (SVD) is then performed to extract dominant modes for each dataset: the level set field \(\boldsymbol{\phi}\) or volume fraction field \(\boldsymbol{\alpha}\) to resolve the free surface (collectively denoted as \(\boldsymbol{\chi}\)), as well as the velocity components \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\), yielding mode matrices \(\boldsymbol{U}_{\gamma}\), where \(\boldsymbol{\gamma} \in \{\boldsymbol{\chi}, \boldsymbol{u}_x, \boldsymbol{u}_z\}\) and \(\boldsymbol{\chi} \in \{\boldsymbol{\phi}, \boldsymbol{\alpha}\}\). In particular, we also performed assimilation using a curve representation of the free surface. Since the curve requires only a limited number of interpolation nodes for its description, no dimensionality reduction is involved. Further details are provided later.

Figure 3: Flow chart for the assimilation procedure to reconstruct the wave field at a time.

Noteably, the dataset to be decomposed in this study differs from the commonly used snapshot matrix. In the conventional approach, the snapshot matrix is assembled from multiple temporal snapshots of the flow field obtained from a single simulation case. In contrast, the matrix in the present work is constructed from different wave realizations at the instant time, as detailed in Sec 2.1. There are some reasons. First, the conventional approach is well-suited for highly periodic flows but is less applicable to non-periodic flows such as irregular waves. Since the conventional method collects temporal flow-field data, it primarily captures the dominant features that emerge as the flow evolves over time. Second, the conventional method struggles when flow characteristics vary significantly[32]. Modal structures extracted from a single specific simulation are applicable only when subsequent flows share a sufficient degree of similarity. If the flow features differ markedly from those in the specific case, the extracted modes—lacking information about the new features—are unlikely to remain valid. The adopted approach effectively avoids this limitation by incorporating variability directly into the decomposed matrix. Moreover, the conventional method is not well-suited for free-surface problems. There is a strongly discontinuous field. Even when wave profiles differ only slightly, modes extracted from a single profile are often not available to other profiles and fail to capture their geometric characteristics. Finally, the conventional POD method requires flow field data over a certain period as prior information to extract the modes. In contrast, the adopted method does not require flow data in advance. Instead, the sample flow fields are generated and used at the current assimilation time, which aligns more closely with the concept of sequential data assimilation. A detailed comparison of mode performance between the conventional approach and the present method is provided in Sec 3.1.2.

After the POD, we obtain the modal space. The ensemble forecast fields are then projected onto the truncated modal space to form the forecast state \(\boldsymbol{X}^f = [\boldsymbol{x}^{(0),f}, \boldsymbol{x}^{(1),f}, ..., \boldsymbol{x}^{(N),f}]\), \(N\) is the ensemble size. Each state vector \(\boldsymbol{x}^{(n),f}\) is computed as \[\begin{gather} \boldsymbol{x}^{(n),f}= \begin{bmatrix} \boldsymbol{a}^{(n),f}_{\chi}, \\ \boldsymbol{a}^{(n),f}_{u_x},\\ \boldsymbol{a}^{(n),f}_{u_z} \\ \end{bmatrix} , \tag{2}\\ \boldsymbol{a}^{(n),f}_{\gamma} = \hat{\boldsymbol{U}}_\gamma^\mathrm{T} \boldsymbol{\gamma}^{(n)}, \tag{3} \end{gather}\] where \(\hat{\boldsymbol{U}}_{\gamma}\in \mathbb{R}^{m \times r_{\gamma}}\) denotes the truncated mode matrix containing the first \(r_{\gamma}\) leading modes, while \(m\) is the dimensionality of the physical field.

The forecast state vector \(\boldsymbol{x}^f\) is then put into the analysis step, where it is updated using the EnKF based on available observations. After analysis step, we get the analysis state matrix \(\boldsymbol{X}^a = [\boldsymbol{x}^{(0),a}, \boldsymbol{x}^{(1),a}, ..., \boldsymbol{x}^{(N),a}]\). The ensemble-averaged analysis state vector \(\overline{\boldsymbol{x}^a}\) and its components \(\overline{\boldsymbol{a}^a_{\gamma}}\) could be obtained, which are used to reconstruct the physical fields through \[\boldsymbol{\gamma}^{a} = \hat{\boldsymbol{U}}_\gamma \, \overline{\boldsymbol{a}^a_\gamma}.\]

Please note that, \(\boldsymbol{\chi}^a\), \(\boldsymbol{u}_x^a\), and \(\boldsymbol{u}_z^a\) share the same dimensionality and correspond directly to the mesh of the numerical wave tank. Therefore, no additional processing is used to ensure consistency between the free surface field and the velocity field. After obtaining the reconstructed data \(\boldsymbol{\gamma}^a \in \mathbb{R}^{m \times 1}\) , it is straightforward to have the initial condition to propagate the corrected wave into the test section, as shown in Fig 2. Since the boundary conditions are not reconstructed, the wave generation remains the same as the baseline. When the reconstructed field forecasts until the next assimilation instant \(t_1\), the forecast fields during \(t_0\) to \(t_1\) can serve as a reference for the true wave evolution. We establish a criterion to determine the interval between \(t_0\) and \(t_1\). As the reconstructed flow field advances, the forecast state gradually deviates from the true state. In particular, within the test section, we aim to ensure that the flow field remains consistent with the true state. So we continuously compute the root mean square (RMS) error between the predicted free surface and the observations, and once this error exceeds a prescribed threshold, the next assimilation is performed, as expressed by the following equation \[\varepsilon = \sqrt{\frac{1}{j} \sum_{k=1}^{j} \left( \eta_k^{f} - \bar{y}_k \right)^2 }, \label{eq:error}\tag{4}\] where \(\eta_k^{f}\) is the free surface elevation at the \(k\)-th point in the forecast field, \(\bar{y}_k\) is the corresponding observation value, and \(j\) is the total number of observations. Besides, \(\bar{\boldsymbol{y}}\) is the mean vector of an perturbed observations ensemble, and \(\bar{y}_k\) is the \(k\)-th element of the vector. The wave ensemble is left unaltered and continues to be propagated forward without correction as shown in Fig 2. This strategy ensures the diversity of the wave samples, which is essential for extracting well-representative modes and avoiding sample collapse. Then, the single DA procedure is carried out to obtain the reconstructed field at \(t_1\). This reconstructed field is generally more accurate than the forecast at \(t_1\), as it is directly corrected through data assimilation. This strategy ensures, to the greatest extent possible, that the waves within the test section remain consistent with the true state during the entire propagation time.

2.1 Modal Decomposition via POD↩︎

The primary motivation for employing modal decomposition is to reduce the dimensionality of the flow field by projecting the high-dimensional data onto a modal space, thereby obtaining a compact set of modal coefficients that effectively represent the flow pattern. This dimensionality reduction significantly decreases the computational cost of the EnKF by operating in a reduced-order state space. Moreover, it mitigates the issues associated with directly assimilating high-dimensional state vectors, such as the introduction of non-smoothness, violation of physical constraints, and potential numerical instabilities or divergence. By capturing the dominant coherent structures of the flow, the modal representation ensures that the assimilation process remains both computational efficient and physics-constrained.

In this study, we perform modal decomposition through POD. In the beginning of Sec 2, we have discussed that, for wave problems, the conventional POD snapshot matrix is not applicable, and a new approach is required to construct the matrix. The matrix is composed of wave sample data generated under different parameter settings while satisfying physical constraints.

After the ensemble cases forecast to \(t_i\), POD is performed independently for each physical quantity. Taking a general variable \(\boldsymbol{\gamma} \in \{\boldsymbol{\chi}, \boldsymbol{u}_x, \boldsymbol{u}_z\}\), the matrix is constructed as \[\boldsymbol{A}_{\gamma} = [\boldsymbol{\gamma}^{(0),f}, \boldsymbol{\gamma}^{(1),f}, \dots, \boldsymbol{\gamma}^{(N),f}],\] where \(\boldsymbol{\gamma}^{(n),f}\) denotes the field of the \(n\)-th ensemble member at time \(t_i\) before analysis , \(t_i\) is omitted here.

SVD is then applied to the matrix as \[\boldsymbol{A}_{\gamma} = \boldsymbol{U}_{\gamma} \boldsymbol{\Sigma}_{\gamma} \boldsymbol{V}_{\gamma}^T,\] where \(\boldsymbol{U}_{\gamma}\) contains the spatial modes as its columns, \(\boldsymbol{\Sigma}_{\gamma}\) is a diagonal matrix of singular values representing the relative energy of each mode, and \(\boldsymbol{V}_{\gamma}\) contains the modal coefficients for each sample. In subsequent assimilation and reconstruction steps, a truncated modal matrix \(\hat{\boldsymbol{U}}_{\gamma}\) is used.

2.2 Wave Interface Representation↩︎

To effectively assimilate and reconstruct wave fields, a consistent and flexible representation of the free surface is essential. In this study, we consider three interface representation approaches: VOF, level set, and Curve-based tracking. The motivation for considering these three methods is to compare their suitability for wave reconstruction, examining their performance, advantages, and limitations in the reconstruction process.

The VOF method represents the interface implicitly via a scalar field \(\boldsymbol{\alpha}(\boldsymbol{l}) \in [0, 1]\), which denotes the fluid volume fraction in each computational cell, \(\boldsymbol{l}\) represents a location in the field. A value of \(\alpha=1\) indicates the cell is fully filled with water, \(\alpha=0\) denotes air, and \(\alpha=0.5\) approximates the location of the interface. VOF is mass-conserving and widely used in two-phase flow simulations.

The level set method represents the interface as the zero contour of a signed distance function field \(\boldsymbol{\phi}(\boldsymbol{l})\), where \(\phi>0\) in air, \(\phi<0\) in water, and \(\phi=0\) at the interface. It offers a smooth and differentiable representation, which is more suitable for POD to extract effective modes.

We convert VOF data into level set field using a signed distance function. For a given volume fraction \(\boldsymbol{\alpha}\), the corresponding level set function \(\boldsymbol{\phi}\) is obtained via

\[\boldsymbol{\phi} = \left\{ \begin{align} &\underset{k \in \Gamma}{\min} \, ||\boldsymbol{l}-\boldsymbol{k}||, && \alpha(\boldsymbol{l}) < 0.5, \\ &-\underset{k \in \Gamma}{\min} \, ||\boldsymbol{l}-\boldsymbol{k}||, && \alpha(\boldsymbol{l}) > 0.5, \\ &0, && \alpha(\boldsymbol{l}) = 0.5. \end{align} \right.\] \(\Gamma\) is the interface, and \(\boldsymbol{k}\) is the point at the interface.

In addition, we also extract the wave interface as a parametric cubic spline \(f(x)\), where \(x\) is the coordinate in the x-direction. The spline curve is uniquely determined by a set of interpolation nodes. Once the number of nodes \(r_c\) is specified, their abscissae \(\{x_i^{c}\}_{i=1}^{r_c}\) are uniformly distributed along the wave-propagation direction, and their ordinates correspond to the wave elevations at those positions. The vector \(\boldsymbol{h}^{(n),f} \in \mathbb{R}^{r_c}\) collects these ordinates and constitutes part of the state vector \[\boldsymbol{x}^{(n),f} = \begin{bmatrix} \boldsymbol{h}^{(n),f} \\ \boldsymbol{a}_{u_x}^{(n),f} \\ \boldsymbol{a}_{u_z}^{(n),f} \end{bmatrix}. \label{eq:curve32state}\tag{5}\]

In practice, the level set method is used for modal decomposition and reconstruction, due to its more continuous structure. The VOF field, although more accurate in terms of mass conservation, may introduce higher noise due to its strongly discontinuous nature near the interface, and consequently requires more modes for effective representation. The curve-tracking approach is not well-suited for highly complex conditions. In cases involving plunging or breaking waves, it would require a very dense distribution of spline interpolation nodes and may introduce non-physical corrections during data assimilation. Therefore, the curve-based method is employed only for relatively simple regular and irregular wave cases. The relative discussion will be provided in Sec 3.

2.3 EnKF↩︎

In this study, we adopt the EnKF to assimilate the forecast state vectors. EnKF is a Monte Carlo–based Bayesian filtering method, well-suited for high-dimensional and nonlinear systems. Its core idea is to approximate the covariance structure of the system state using a finite ensemble of samples, and to update these samples based on observations to obtain the analysis state.

As described in the beginning of this section, the forecast state ensemble is denoted as \[\boldsymbol{X}^f = [\boldsymbol{x}^{(0),f}, \boldsymbol{x}^{(1),f}, ..., \boldsymbol{x}^{(N),f}],\] where \(\boldsymbol{x}^{(n),f}\) represents the forecast state vector of the \(n\)-th ensemble member, detailed in Eq 3 . The ensemble mean is given by \[\overline{\boldsymbol{x}^f} = \frac{1}{N} \sum_{n=1}^{N} \boldsymbol{x}^{(n),f}.\]

The observation in this study consists of the free surface elevations measured at \(j\) positions, collectively denoted as \(\boldsymbol{y}\in\mathbb{R}^{j}\). It should be emphasized that the \(j\) observation points are all located in the region upstream of the test section, meaning that the target section of the wave is observed before it enters the test section. The wave reconstruction is performed at this moment to ensure that, once the waves enter the test section, the flow in the numerical simulation is consistent with the true flow. Let \(\{x_k^{\text{obs}}\}_{k=1}^{j}\) be the gauge locations. To simulate realistic measurement errors and satisfy EnKF assumptions, Gaussian noise is added to \(\boldsymbol{y}\) to generate an ensemble of perturbed observations \[\boldsymbol{Y} = [\boldsymbol{y}^{(1)}, \boldsymbol{y}^{(2)}, ..., \boldsymbol{y}^{(N)}],\] where \(\boldsymbol{y}^{(n)} = \boldsymbol{y} + \boldsymbol{\epsilon}\), and \(\boldsymbol{\epsilon} \sim \mathcal{N}(0, \boldsymbol{R})\), with \(\boldsymbol{R}\) being the observation error covariance matrix. In this paper, we simplify \(\boldsymbol{R}=\sigma^2 \boldsymbol{I}\), where \(\sigma^2\) is the constant variance of the observation noise and \(\boldsymbol{I}\) is the identity matrix of appropriate dimensions.

During the EnKF analysis step, each forecast sample is updated by incorporating the corresponding perturbed observation. The update formula for the \(n\)-th ensemble member is \[\boldsymbol{x}^{(n),a} = \boldsymbol{x}^{(n),f} + \boldsymbol{K} \left( \boldsymbol{y}^{(n)} - \mathcal{H}(\boldsymbol{x}^{(n),f}) \right),\] where \(\boldsymbol{x}^{(n),a}\) is the analysis state, \(K\) is the Kalman gain matrix, and \(\mathcal{H}(\cdot)\) denotes the observation operator that maps the modal coefficients to the observation space. The detailed derivation of the observation operator can be found in the Appendix.

The Kalman gain \(\boldsymbol{K}\) is given by \[\boldsymbol{K} = \boldsymbol{P}_f \boldsymbol{H}^T \left( \boldsymbol{H} \boldsymbol{P}_f \boldsymbol{H}^T + \boldsymbol{R} \right)^{-1},\] where \(\boldsymbol{P}_f\) is the sample covariance of the forecast ensemble, and \(\boldsymbol{H}\) is the linearized observation operator (when applicable), estimated empirically from the ensemble.

The ensemble mean of the analysis states serves as the optimal estimate used for flow field reconstruction \[\overline{\boldsymbol{x}^a} = \frac{1}{N} \sum_{n=1}^{N} \boldsymbol{x}^{(n),a}.\]

3 RESULTS AND DISCUSSION↩︎

In the previous section, we introduced the detailed framework, including key components such as POD-based modal decomposition and the EnKF for data assimilation. To evaluate the effectiveness and applicability of this method under various wave conditions, we consider three representative conditions in this study: regular waves, irregular waves, and plunging waves. These three wave conditions represent a progression in flow complexity. Concretely, fifth-order Stokes waves are used to represent regular waves. JONSWAP-spectrum waves represent irregular wave conditions. And in the plunging wave case, a cosine wave is made to climb a slope, leading to the formation of a plunging breaker.

In the following sections, we present the reconstruction results for each wave case. We examine the influence of modal truncation, observation conditions (including noise and spatial density), and initial errors on reconstruction accuracy. This analysis provides insights into the performance and limitations of the framework under increasingly realistic and challenging conditions.

3.1 Regular Wave Reconstruction↩︎

Fifth-order Stokes waves are regular nonlinear waves, characterized by a well-defined frequency and wave height. They serve as a suitable starting point to evaluate the reconstruction performance under relatively simple flow conditions.

3.1.1 Grid Independence Verification↩︎

Before analyzing the reconstruction accuracy, it is necessary to verify that the simulation results are not sensitive to grid resolution. This ensures that the conclusions drawn in later sections are physically meaningful and numerically reliable.

The wave tank setup for the regular wave case is shown in Fig. 1, where the length \(L\) is \(10\) m and the water depth \(d\) is \(0.6\) m. Four grids with different resolutions were generated for the fifth-order Stokes case, with the grid resolution denoted as \(m = 10600\), \(24000\), \(54000\), and \(96000\), respectively. The finest grid (\(m = 96000\)) is treated as the reference, and the wave surface computed from coarser grids is compared against it.

The comparison is based on the free surface profile at the initial assimilation time (\(t=8\) s). The RMS error of the wave elevation is used as the evaluation metric, defined as \[\varepsilon_{\mathrm{RMS}} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( \eta_m^{\text{test}} - \eta_m^{\text{ref}} \right)^2 },\] where \(\eta_m^{\text{test}}\) is the free surface elevation at the \(m\)-th point computed from the test grid, \(\eta_m^{\text{ref}}\) is the corresponding value from the reference grid, and \(M\) is the total number of sampling points on the free surface.

The detailed results are summarized in Table 1. As shown in Table 1, the RMS error decreases with grid refinement, confirming that the simulation results are grid-convergent. The medium-resolution grid with \(m = 24000\) achieves a sufficiently low error (0.001161 m) compared to the reference grid, while requiring less computational cost. Therefore, all subsequent simulations in Sec 3.1 adopt the \(m = 24000\) grid to maintain a balance between accuracy and efficiency.

Table 1: Grid convergence analysis for the fifth-order Stokes wave (at \(t=8\) s)
\(m\) 10600 24000 54000 96000 (reference)
\(\varepsilon_{\mathrm{RMS}}\)(m) 0.005785 0.001161 0.001007

3.1.2 Base Assimilation Results↩︎

This section presents the reconstruction results for regular waves under a base configuration, serving as the first demonstration of the proposed framework. To gradually increase the assimilation difficulty, this section considers only a difference in wave height between the baseline and the truth, while keeping other parameters identical. Specifically, the baseline wave height, \(H^b\), is set to 0.10 m, and the true wave height, \(H^t\), is set to 0.15 m. The observation data consist of free surface elevations measured at 11 equally spaced wave gauges, located between 0.5 m and 2.0 m from the inlet (shown in Fig 8), and the observation data are added Gaussian noise with a standard deviation of \(0.0001\,\)m. In this setup, the free surface is represented using the level set method.

In the methodology section, we discussed that applying POD to multi-time flow field datasets from a single case is not suitable for wave problems. In this work, we perform model reduction using data from an ensemble of waves at the same time, making the approach applicable to wave problems. To illustrate the advantages of the present approach, we compare the variance spectra and the modal structures obtained from the fifth-order Stokes wave, as shown in Fig 4 and Fig 5. In each plot(a), blue bars indicate the variance contribution of each mode, and the red line represents the cumulative energy ratio. The present dataset is constructed using 50 wave samples generated under different perturbation parameters. To generate the wave samples, random perturbations are applied to the wave height according to the Eq 1 . The standard deviation of the perturbation is set to \(0.5H^{\mathrm{b}}\), ensuring that the wave ensemble exhibits sufficient statistical variability. The other snapshot matrix is assembled from 50 uniformly sampled time instances during 8 s-10 s of the baseline case. The present method separates spatial features of different wave surfaces, enabling efficient correction of case-to-case discrepancies during assimilation. The first two modes already capture 99% of the energy as shown in Fig 4 (a), with the first representing depth distribution, and the second capturing large-scale variations. In contrast, the conventional method is more suited to periodic flows, where modes correspond closely to harmonic components. This provides strong physical interpretability but requires up to seven modes to reach 99% energy as shown in Fig 5 (a), and the extracted modes cannot be generalized to irregular or non-periodic waves. Consequently, the present approach offers distinct advantages for free surface reconstruction.

a
b

Figure 4: The variance spectrum and modal structure of the dataset that consists of the flow fields from an ensemble of waves at the same time.. a — Variance spectrum of level set., b — Low order modes of level set.

a
b

Figure 5: The variance spectrum and modal structure of the dataset that consists of multi-time flow fields from a single case.. a — Variance spectrum of level set., b — Low order modes of level set.

To illustrate the modal energy distribution of the velocity, we present the variance spectra of \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\) additionally in Fig 6. It is observed that the level set field achieves 99% energy concentration within just the first two modes, while the velocity fields require approximately 30 modes to reach a similar level. The number of retained modes, \(r_{\gamma}\), is truncated to 20 for all three physical fields in this subsection. We also examine the influence of the number of retained modes on the reconstruction accuracy, which will be discussed later.

a
b

Figure 6: Variance spectra and cumulative energy ratio of \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\).. a — \(\boldsymbol{u}_x\), b — \(\boldsymbol{u}_z\)

Next, we assess the assimilation performance of the state vectors. In this case, the first 20 samples are selected as the ensemble members to be assimilated. Each sample is projected onto the modal space to obtain a state vector \(\boldsymbol{x}^{(n),f}\). The modal coefficients for \(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\) are plotted separately in Fig 7, comparing the forecast, baseline , DA, and truth states. In these plots, the \(x\)-axis represents the mode index, and the \(y\)-axis shows the corresponding modal coefficient. The results demonstrate that the assimilation of the level set field achieves high accuracy across all modes, while for \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\), accurate reconstruction is mainly limited to the first one or two low-order modes.

a
b
c

Figure 7: Comparison of modal coefficients: \(\boldsymbol{x}^f\),\(\boldsymbol{x}^b\), \(\boldsymbol{x}^a\), and \(\boldsymbol{x}^t\).. a — \(\boldsymbol{a}_{\phi}\), b — \(\boldsymbol{a}_{u_x}\), c — \(\boldsymbol{a}_{u_z}\)

To visualize the physical-space reconstruction of the wave surface, we compare the free surface profiles before and after assimilation at the current assimilation time in Fig 8. The \(x\)-axis denotes the position along the wave tank, and the \(y\)-axis represents the wave elevation. The tank is divided into three regions along the \(x\)-direction in accord with the sketch Fig 1, separated by gray dashed lines: the wave generation and observation region at the front, the test section in the middle, and the damping zone at the rear is omitted. The most critical test section is located between 5 and 10 m. The figure includes the forecast ensemble, baseline, DA, and truth profiles, with observation locations marked by cross symbols. The results show that the assimilated wave surface closely matches the ground truth, indicating high reconstruction accuracy.

Figure 8: Reconstructed wave profile at the current assimilation time ( denotes the observation points).

Finally, we extract the velocity fields \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\) in the observation and test section at the assimilation time shown in Fig 9, and compare the assimilated results with the truth velocity using contour maps. Although the assimilation accuracy of velocity fields is high only for low-order modes, the overall reconstruction quality remains satisfactory. The velocity discrepancies are mostly confined to localized regions of \(\boldsymbol{u}_x\) and occur away from the free surface.

a
b

Figure 9: Comparison of reconstructed and true velocity fields at the current assimilation time.. a — \(\boldsymbol{u}_x\) comparison, b — \(\boldsymbol{u}_z\) comparison

The reconstruction framework supports sequential assimilation. After the first assimilation is completed at \(t = 8\,\mathrm{s}\), the assimilated wave fields \(\boldsymbol{\gamma}^a\) are used as the new initial condition to propagate the wave forward in time. Since this study focuses solely on initial condition assimilation and does not modify boundary conditions, the wave maker at the inlet continues to operate under the baseline setting. As a result, due to the uncorrected boundary input and the residual errors from the previous assimilation, deviations reappear as the target wave propagates into the test section. In such cases, the procedure can be applied again for further assimilation at later times.

As described in the methodology, the timing of the next assimilation is determined by the error \(\eta\), as defined in Eq 4 . In this study, the threshold is set to 0.015 m. When the reconstructed flow field from the first assimilation advances to \(t=10\) s, \(\eta\) exceeds the threshold, and the second assimilation is performed. To better illustrate the evolution of the forecast field, the free surface variation between \(8\) s and \(10\) s is presented in Fig 10.

a
b
c

Figure 10: Comparison of reconstructed and true wave profiles between two assimilation instants.. a — \(t=8.5\) s, b — \(t=9\) s, c — \(t=9.5\) s

The free surface result after the second assimilation at \(t = 10\,\mathrm{s}\) is presented in Fig 11. In the figure, the green dashed line represents the free surface shape obtained by propagating the assimilated wave field from \(t = 8\,\mathrm{s}\) to the current time without further assimilation (denoted as “last DA forecast"), while the red dotted-dashed line corresponds to the result after the second assimilation at \(t = 10\,\mathrm{s}\) (denoted as”DA"). It is evident that the first assimilation result accumulates visible errors during propagation, especially in the wave generation region, with slight deviations also appearing in the test zone. These inaccuracies may negatively affect subsequent analysis, highlighting the necessity of sequential assimilation to maintain reconstruction accuracy over time.

Figure 11: Reconstructed wave profile at the next assimilation time (t=10 s).

3.1.3 Comparison of Interface Representation Methods↩︎

As introduced in the methodology section, three different representations of the gas–liquid interface are considered: VOF, level set, and curve tracking. To evaluate their applicability and accuracy in wave reconstruction, this subsection presents a comparative analysis.

For the VOF and level set fields, SVD is applied to reduce dimensionality. The free surface is reconstructed using different numbers of retained modes. In contrast, the curve-based representation does not involve SVD decomposition; instead, it uses spline interpolation with a varying number of nodes to express the free surface. The reconstruction error is computed against the true free surface using the RMS error, defined as \[\varepsilon_{\eta} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( \eta_m - \eta_m^{t} \right)^2 },\] where \(\eta_m\) is the reconstructed elevation at the \(m\)-th point, \(\eta_m^{t}\) is the corresponding value at the true interface, and \(M\) is a value large enough to reflect the entire wave shape.

To complement the analysis, the modal energy distribution of the VOF field is also provided, as shown in Fig 12. Compared with the level set method, the VOF method requires more modes to represent the flow field, as the VOF interface exhibits stronger discontinuities. Fig [fig:stokes32V32interface95error] compares the wave elevation reconstruction errors of three interface representation methods. In the comparative results, all three methods are evaluated in a unified way: the \(x\)-axis denotes the number of modes or nodes used in the representation, and the \(y\)-axis shows the wave elevation error measured by RMS. Three curves corresponding to the three methods illustrate the trend of reconstruction accuracy and convergence efficiency. The level set method achieves the highest accuracy. Even with only 5 retained modes, the RMS error falls below \(0.001\,\mathrm{m}\), and further increasing the number of modes yields minimal change. This suggests that the level set representation is compact and efficient for capturing the wave interface. For the VOF method, the reconstruction error exhibits a slight increase as the number of modes increases, before eventually converging. This behavior may be attributed to noise introduced by the higher-order modes of the VOF representation. Moreover, its overall error remains higher than that of the level set. This implies that its interface features are more dispersed and require more modes to converge. The curve-based representation suffers from significantly larger errors at low interpolation node counts (e.g., error is greater than \(0.004\,\mathrm{m}\) when the nodes is fewer than 20). Only when the number of spline nodes reaches 50 does it achieve comparable accuracy to the level set method, indicating limited efficiency in representing wave profiles. Overall, the level set method demonstrates superior accuracy and modal convergence efficiency in this test case, making it a strong candidate for interface representation in the DA framework.

Figure 12: Modal energy spectrum and cumulative variance of the VOF field.

3.1.4 Sensitivity to Observation Noise↩︎

In the data assimilation process, observation errors significantly affect the accuracy of wave reconstruction, thus it is necessary to systematically evaluate the assimilation performance under different observation noise levels. In this section, four sets of Gaussian observation noise with standard deviations of 0.0001 m, 0.001 m, 0.01 m, and 0.1 m are applied. The observation point locations and assimilation settings remain consistent with those in the base configuration of Sec 3.1.2, with only the observation noise levels varied.

To quantify the impact of noise on reconstruction performance, the RMS of the elevation and the velocity components \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\) are calculated. The results are shown in Fig 13. The different colors and line styles in the figure represent the errors of three physical quantities. The RMS error of velocity components is calculated as \[\varepsilon_{u} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( u_i - u_i^{t} \right)^2 },\] where \(u_i\) is the reconstructed velocity component at the \(i\)-th cell, the \(u_i^t\) is the corresponding value of the true case.

The results show that the overall error of the wave elevation is relatively small, indicating that the assimilation algorithm performs well in reconstructing the free surface elevation. In contrast, the velocity errors-especially for the horizontal velocity \(\boldsymbol{u}_x\)-are larger, reaching up to 0.08 m/s, which reflects the greater difficulty in reconstructing the velocity field. When the observation noise reaches 0.1 m, the errors of all three physical quantities increase rapidly. This is mainly due to the high noise causing a sharp decrease in the signal-to-noise ratio of the observations, making it difficult for the assimilation process to effectively correct the model state, resulting in the analysis state deviating from the true state. Furthermore, high observation noise may lead to convergence issues in the assimilation filter, causing instability in the state estimation and thus amplifying the error growth.

Figure 13: Comparison of reconstruction errors for different observation noise levels.

3.1.5 Analysis of Baseline and Truth Discrepancies↩︎

This section aims to evaluate the applicability of the proposed method under different conditions by testing its sensitivity to original discrepancies, i.e., to what extent the method can reconstruct waves with varying original deviations. The original error is defined as the difference in wave height between the truth and the baseline, given by \(\delta^H = H^t - H^b\), where \(H^t\) and \(H^b\) are the wave heights of the truth and the baseline, respectively. Four cases are tested with original errors of 0.025 m, 0.05 m, 0.075 m, and 0.1 m, achieved by varying the wave height in the truth setting while keeping all other assimilation parameters unchanged.

Fig [fig:stokes32V32org32error] presents the reconstruction errors under different original discrepancies. As the original error increases, both the wave elevation and velocity errors show an overall upward trend. Notably, the error grows more significantly when the original discrepancy increases from 0.05 m to 0.1 m. It can be seen that with 50 wave samples as the dataset and a wave parameter variance set to \(0.5H^b\), the reconstruction can be satisfactorily achieved when the original error is below \(0.5\) m. For larger original errors, the range of wave samples needs to be expanded. Besides, among the three physical quantities, the horizontal velocity \(\boldsymbol{u}_x\) still exhibits the largest reconstruction error.

3.2 Irregular Wave Reconstruction↩︎

JONSWAP spectral waves are representative irregular waves characterized by key parameters such as significant wave height and peak period. Compared to the regular waves, JONSWAP waves exhibit stronger randomness and more complex frequency content, which increases the difficulty of wave reconstruction.

3.2.1 Grid Independence Verification↩︎

This section conducts a grid independence verification. In addition, the wave tank setup for the irregular wave case also follows Fig. 1, with the length \(L\) is \(10\) m and the water depth is \(0.6\) m. Four mesh configurations with different resolutions are tested, denoted by \(m\), the number of grid cells. From coarse to fine, the values are \(m=13200, 30000, 53600\), and \(120000\). The finest grid (\(m=120000\)) is used as the reference mesh for error comparison.

The verification is performed using the free surface shape extracted at the initial assimilation time step (\(t=8\) s). The RMS error of the wave profile is used as the evaluation metric. The detailed results are summarized in Table 2. As shown in Table 2, the RMS error of the free surface decreases with increasing grid resolution, confirming the convergence of the simulation results with respect to mesh refinement. Considering both error levels and computational efficiency, the grid with \(m=30000\) is ultimately selected for the subsequent simulations.

Table 2: Grid convergence analysis for the JONSWAP wave (at \(t=8\) s)
\(m\) 13200 30000 53600 120000 (reference)
\(\varepsilon_{\mathrm{RMS}}\)(m) 0.000858 0.000361 0.000165

3.2.2 Base Assimilation Results↩︎

This section presents the data assimilation results of the JONSWAP wave under a base configuration, serving as a reference for evaluating the performance of the framework. In this section, discrepancies between the baseline and the truth are introduced in both the significant wave height and the peak period. Specifically, the baseline is set to a significant wave height of 0.05 m and a peak period of 2 s, while the truth is defined with a significant wave height of 0.15 m and a peak period of 2.5 s. The observation data consist of wave elevation measurements at 21 uniformly spaced points between 0.5 m and 3.5 m from the tank inlet, and the observation data are added Gaussian noise with a standard deviation of \(0.0001\) m. The decomposed matrix is constructed using flow field data from 50 different JONSWAP wave samples at a fixed time. For sample generation, random perturbations are applied to the wave height and peak period using the formula with a standard deviation of \(0.5H^b\) and \(0.5T^b\), to construct a statistically representative ensemble of wave conditions. The free surface is represented by the level set, \(\boldsymbol{\phi}\).

Energy spectra for the three physical fields, \(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\), are shown in Fig 14 to illustrate their modal characteristics. In the case of irregular waves, the level set field requires up to 6 modes to capture 99% of the energy, indicating increased complexity compared to the regular wave case. Similarly, the velocity components demand more modes, up to 40. Although a larger number of modes is required to capture the irregular wave field, the first 20 modes are employed here to maintain consistency across cases, similar to Sec 3.1.2.

a
b
c

Figure 14: Variance spectra and cumulative energy ratio of \(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\).. a — \(\boldsymbol{\phi}\), b — \(\boldsymbol{u}_x\), c — \(\boldsymbol{u}_z\)

Similarly, the first 20 samples are used as the assimilation ensemble. Fig 15 shows the assimilation results of the state vectors. The first five modes of the level set field are well assimilated, while higher-order modes exhibit error fluctuations. The influence of these fluctuations on the reconstructed free surface will be discussed in Fig 16. For the velocity components \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\), accurate reconstruction is mainly observed in the first two or three modes.

a
b
c

Figure 15: Comparison of modal coefficients: \(\boldsymbol{x}^f\),\(\boldsymbol{x}^b\), \(\boldsymbol{x}^a\), and \(\boldsymbol{x}^t\).. a — \(\boldsymbol{a}_{\phi}\), b — \(\boldsymbol{a}_{u_x}\), c — \(\boldsymbol{a}_{u_z}\)

The reconstructed free surface profiles before and after assimilation are further compared in physical space, shown in Fig 16. As in previous sections, the spatial domain is divided into the wave generation and observation region (front), the test section (middle, located between 5 and 10 m), and the damping zone (rear, which is omitted in the figure). It can be observed that accurately assimilating the first five modal coefficients of the level set field enables a high-fidelity reconstruction of the free surface, though small local discrepancies remain (e.g., at \(x=6\) m).

Figure 16: Reconstructed wave profile at the current assimilation time ( denotes the observation points).

Finally, the velocity fields at the current time step are reconstructed. The reconstructed \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\) in the observation and test section are compared against the truth in the form of contour plots, Fig 17. Although only low-order modal coefficients are accurately assimilated, the reconstructed velocity fields do not exhibit significant visual discrepancies, indicating acceptable reconstruction quality in the velocity field.

a
b

Figure 17: Comparison of reconstructed and true velocity fields at the current assimilation time.. a — \(\boldsymbol{u}_x\) comparison, b — \(\boldsymbol{u}_z\) comparison

To further evaluate the applicability of the sequential assimilation strategy to irregular waves, a second assimilation is performed at \(t = 10.5\,\mathrm{s}\) when the \(\eta>0.6\) m. As same as the regular wave case, we present the process of the wave propagation as shown in Fig 18.

a
b
c
d

Figure 18: Comparison of reconstructed and true wave profiles between two assimilation instants.. a — \(t=8.5\) s, b — \(t=9\) s, c — \(t=9.5\) s, d — \(t=10\) s

Fig 19 presents the reconstructed free surface at \(t = 10.5\,\mathrm{s}\). As shown in the figure, the flow field obtained from the first assimilation exhibits noticeable deviations from the true free surface after being propagated forward into the test section (indicated by the green dashed line). In contrast, after the second assimilation, the reconstructed free surface (indicated by the red dotted-dashed line) aligns well with the ground truth across most of the test section, indicating that the assimilation at this time step effectively corrects the reconstructed state. Within the test section, noticeable errors are observed near the wave crests. This may be attributed to the complex and highly variable nature of irregular wave fields that cannot be fully captured by a limited number of modes. In addition, since no observations are included within this error-prone region, the wave surface errors cannot be corrected.

Figure 19: Reconstructed wave profile at the next assimilation time.

3.2.3 Comparison of Interface Representation Methods↩︎

This section further compares the performance of three free surface representation methods—level set, VOF, and curve tracking. Fig 20 supplements the analysis by showing the energy distribution of VOF modes.

Figure 20: Modal energy spectrum and cumulative variance of the VOF field.

For error comparison, the RMS errors of the reconstructed wave elevation using the three methods are plotted in a unified error figure (Fig [fig:jonswap32332methods]). The \(x\)-axis denotes the modal truncation number (or the number of points used in the curve representation), and the \(y\)-axis shows the RMS error of reconstructed wave elevation. The three curves reflect the accuracy trends. All other settings are consistent with those described in Sec 3.2.2.

In terms of overall accuracy, the level set method yields the lowest error, remaining below \(0.004\) m, while the VOF method shows the highest error, ranging between \(0.008\) m and \(0.016\) m. Regarding the trend, the error of the level set method slightly increases as more modes are included. This is likely due to the fact that the level set can be well represented by the first six modes, and adding higher-order modes introduces noise or non-physical disturbances, degrading the reconstruction quality. In contrast, the energy of the VOF field is more dispersed, requiring approximately 14 modes to reach the 99% energy coverage. As a result, the reconstruction error decreases rapidly with increasing modal truncation number. So we adopt the level set method to represent the interface in subsequent subsections.

3.2.4 Sensitivity to Observation Noise↩︎

This section evaluates the reconstruction accuracy of the three physical fields—\(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\)—under different levels of observation noise. The experimental setup is similar to that in Sec 3.1.4, with the observation noise levels set to four magnitudes: \(0.0001\),m, \(0.001\),m, \(0.01\),m, and \(0.1\),m. The corresponding results are shown in Fig 21.

As shown in the figure, the reconstruction errors of all three physical quantities exhibit a non-monotonic trend: the errors first decrease and then increase with increasing observation noise, reaching a minimum at a noise level of \(0.001\) m. This phenomenon may be attributed to a regularization effect introduced by moderate noise: when the observation noise is very small, the system may overfit local fluctuations in the measurements; whereas a moderate noise level leads to more stable filter updates. When the noise becomes too large, however, the signal-to-noise ratio degrades, reducing the effectiveness of assimilation.

Although the smallest reconstruction errors are achieved at \(0.001\) m noise, a uniform noise level of \(0.0001\) m is used in subsequent experiments to ensure consistency across all test cases.

Figure 21: Comparison of reconstruction errors for different observation noise levels.

3.2.5 Sensitivity to Observation Density↩︎

For irregular waves, the spatial density of observation points may have a more significant impact on reconstruction accuracy compared to regular waves. Regular waves exhibit clear periodicity and spatial structure, allowing their main modal features to be captured even with sparse observations. In contrast, irregular waves lack such regularity, and their local structures and spectral content are more complex, making dense spatial observations more crucial for accurate flow field reconstruction.

Motivated by this, we introduce this dedicated subsection. In the previous experiments (from Sec 3.2.2 to Sec 3.2.5), observations were evenly distributed over a 3 m region with 21 points, corresponding to a spatial spacing of 0.15 m. In this section, we test three spacing configurations over 3 m: 0.15 m, 0.2 m, and 0.3 m. The corresponding results are shown in Fig [fig:jonswap32obs32density].

As the observation spacing increases, the RMS errors of the three physical quantities—\(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\)—increase accordingly. Among them, the increase in free surface error is relatively minor, while the velocity fields, especially \(\boldsymbol{u}_x\), are more sensitive to observation density. These results indicate that in the case of irregular waves, increasing the density of spatial observations helps improve overall reconstruction accuracy.

3.2.6 Analysis of Baseline and Truth Discrepancies↩︎

To evaluate the applicability of the reconstruction method under varying initial conditions, this section introduces different levels of baseline-truth discrepancy by modifying the original wave height of the truth. In the setup, the baseline wave height is fixed at \(0.05\) m, while the truth wave height is varied to produce initial discrepancies \(\delta^H = 0.0125\) m, \(0.025\) m, \(0.0375\) m, and \(0.05\) m.

The assimilation errors are presented in Fig 22. As the initial discrepancy increases, the reconstruction errors of all three physical quantities show an upward trend. The error growth of the \(u_x\) and the level set is relatively uniform, whereas the error in the \(u_z\) increases at a faster rate when the original error exceeds \(0.0375\) m. This suggests that as the original state deviates significantly from the observations, the assimilation system’s ability to effectively correct the state vector becomes limited. In addition, the horizontal velocity \(\boldsymbol{u}_x\) consistently shows the highest error among the three fields, indicating its reconstruction remains the most challenging under irregular wave conditions, which are more difficult to capture with limited observations and truncated modal representations.

Figure 22: Comparison of reconstruction errors for different original discrepancies.

3.3 Plunging Wave Reconstruction↩︎

This section investigates the plunging breaking process of cosine-type waves climbing up a slope with a slope ratio of \(1{:}35\), which represents a strongly nonlinear wave phenomenon. The schematic setup is shown in Fig 23. The slope starts 4 m from the inlet, with \(L\) of 15 m and \(d\) of 0.4 m. As the wave propagates toward the shore, the slope causes the wave crest to steepen progressively, eventually leading to instability and overturning. This process involves intense free surface deformation and sharp variations in the local velocity field, characterized by strong nonlinearity, local transience, and complex structures. This configuration not only challenges the numerical model’s ability to capture strongly deformed interfaces, but also provides a benchmark scenario to evaluate the applicability of the data assimilation method under extreme hydrodynamic conditions.

Figure 23: Plunging wave numerical tank setup.

It should be noted that the assimilation is not performed at the most intense moment of wave breaking, but rather at a mid-slope region where the wave has not yet broken but already exhibits a pronounced steepening trend. In typical plunging breaker experiments, the test section is designed to encompass the full process of shoaling and plunging. Therefore, in this study, assimilation is applied before the plunging event. The reconstructed flow field is then advanced forward and compared with the true wave propagation within the test section, which includes the shoaling and plunging regions. This allows for a thorough evaluation of the reconstruction method’s applicability and accuracy in capturing the evolution of strongly nonlinear wave processes.

3.3.1 Grid Independence Verification↩︎

Grid independence verification is conducted for the plunging wave case. Four mesh resolutions are tested, corresponding to \(m = 121600\), \(199100\), \(311200\), and \(486000\). The finest mesh (\(m = 486000\)) is treated as the reference solution for error comparison.

The comparison is performed based on the free surface profile at \(t = 15\) s, using the RMS error of the interface shape as the metric. The results are summarized in Table 3.

As shown in the table, the RMS error decreases with increasing grid resolution, confirming the convergence of the numerical solution. Considering the trade-off between accuracy and computational cost, the mesh with \(m = 199100\) is selected for all subsequent simulations in this section.

Table 3: Grid convergence analysis for the plunging wave (at \(t=15\) s)
\(m\) 121600 199100 311200 486000 (reference)
\(\varepsilon_{\mathrm{RMS}}\)(m) 0.0476 0.00233 0.00154

3.3.2 Base Assimilation Results↩︎

This section presents the assimilation results of the plunging cosine wave under the base configuration, aiming to evaluate the effectiveness of the method in reconstructing strongly nonlinear wave processes. In this section, the baseline wave is configured with a wave height of \(0.12\) m and a period of \(5\) s. To ensure that the true wave does not plunge before entering the test section, its wave height is reduced to \(0.09\) m. It is worth noting that cosine-type waves with smaller amplitudes typically propagate more slowly during shoaling compared to larger waves, which helps avoid premature breaking. In addition, the observation approach is modified. Instead of discrete elevations, a rectangular window within the observation section is selected as shown in Fig 23, and the free surface data in this window are used as observations. Gaussian noise with a standard deviation of \(0.0001\) m is added to mimic measurement uncertainty. A total of \(50\) wave samples are generated for the SVD decomposition, where each sample’s wave height is perturbed using Eq 1 with a standard deviation of \(0.5H^b\). The free surface is represented using the level set field \(\boldsymbol{\phi}\).

The modal energy distributions of the three physical fields at the assimilation time are shown in Fig 24. For the level set \(\boldsymbol{\phi}\), the energy is primarily concentrated in the first 22 modes, which together account for 99% of the total energy. In contrast, the velocity exhibits a more dispersed energy distribution, requiring approximately 47 modes to cover the dominant energy. Compared to fifth-order Stokes waves and JONSWAP waves, the representation of both the free surface and velocity fields is more challenging for plunging waves. Although more modes are required to accurately represent the flow field under the plunging wave condition, we still need to ensure consistency of the experimental setup. Therefore, in the analysis, a 20-mode truncation is temporarily adopted for all three physical fields.

a
b
c

Figure 24: Variance spectra and cumulative energy ratio of \(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\).. a — \(\boldsymbol{\phi}\), b — \(\boldsymbol{u}_x\), c — \(\boldsymbol{u}_z\)

And the first 20 wave samples are used to generate the forecast state vector for assimilation. Fig 25 shows the modal coefficient comparison for the three physical fields at the assimilation time. It can be seen that the modal coefficients of the level set after reconstruction exhibit errors, but the variation trends of the low-order and high-order modal coefficients are reasonably well captured, except for spurious oscillations occurring at the 10–13th modes. In contrast, the reconstructed coefficients of the velocity components maintain a certain level of accuracy only in the low-order to middle-order modes, while significant fluctuations appear in the high-order modes.

a
b
c

Figure 25: Comparison of modal coefficients: \(\boldsymbol{x}^f\),\(\boldsymbol{x}^b\), \(\boldsymbol{x}^a\), and \(\boldsymbol{x}^t\).. a — \(\boldsymbol{a}_{\phi}\), b — \(\boldsymbol{a}_{u_x}\), c — \(\boldsymbol{a}_{u_z}\)

The reconstructed free surface profiles before and after assimilation are further compared in the physical domain, as shown in Fig 26. The wave tank is divided into three regions: the front wave generation and observation region, the central test section, and the rear wave absorption region (omitted). The test section starts from 11  and ends at 15 m. The observation window is highlighted with a black rectangular box in the figure. After assimilation, the reconstructed wave surface shows good agreement with the true wave profile before entering the test section. Overall, the reconstructed free surface accurately captures the evolution trend of the plunging wave.

Figure 26: Reconstructed wave profile at assimilation time.

The reconstruction results of the velocity field at the assimilation time are further examined. Fig 27 compares the reconstructed velocity fields \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\) with the true velocity components using contour plots. Despite relatively larger assimilation errors in the higher-order modes, the overall flow structures are approximately captured. The main flow regions and their gradient distributions match those of the true fields. The reconstructed velocity components exhibit streak-like fluctuations near the wave crest. This phenomenon may be attributed to the fact that our SVD dataset consists of multiple wave samples, which introduces mixed modal features and consequently leads to streak-like patterns in the modes. Additionally, the strong local velocity gradients around the crest region may further amplify such streak structures.

a
b

Figure 27: Comparison of reconstructed and true velocity fields at assimilation time.. a — \(\boldsymbol{u}_x\) comparison, b — \(\boldsymbol{u}_z\) comparison

To investigate whether the velocity reconstruction errors affect the wave propagation, and to verify the accuracy of this reconstruction method in predicting the wave evolution, the reconstructed flow field is advanced in time after the initial assimilation. The predicted free surface profiles at several future time instances are compared with the true wave field. Fig 28 presents the comparisons at \(t=16.0\) s, \(16.5\) s, and \(17.0\) s.

The results show that the reconstructed wave surface remains generally consistent with the true profile throughout the prediction period, with well-matched evolution trends. As the wave propagates up the slope and approaches the plunging stage, the discrepancy gradually increases, mainly near the wave crest. This may be attributed to reconstruction errors in the velocity field. Nevertheless, the predicted wave surface accurately captures the overall propagation characteristics, demonstrating the method’s forward prediction capability in strongly nonlinear wave regimes.

a
b
c

Figure 28: Comparison of reconstructed and true wave profiles during plunging.. a — \(t=16\) s, b — \(t=16.5\) s, c — \(t=17\) s

It should be noted that, unlike the preceding cases of regular and irregular waves, this section does not involve multiple sequential assimilation steps. This is because the plunging wave induced by the cosine-type input is not a continuous wave train. Each wave shoals and breaks individually, and these events are relatively isolated from each other. Therefore, the present study focuses on the reconstruction and forward prediction of a single representative wave prior to breaking. In fact, the evolution of subsequent plunging waves exhibits dynamics similar to the assimilated case, and is thus not further discussed.

3.3.3 Comparison of Interface Representation Methods↩︎

This section compares the performance of two interface representation methods—level set and VOF—for the plunging wave case. Due to the highly nonlinear evolution and significant free surface deformation during wave plunging, the curve-based representation becomes unsuitable and is thus excluded from this comparison. All other assimilation settings remain consistent with those described in Sec 3.3.2, including wave parameters, observation layout, etc. The evaluation focuses on the reconstruction accuracy of the free surface under each representation method, aiming to assess their suitability and limitations in strongly nonlinear wave conditions.

The energy distribution of the VOF modes in the plunging wave case is provided in Fig 29. It shows that more than 44 modes are required to capture 99% energy, which is nearly twice the number needed for the level set. This indicates that the VOF representation has a more dispersed energy structure.

Furthermore, we compare the reconstruction accuracy of both interface representation methods under different modal truncation levels, as shown in Fig [fig:plunging32332methods]. The results reveal that both methods achieve the minimum reconstruction error when using 20 modes. From the energy spectrum, it can be seen that the level set requires about 20 modes to capture 99% of the total energy. At this stage, the non-physical noise introduced by the modes has little influence on the interface reconstruction, leading to the minimum error. In contrast, the VOF field requires about 45 modes to reach 99% of the energy, while 20 modes account for only 90%. However, since high-order modes contain noise that deteriorates reconstruction accuracy, the minimum error is also achieved when the number of truncated modes is set to 20. When more modes are included, the error begins to rise, suggesting that excessive high-order modes may introduce nonphysical noise and local oscillations, thereby degrading the overall reconstruction accuracy of the interface.

Overall, the level set method demonstrates higher reconstruction accuracy across most modal truncation levels. However, when using 10 modes, the VOF method yields a lower error. This is mainly attributed to the differences between the two interface representations. The VOF field exhibits strong discontinuities near the interface, and low-order modal truncation can effectively filter out high-frequency noise, resulting in a smoother interface extracted by the \(0.5\)-isoline and thus reducing the error. In contrast, the level set field is inherently smoother, but low-order modes are often insufficient to capture the steep gradients at the interface, while a large portion of the energy is distributed in regions far from the interface. Therefore, the VOF field achieves its minimum error at a truncation of 10 modes, whereas the level set requires more modes.

Figure 29: Modal energy spectrum and cumulative variance of the VOF field.

3.3.4 Sensitivity to Observation Noise↩︎

This section investigates the impact of different levels of observation noise on the reconstruction accuracy. In this section, the observation data consist of the wave surface information within the observation window, namely the level set or VOF values of the grids inside the window. Accordingly, the prescribed level of noise is added to each grid data point. Specifically, Gaussian noise with standard deviations of \(0.0001\), \(0.001\), \(0.01\), and \(0.1\) is introduced to evaluate the sensitivity of the three physical fields (\(\boldsymbol{\phi}\), \(\boldsymbol{u}_x\), and \(\boldsymbol{u}_z\)). The results are shown in Fig 30.

It is observed that the reconstruction error for the free surface reaches its minimum when the observation noise is \(0.001\), and increases notably when the noise level reaches \(0.1\). In contrast, the velocity errors behave differently: the errors in \(\boldsymbol{u}_x\) and \(\boldsymbol{u}_z\) are highest when the noise is \(0.0001\), then decrease significantly and reach their minimum around \(0.001\), followed by a slow increase as noise continues to rise. This trend may stem from the fact that overly “clean” observations (with very low noise) might lead the assimilation system to overfit the observation data, introducing non-physical high-frequency components into the velocity fields.

Figure 30: Comparison of reconstruction errors for different observation noise levels.

3.3.5 Analysis of Baseline and Truth Discrepancies↩︎

This subsection examines how discrepancies between the baseline and truth wave heights affect reconstruction accuracy. All settings remain the same as before, except that the height of the truth wave is reduced relative to the baseline height of \(0.12\) m to ensure that the wave does not roll before entering the test section. Therefore, the original errors \(\delta^H\) are all negative, with values of \(-0.03\) m, \(-0.06\) m, and \(-0.09\) m.

Fig [fig:plunging32org32error] presents the reconstruction errors for the three physical quantities under different levels of original discrepancies. The errors first increase and then decrease as the original error becomes less negative, reaching a maximum at \(\delta_H=-0.06\) m. This non-monotonic trend can be explained as follows: When the original error is \(-0.09\) m, the truth wave has a height of only \(0.03\) m and remains in the pre-breaking stage with mild evolution, making the flow field relatively simple to reconstruct despite the large original discrepancy. In contrast, at \(\delta^H=-0.03\) m, the truth wave is closer to the baseline, allowing the SVD modal basis to represent its dominant structures more accurately, leading to lower reconstruction errors.

4 Conclusion↩︎

In this study, an EnKF–FVM coupled framework is developed for wave reconstruction, using limited interface observations. The applicability and accuracy of the framework are systematically assessed under various scenarios. For regular waves, irregular waves, and plunging waves with different levels of nonlinearity, the proposed framework can significantly reduce the bias of the baseline solution and accurately reconstruct both the current wave states and their propagation processes. This verifies the applicability and effectiveness of the framework.

By introducing the POD approach and employing a dataset construction different from conventional practices, effective modes representing the free surface are successfully extracted. This demonstrates that the present construction achieves higher efficiency in the interface problem, compared to the conventional approach. A comparative study of three interface representation methods—level set, VOF, and curve-tracking—shows that the level set method yields the highest reconstruction accuracy. This is mainly attributed to its continuity in interface description and flexibility in geometric representation, which allows for a more precise capture of wave surface evolution. The observation noise, the number of retained modes, etc, significantly affect the reconstruction accuracy, highlighting the necessity of careful parameter selection and optimization in practical applications.

However, the current framework encounters difficulties in directly reconstructing transients with strong nonlinearity such as wave breaking and wave slamming. Addressing this limitation is of great significance for the future realization of a fully functional digital twin wave tank. Furthermore, because the ensemble of baseline simulations is not updated over time, significant deviations from the true state can degrade the DA results, particularly for irregular waves. Consequently, this strategy is not currently suited for reconstructing the wave field over quite long time periods. In principle, the baseline could be updated using the reconstructed fields via a resampling method. However, it is challenging to apply perturbations that both generate a new ensemble and consistently satisfy the physical constraints of the wave field, particularly for the free surface at any given time. Incorporating updates into the ensemble without causing ensemble collapse is an important direction to improve this framework if EnKF is used.

Acknowledgments↩︎

This work is supported by the Research Project of China COSCO Shipping Corporation Limited (2023-2-Z001-03), Open Project of the State Key Laboratory of Ocean Engineering, and the Fundamental Research Funds for the Central Universities.

5 Observation Operator↩︎

In this study, different representations of the free surface are employed, associated with different observation operators. Moreover, for the reconstruction of plunging waves, the observations need to effectively capture the nonlinear variations of the wave profile, making discrete wave elevations no longer suitable. Instead, the free surface distribution within a rectangular observation window is used as the observational data. Consequently, the observation operator for the plunging wave case is defined differently, and the details are provided in this section.

First, we derive the observation operators corresponding to the different free surface representations when adopting free surface elevations as observations. The forecast state vector for ensemble member \(n\) is represented as Eq 2 where \(\boldsymbol{a}_{\chi}^{(n),f}\in\mathbb{R}^{r_{\chi}}\) is the truncated modal coefficient vector for the interface related field (VOF or level-set), or is represented as Eq 5 (curve representation). When adopting Eq 2 , the modal-to-physical reconstructions are \[\boldsymbol{\gamma_{\chi}}^{(n),f} = \widehat{\boldsymbol{U}}_{\chi}\,\boldsymbol{a}_{\chi}^{(n),f}.\]

Note the full-observation operator acting on the full state can be written in block form as \[\mathcal{H}(\boldsymbol{x}^f) = {\begin{bmatrix} \mathcal{H}_\chi(\cdot) & \boldsymbol{0} & \boldsymbol{0} \end{bmatrix}} \begin{bmatrix} \boldsymbol{a}_\chi\\ \boldsymbol{a}_{u_x}\\ \boldsymbol{a}_{u_z} \end{bmatrix},\] i.e. only the \(\chi\)-block contributes directly to wave elevation observations. For the present problem, the observation operator is nonlinear because extracting geometric free surface elevations from an implicit field involves nonlinear steps. We define \[\mathcal{H}_\chi(\boldsymbol{a}_\chi) = \boldsymbol{S}\,\mathcal{F}\!\left(\widehat{\boldsymbol{U}}_{\chi} \,\boldsymbol{a}_\chi \right),\] where \(\boldsymbol{S}\) belongs to \(\mathbb{R}^{j\times g}\) is the interpolation operator that extracts elevation values at the observation locations; \(\mathcal{F}:\mathbb{R}^{m}\to\mathbb{R}^g\) is an operator that converts the implicit interface field to a set of nodes representing the geometric free surface elevation in the entire region, where \(g\) is the number of nodes.

When adopting the parametric cubic spline representation in Eq 5 , the free surface is expressed as \[f(x) = \sum_{i=1}^{r_c} h_i\,\varphi_i(x),\] where \(\{\varphi_i(x)\}_{i=1}^{r_c}\) are the spline basis functions associated with the fixed abscissae \(\{x_i^{c}\}_{i=1}^{r_c}\), and \(h_i\) are the components collected in \(\boldsymbol{h}\). Let \(\{x^{\mathrm{obs}}_k\}_{k=1}^j\) be the observation locations in the \(x\)-direction. Evaluating the spline at these locations yields \[\mathcal{H}(\boldsymbol{x}^f) = \underbrace{\begin{bmatrix} \varphi_1(x^{\mathrm{obs}}_1) & \cdots & \varphi_{r_c}(x^{\mathrm{obs}}_1) \\ \vdots & \ddots & \vdots \\ \varphi_1(x^{\mathrm{obs}}_j) & \cdots & \varphi_{r_c}(x^{\mathrm{obs}}_j) \end{bmatrix}}_{:=\,\boldsymbol{L} \in \mathbb{R}^{j\times r_c}} \boldsymbol{h}.\]

In block form, the observation operator acting on the full state vector is \[\mathcal{H}(\boldsymbol{x}^f) = \begin{bmatrix} \boldsymbol{L} & \boldsymbol{0} & \boldsymbol{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{h} \\[2pt] \boldsymbol{a}_{u_x} \\[2pt] \boldsymbol{a}_{u_z} \end{bmatrix},\] indicating that only the \(\boldsymbol{h}\)-block contributes directly to wave elevation observations.

When the framework operates in plunging wave conditions, the observations are presented as the free surface distribution within a window, which is mathematically represented by the VOF data within that window. When the free surface is represented by the VOF method, the observation operator is linear and can be expressed as \[\mathcal{H}(\boldsymbol{x}^f) = \begin{bmatrix} \boldsymbol{H}_{\alpha} & \boldsymbol{0} & \boldsymbol{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{a}_\alpha \\[2pt] \boldsymbol{a}_{u_x} \\[2pt] \boldsymbol{a}_{u_z} \end{bmatrix},\]

\[\boldsymbol{H}_\alpha = \boldsymbol{W} \widehat{\boldsymbol{U}}_{\alpha} ,\] where \(\boldsymbol{W}\in\mathbb{R}^{p\times m}\) is the interpolation operator that extracts VOF values of cells in the observation window, \(p\) is the number of cells in the window.

When the free surface is represented using the level set method, the observation operator becomes nonlinear and is expressed as

\[\mathcal{H}(\boldsymbol{x}^f) = {\begin{bmatrix} \mathcal{H}_\phi(\cdot) & \boldsymbol{0} & \boldsymbol{0} \end{bmatrix}} \begin{bmatrix} \boldsymbol{a}_\phi\\ \boldsymbol{a}_{u_x}\\ \boldsymbol{a}_{u_z} \end{bmatrix},\]

\[\mathcal{H}_\phi(\boldsymbol{a}_\phi) = \boldsymbol{W}\,\mathcal{J}\!\left(\widehat{\boldsymbol{U}}_{\phi} \,\boldsymbol{a}_\phi \right).\] where \(\mathcal{J}:\mathbb{R}^{m}\to\mathbb{R}^m\) is the operator that converts the level set to VOF.

References↩︎

[1]
R. J. Rapp and W. K. Melville, Laboratory measurements of deep-water breaking waves, Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, vol. 331, no. 1622, pp. 735–800, 1990.
[2]
Lord Rayleigh, On waves, Phil. Mag., vol. 1, pp. 257–259, 1876.
[3]
M. Onorato, A. R. Osborne, M. Serio, L. Cavaleri, C. Brandini, and C. T. Stansberg, Extreme waves, modulational instability and second order theory: wave flume experiments on irregular waves, European Journal of Mechanics - B/Fluids, vol. 25, no. 5, pp. 586–601, 2006.
[4]
Guizien N. Katell and Barthélemy Eric, Accuracy of solitary wave generation by a piston wave maker, Journal of Hydraulic Research, vol. 40, no. 3, pp. 321–331, 2002.
[5]
Derek Garard Goring, Tsunamis, the propagation of long waves onto a shelf, pp. 176–205, 1978.
[6]
Yoshimi Goda and Yasumasa Suzuki, Estimation of Incident and Reflected Waves in Random Wave Experiments, in Coastal Engineering 1976, pp. 828–845, 1976.
[7]
Jiahui Wang, Rundong Liu, Hong Xiao, Point-cloud neural network framework for high-resolution velocity field reconstruction in wave-structure interaction, Ocean Engineering, vol. 340, 122223, 2025.
[8]
Omar Jebari, Seong Hyeon Hong, Travis Hunsucker, Do Kyun Kim, Chungkuk Jin, Wave-field reconstruction using stereo cameras on a floating platform in a synthetic environment, Ocean Engineering, vol. 327, 120958, 2025.
[9]
Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis, NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations, Journal of Computational Physics, vol. 426, 109951, 2021.
[10]
Ionel M Navon, Data assimilation for atmospheric, oceanic and hydrologic applications , Data assimilation for atmospheric, oceanic and hydrologic applications, pp. 21-65, 2009.
[11]
Keyun Zhu, I. Michael Navon, Xiaolei Zou, Variational Data Assimilation with a Variable Resolution Finite-Element shallow-water Equations Model, Monthly Weather Review, vol. 122, no. 5, pp. 946–965, 1994.
[12]
Laura M. Stewart, Sarah L. Dance, Nancy K. Nichols, Data assimilation with correlated observation errors: experiments with a 1-D shallow water model, Tellus A: Dynamic Meteorology and Oceanography, vol. 65, no. 1, 19546, 2013.
[13]
Guangyao Wang and Yulin Pan, Phase-resolved ocean wave forecast with ensemble-based data assimilation, Journal of Fluid Mechanics, vol. 918, A19, 2021.
[14]
AJC Barré de Saint-Venant, Théorie et équations générales du mouvement non permanent des eaux courantes, Comptes Rendus des séances de l’Académie des Sciences, Paris, France, vol. 17, no. 73, pp. 147–154, 1871.
[15]
Joseph Boussinesq, Théorie des ondes et des remous qui se propagent le long d’un canal rectangulaire horizontal, en communiquant au liquide contenu dans ce canal des vitesses sensiblement pareilles de la surface au fond, Journal de mathématiques pures et appliquées, vol. 17, pp. 55–108, 1872.
[16]
Douglas G. Dommermuth and Dick K. P. Yue, A high-order spectral method for the study of nonlinear gravity waves, Journal of Fluid Mechanics, vol. 184, pp. 267–288, 1987.
[17]
Geir Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143–10162, 1994.
[18]
François-Xavier Le Dimet and Olivier Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects , Tellus A: Dynamic Meteorology and Oceanography, vol. 38, no. 2, pp. 97-110, 1986.
[19]
A. Gronskis, D. Heitz, E. Mémin, Inflow and initial conditions for direct numerical simulation based on adjoint data assimilation, Journal of Computational Physics, vol. 242, pp. 480–497, 2013.
[20]
Dimitry P. G. Foures, Nicolas Dovetta, Denis Sipp, Peter J. Schmid, A data-assimilation method for Reynolds-averaged Navier–Stokes-driven mean flow reconstruction, Journal of Fluid Mechanics, vol. 759, pp. 404–431, 2014,.
[21]
Hiroshi Kato, Shigeru Obayashi, Approach for uncertainty of turbulence modeling based on data assimilation technique, Computers & Fluids, vol. 85, pp. 2–7, 2013.
[22]
C. H. Colburn, J. B. Cessna, T. R. Bewley, State estimation in wall-bounded flow systems. Part 3. The ensemble Kalman filter, Journal of Fluid Mechanics, vol. 682, pp. 289–303, 2011,.
[23]
I.Yu. Gejadze and G. J.M. Copeland and I.M. Navon, Open Boundary Control Problem for Navier-Stokes Equations Including a Free Surface: Data Assimilation, Computers & Mathematics with Applications, vol. 52, pp. 1269-1288, 2006.
[24]
F. Fang, C.C. Pain, M.D. Piggott, G.J. Gorman, and A.J.H. Goddard, An adaptive mesh adjoint data assimilation method applied to free surface flows , International Journal for Numerical Methods in Fluids, vol. 47, pp. 995-1001, 2005.
[25]
Yilang Liu, Weiwei Zhang, and Zhenhua Xia, A new data assimilation method of recovering turbulent mean flow field at high Reynolds numbers , Aerospace Science and Technology, vol. 126, 107328, 2022.
[26]
Ryota Kikuchi, Takashi Misaka, and Shigeru Obayashi, Assessment of probability density function based on POD reduced-order model for ensemble-based data assimilation , Fluid Dynamics Research, vol. 47, no. 5, 051403, 2015.
[27]
C.W Hirt and B.D Nichols, Volume of fluid (VOF) method for the dynamics of free boundaries , Journal of Computational Physics, vol. 39, no. 1, pp. 201-225, 1981.
[28]
Stanley Osher and James A Sethian, Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations , Journal of Computational Physics, vol. 79, no. 1, pp. 14-49, 1988.
[29]
T. Ménard, S. Tanguy, and A. Berlemont, Coupling level set/VOF/ghost fluid methods: Validation and application to 3D simulation of the primary break-up of a liquid jet , International Journal of Multiphase Flow, vol. 33, no. 5, pp. 510-524, 2007.
[30]
H. G. Weller, G. Tabor, H. Jasak, C. Fureby, A tensorial approach to computational continuum mechanics using object-oriented techniques, Computer in Physics, vol. 12, pp. 620-631, 1998.
[31]
Andre F. C. da Silva and Tim Colonius, Ensemble-Based State Estimator for Aerodynamic Flows, AIAA Journal, vol. 56, pp. 2568-2578, 2018.
[32]
Alexander Hay, Jeffrey T. Borggaard, and Dominique Pelletier, Local improvements to reduced-order models using sensitivity analysis of the proper orthogonal decomposition, Journal of Fluid Mechanics, vol. 629, pp. 41–72, 2009.