Mean flow data assimilation using physics-constrained Graph Neural Networks

Michele Quattromini
Dipartimento di Meccanica, Matematica e Management
Politecnico di Bari
Bari, 70126 Italy
LISN-CNRS
Université Paris-Saclay
Orsay, 91440 France
michele.quattromini@poliba.it
Michele Alessandro Bucci
Digital Sciences & Technologies Department
SafranTech
Magny-Les-Hameaux, 78114 France
Stefania Cherubini
Dipartimento di Meccanica, Matematica e Management
Politecnico di Bari
Bari, 70126 Italy
Onofrio Semeraro
LISN-CNRS
Université Paris-Saclay
Orsay, 91440 France


Abstract

Despite their widespread use, purely data-driven methods often suffer from overfitting, lack of physical consistency, and high data dependency, particularly when physical constraints are not incorporated. This study introduces a novel data assimilation approach that integrates Graph Neural Networks (GNNs) with optimisation techniques to enhance the accuracy of mean flow reconstruction, using Reynolds-Averaged Navier–Stokes (RANS) equations as a baseline. The method leverages the adjoint approach, incorporating RANS-derived gradients as optimisation terms during GNN training, ensuring that the learned model adheres to physical laws and maintains consistency. Additionally, the GNN framework is well-suited for handling unstructured data, which is common in the complex geometries encountered in Computational Fluid Dynamics (CFD). The GNN is interfaced with the Finite Element Method (FEM) for numerical simulations, enabling accurate modelling in unstructured domains. We consider the reconstruction of mean flow past bluff bodies at low Reynolds numbers as a test case, addressing tasks such as sparse data recovery, denoising, and inpainting of missing flow data. The key strengths of the approach lie in its integration of physical constraints into the GNN training process, leading to accurate predictions with limited data, making it particularly valuable when data are scarce or corrupted. Results demonstrate significant improvements in the accuracy of mean flow reconstructions, even with limited training data, compared to analogous purely data-driven models.

1 Introduction

In recent years, the integration of Machine Learning (ML) algorithms into Computational Fluid Dynamics (CFD) has experienced significant growth, driven by the increasing efficiency of ML models in processing large datasets and their impressive capabilities in inference and prediction. The literature is already rich with various effective approaches for combining ML algorithms with CFD, as highlighted in the review works by Brunton et al. [1] and Vinuesa et al. [2]. These applications have found a natural domain in turbulence modelling problems, ranging from the closure of Reynolds-Averaged Navier-Stokes (RANS) equations to wall-modelled Large Eddy Simulations [3]–[5]. Among the many research efforts addressing these challenges, Ling & Templeton [6] applied classification methods to identify regions of uncertainty where the closure term in RANS might fail, while other approaches leverage baseline models such as the Spalart-Allmaras closure [7], physics-informed neural networks (PINNs) [8], regression methods [9], decision trees [10], ensemble methods [11], Tensor-Basis Neural Networks [12], [13], and genetic programming [14], [15]. For a broader overview, we refer to references [10], [16], and [17], where thorough reviews of ML techniques applied to the subject are provided.

On the other hand, alternative approaches are based on data assimilation. In this framework, diverse observations are integrated with the governing dynamical laws of a system to optimally estimate physical variables. Starting from a background state and incorporating imperfect observational data, it yields the best possible estimate of the system’s true state by accounting for the statistical reliability of each data source. These methodologies are routinely employed in meteorology, where accurate weather forecasting is achieved by combining satellite imagery, meteorological measurements, and numerical models [18], [19]. Two main approaches to data assimilation can generally be identified: stochastic estimation, which relies on probabilistic methods such as Kalman filtering, and variational data assimilation. In the context of fluid mechanics, recent studies employing stochastic estimation include [20], [21]. Variational assimilation methods, and in particular four-dimensional variational assimilation (4D-Var), have been applied in [22] for turbulence model identification, and in a similar spirit in [23]. The success of these applications in fluid mechanics is not surprising, as variational methods based on adjoint formulations are commonly applied across a broad range of problems. These include sensitivity analysis in stability studies [24]–[26], optimisation [27]–[30], receptivity analysis [31], and flow control [32], [33]. A thorough overview of these applications is provided by Luchini and Bottaro [34]. Returning to the topic of mean flow reconstruction, Foures et al. [23] proposed a variational data assimilation method for reconstructing the mean flow field from partial measurements using the forced RANS equations. Their approach minimises the discrepancy between experimental data and numerical solutions by identifying an optimal forcing term that effectively represents the unknown Reynolds stresses. This is achieved through a direct-adjoint looping procedure, and the closure model is interpreted as an optimal control variable informed by the adjoint field.

Building on this, recent efforts combining data assimilation with neural network (NN) modelling have been proposed in references [35]–[37]. Following this philosophy, in this work, we propose a novel approach that integrates Graph Neural Networks (GNNs) with Reynolds-Averaged Navier-Stokes (RANS) equations to enhance the training process. The resulting method is referred to as the Physics-Constrained Graph Neural Network (PhyCo-GNN). Specifically, the forcing term in the RANS equations is modelled using a GNN, where the network is informed by gradients obtained via auto-differentiation and analytical gradients derived from the adjoint equations during the iterative process. By embedding the RANS closure term into the optimisation framework, we leverage the adjoint method to efficiently compute the gradients necessary for optimisation. This ensures that the gradients used during GNN training are informed by a deterministic physical model, rather than relying solely on available data, thus ensuring physical consistency. A key feature of this approach is the use of GNNs [38], which are characterised by complex, multi-connected nodes that can be naturally adapted to unstructured meshes. In a GNN, convolution is performed by aggregating information from neighbouring nodes, overcoming the geometric limitations typically encountered in Convolutional Neural Networks (CNNs). Additionally, GNNs exhibit superior generalisation capabilities compared to standard models [39], are differentiable, and enable direct learning of operators through discrete stencils [40]. Due to these advantages, GNNs have recently gained attention in fluid mechanics. A comprehensive review by Lino et al. [41] and examples from [42] and [43], where GNNs are used to model wall shear stress in LES simulations, highlight their potential. In Quattromini et al. [44], a GNN is used to predict the closure term in RANS equations at low Reynolds numbers, using a supervised learning strategy. In that setting, the NN model receives the mean flow field as input and is trained to regress the corresponding forcing term, with both quantities derived from high-fidelity DNS data. The approach is designed to map the mean flow to the forcing term in a purely data-driven setting. In contrast, the present work incorporates this GNN architecture as a pre-trained module within a broader physics-constrained optimisation framework. The GNN model is here repurposed to act as a prior for a data assimilation scheme governed by the RANS equations. The key novelty lies in the integration of the GNN into an adjoint-based inverse problem, where the NN model is trained not only to replicate the DNS-derived forcing term, but also to minimise discrepancies in the mean flow field reconstruction through gradient-based optimisation, informed by the governing physics. The remainder of this article is structured as follows. The mathematical framework is described in Sec. 2, where the physical baseline model for the numerical simulations is detailed in Sec. 2.1, and the adjoint optimisation method is presented in Sec. 2.2. The numerical setup is described in Sec. 3. Sec. 4 outlines the ML framework, detailing the GNN architecture (Sec. 4.1), the dataset preprocessing (Sec. 4.2), and the training algorithm (Sec. 4.3). We then proceed to present the PhyCo-GNN approach in Sec. 5. Results are presented in Sec. 6, where mean flow reconstruction is shown for several test cases, including sparse data recovery, denoising, and inpainting of missing flow data. Conclusions are drawn in Sec. 7.

2 Mean Flow Reconstruction Using Data Assimilation

2.1 Governing Equations

This study focuses on two-dimensional (2D) incompressible fluid flows around bluff bodies in unsteady regimes. We begin with the Navier-Stokes (NS) equations for incompressible flows. Let \(\mathbf{x}=(x,y)\) denote the spatial Cartesian coordinates. The velocity field \({\mathbf{u}}(\mathbf{x}, t)\) and pressure field \(p(\mathbf{x}, t)\) follow these dynamics: \[\begin{align} \dfrac{\partial \mathbf{u}}{\partial t} +\left(\mathbf{u}\cdot\nabla\right) \mathbf{u}&= -\nabla p+\dfrac{1}{Re}\nabla^2\mathbf{u}\\ \nabla\cdot\mathbf{u}&= 0. \end{align} \label{eq:unsteadyNS}\tag{1}\] The above equations are made dimensionless by introducing the characteristic length scale \(D\) (e.g., the diameter of the circle circumscribing the bluff body), the velocity \(U_{\infty}\) of the incoming flow, and \(\rho U_{\infty}^2\) as the reference pressure. The Reynolds number is defined as \(Re = \frac{U_{\infty} D}{\nu}\), where \(\nu\) represents the kinematic viscosity. This number represents the ratio between inertial and viscous forces in the fluid flow and is used to characterise the flow regime, indicating whether the flow is laminar or turbulent. In this study, we focus on unsteady nonlinear flows that settle on limit cycles or quasi-periodic regimes, a configuration typically occurring when supercritical flows have not yet become turbulent.

Solving the NS equations can be computationally intensive; hence, various approximate models are used, depending on the required accuracy. In this study, we adopt the Reynolds-averaged Navier-Stokes (RANS) model, a time-averaged formulation of the NS equations. By introducing the Reynolds decomposition: \[\mathbf{u}(\mathbf{x},t) = \overline{\mathbf{u}}(\mathbf{x}) +\mathbf{u}'(\mathbf{x},t), \label{eq:reydeco}\tag{2}\] the unsteady velocity field \(\mathbf{u}=(u,v)^T\) is split into an ensemble-averaged velocity field \(\overline{\mathbf{u}}=(\overline{u},\overline{v})^T\) and a fluctuating component \(\mathbf{u}'=({u'},{v'})^T\). Formally, this decomposition allows any unsteady flow to be expressed as the sum of a steady mean flow and an unsteady fluctuating part. Substituting the Reynolds decomposition (Eq. 2) into the NS equations (Eq. 1) and ensemble-averaging results in: \[\begin{align} \left(\overline{\mathbf{u}}\cdot\nabla\right) \overline{\mathbf{u}}&= -\nabla\overline{p} +\dfrac{1}{Re}\nabla^2\overline{\mathbf{u}} +\mathbf{f}\\ \nabla\cdot\overline{\mathbf{u}}&= 0, \end{align} \label{Eq:RANS}\tag{3}\] where \(\overline{p}\) is the mean pressure field. These equations are completed with a set of boundary conditions, detailed in Sec. 3, together with a description of the numerical setup.
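For illustration, the decomposition in Eq. 2 amounts to a simple averaging operation over velocity snapshots; the synthetic data and array names below are our own minimal example, not the paper's implementation:

```python
import numpy as np

# Synthetic snapshots: n_t time samples of a velocity field on n points (illustrative only)
rng = np.random.default_rng(0)
n_t, n = 2000, 64
u_snapshots = 1.0 + 0.1 * rng.standard_normal((n_t, n))  # u(x, t)

# Reynolds decomposition (Eq. 2): u(x, t) = u_mean(x) + u_prime(x, t)
u_mean = u_snapshots.mean(axis=0)   # steady mean flow
u_prime = u_snapshots - u_mean      # fluctuating component

# By construction the fluctuations average to zero at every point
assert np.allclose(u_prime.mean(axis=0), 0.0, atol=1e-12)
```

In practice the average is taken over a long enough time horizon for the statistics to converge, as discussed below for the closure term.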

In this context, the Reynolds stress tensor \(\mathbf{f}\) acts as a closure term for the underdetermined system of nonlinear equations in Eq. 3. Ideally, \(\mathbf{f}\) can be directly computed, when data are available, as: \[\mathbf{f}= -\nabla \cdot (\overline{\mathbf{u}'\mathbf{u}'}).\label{eq:forcing}\tag{4}\] In practice, computing \(\mathbf{f}\) requires either Direct Numerical Simulation (DNS) or sufficiently long sampling of experimental or numerical data to accurately capture the statistical properties of the fluctuating components \(\mathbf{u}'\). Unlike the mean flow \(\overline{\mathbf{u}}\), the fluctuating components are inherently statistical in nature, and their proper representation depends on achieving statistical convergence rather than full time-resolved data. This is known as the closure problem. Several approximations, like the Boussinesq hypothesis — e.g. \(k\)-\(\epsilon\), \(k\)-\(\omega\), Spalart-Allmaras models [45] — or more complex models such as the explicit algebraic Reynolds stress model [46] and differential Reynolds stress models [47], can be introduced to address this problem. Nevertheless, in this work, we do not assume any modelling approximation of the forcing term \(\mathbf{f}\), but instead aim to compute it by combining neural-network modelling and data assimilation; a GNN model is trained to infer the Reynolds stress term \(\mathbf{f}\) from a given mean flow \(\overline{\mathbf{u}}\) introduced as input. The Reynolds stress term \(\mathbf{f}\) is the output of the GNN and is treated in this context as a control variable in the adjoint optimisation process (see Sec. 5).
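As an illustration of Eq. 4, the forcing term can be assembled from the time-averaged products of the fluctuations. Note that the paper operates on unstructured FEM meshes; the sketch below uses a uniform Cartesian grid (our own simplification) so that finite differences can stand in for the divergence operator:

```python
import numpy as np

# Illustrative only: uniform 2D grid with synthetic fluctuation snapshots
rng = np.random.default_rng(1)
nt, ny, nx = 500, 32, 48
dx = dy = 0.1
u_p = 0.1 * rng.standard_normal((nt, ny, nx))   # u'(x, t)
v_p = 0.1 * rng.standard_normal((nt, ny, nx))   # v'(x, t)

# Time-averaged Reynolds stresses
uu = (u_p * u_p).mean(axis=0)
uv = (u_p * v_p).mean(axis=0)
vv = (v_p * v_p).mean(axis=0)

# f = -div( mean(u' u') ), componentwise (Eq. 4), via central differences
f_x = -(np.gradient(uu, dx, axis=1) + np.gradient(uv, dy, axis=0))
f_y = -(np.gradient(uv, dx, axis=1) + np.gradient(vv, dy, axis=0))
```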

Figure 1: (a) Streamwise component of the mean flow, \(\overline{\mathbf{u}}\), and vorticity isolines, \(\omega = \nabla \times \mathbf{u}\), for the flow past a cylinder at \(Re = 150\). (b) For the same case, the streamwise component of the closure term, \(\mathbf{f}\), is shown. In both cases, only a portion of the domain is displayed.

2.2 Adjoint-based Data Assimilation

Among the various strategies available in the literature, data assimilation schemes can be formulated by casting optimisation loops based on the adjoint method. In this approach, a cost function is defined to be either maximised or minimised, alongside a control variable to be adjusted accordingly. To this end, we introduce a mapping that projects the mean flow field \(\overline{\mathbf{u}}\) onto the measurement vector \(\bar{\mathbf{m}}\) as follows \[\bar{\mathbf{m}}=\mathcal{M}{(\overline{\mathbf{u}})}.\] The operator \(\mathcal{M}\) defines the ground truth data used in the optimisation process. If \(\mathcal{M} = I\), we consider the full-field \(\overline{\mathbf{u}}\); otherwise, \(\mathcal{M}\) may be designed to project the field onto a subset of measurements (e.g., sparse observations or an incomplete flow field, as in the inpainting scenario; see Sec. 6). Based on this definition, the cost function to be minimised is given by the discrepancy between the ground truth \(\bar{\mathbf{m}}\) and the reconstructed measurements \(\mathcal{M}(\hat{\overline{\mathbf{u}}})\) \[\mathcal{J}\left(\hat{\overline{\mathbf{u}}}\right) = \frac{1}{2} ||\bar{\mathbf{m}} - \mathcal{M}(\hat{\overline{\mathbf{u}}})||_2^2, \label{eq:cost95function}\tag{5}\] where \(||\cdot||_2\) represents the \(L^2\)-norm, defined in association with the scalar product over the domain \(\Omega\): \[\langle\mathbf{a}, \mathbf{b}\rangle = \int_\Omega \mathbf{a}\cdot\mathbf{b} \, d\Omega. \label{eq:scalar95product}\tag{6}\]
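As a concrete illustration, the measurement operator \(\mathcal{M}\) and the cost function of Eq. 5 can be sketched for the sparse-observation case; the random selection of observed nodes and all array names below are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

# Sketch of the cost function in Eq. 5 with a sparse-measurement operator M
rng = np.random.default_rng(2)
n = 1000
u_true = rng.standard_normal(n)                  # ground-truth mean flow (flattened)
u_hat = u_true + 0.05 * rng.standard_normal(n)   # reconstructed mean flow

obs = rng.choice(n, size=100, replace=False)     # observed node indices
M = lambda u: u[obs]                             # measurement operator M
m_bar = M(u_true)                                # ground-truth measurements

# J = 1/2 || m_bar - M(u_hat) ||_2^2  (Eq. 5)
J = 0.5 * np.sum((m_bar - M(u_hat)) ** 2)
```

With \(\mathcal{M} = I\) (the identity), the same code reduces to the full-field case.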

We consider the RANS equations (Eq. 3 ) as the baseline equation and choose the unknown forcing stress term \(\mathbf{f}\) as the control variable. This procedure is inspired by the study [23], where an optimisation loop is designed to reconstruct the mean flow \(\overline{\mathbf{u}}\).
Since the control variable is the forcing stress term \(\mathbf{f}\), the cost function \(\mathcal{J}\) (Eq. 5 ) does not explicitly depend on it. Therefore, to relate the cost function \(\mathcal{J}\) to the forcing stress tensor \(\mathbf{f}\), we need to define an augmented Lagrangian functional \(\mathcal{L}\): \[\mathcal{L}\left(\mathbf{f}, \hat{\overline{\mathbf{u}}}, \hat{\overline{p}}, \hat{\overline{\mathbf{u}}}^\dagger, \hat{\overline{p}}^\dagger\right) = \mathcal{J} \left(\hat{\overline{\mathbf{u}}}\right) - \langle\hat{\overline{\mathbf{u}}}^\dagger, \hat{\overline{\mathbf{u}}}\cdot\nabla\hat{\overline{\mathbf{u}}}+ \nabla\hat{\overline{p}}- \dfrac{1}{Re}\nabla^2\hat{\overline{\mathbf{u}}}- \mathbf{f}\rangle - \langle\hat{\overline{p}}^\dagger, \nabla\cdot\hat{\overline{\mathbf{u}}}\rangle \label{eq:lagrangian95augmented}\tag{7}\] where \(\langle \cdot, \cdot \rangle\) represents the spatial scalar product, as defined in Eq. 6 . The augmented Lagrangian functional \(\mathcal{L}\) allows us to represent the constrained optimisation problem as an unconstrained one by introducing two a-priori unknown variables, the Lagrangian multipliers \(\hat{\overline{\mathbf{u}}}^\dagger\) and \(\hat{\overline{p}}^\dagger\). To minimise the functional defined in Eq. 7 , its partial derivatives with respect to all independent variables of the problem must be set to zero.

Following this approach, nullifying the partial derivatives of Eq. 7 with respect to the direct variables \(\hat{\overline{\mathbf{u}}}\) and \(\hat{\overline{p}}\) yields the adjoint Navier-Stokes equations: \[\begin{align} -\hat{\overline{\mathbf{u}}}\cdot\nabla\hat{\overline{\mathbf{u}}}^\dagger + \hat{\overline{\mathbf{u}}}^\dagger \cdot \nabla\hat{\overline{\mathbf{u}}}^T - \nabla\hat{\overline{p}}^\dagger - \dfrac{1}{Re}\nabla^2\hat{\overline{\mathbf{u}}}^\dagger &= \dfrac{\partial\mathcal{J}}{\partial\hat{\overline{\mathbf{u}}}}\\ \nabla\cdot\hat{\overline{\mathbf{u}}}^\dagger &= 0 \end{align} \label{eq:adjoint95NS}\tag{8}\] along with an appropriate set of boundary conditions, detailed in Sec. 3. It is worth noting that the adjoint NS equations (Eq. 8 ) are forced on the right-hand side by the partial derivative of the error function \(\mathcal{J}\) with respect to the reconstructed mean flow \(\hat{\overline{\mathbf{u}}}\). This latter can be easily derived from Eq. 5 as \[\dfrac{\partial\mathcal{J}}{\partial\hat{\overline{\mathbf{u}}}} = -\dfrac{\partial \mathcal{M}}{\partial \hat{\overline{\mathbf{u}}}} \left(\bar{\mathbf{m}} -\mathcal{M}(\hat{\overline{\mathbf{u}}})\right). \label{eq:depsilon95du}\tag{9}\] Finally, the partial derivative of Eq. 7 with respect to the forcing term \(\mathbf{f}\) yields: \[\dfrac{\partial\mathcal{J}}{\partial\mathbf{f}} = \hat{\overline{\mathbf{u}}}^\dagger. \label{eq:depsilo95df}\tag{10}\] More details can be found in [23], where the case of localised measurements is also included in the analytical description of the direct-adjoint loop.

By exploiting the gradients of the cost function \(\mathcal{J}\) with respect to the control variable \(\mathbf{f}\) (Eq. 10 ), a gradient descent algorithm can be applied to optimise \(\mathbf{f}\) and iteratively converge towards the optimal solution that minimises the cost function \(\mathcal{J}\) (Eq. 5 ). Specifically, the iterative direct-adjoint process refines the forcing term \(\mathbf{f}\), ensuring that the RANS model accurately captures the mean flow characteristics observed in high-fidelity DNS data. To summarise the process algorithmically, the adjoint optimisation procedure involves the following steps:

  1. Initialisation – an initial guess for the control variable \(\mathbf{f}\) is chosen. We start from \(\mathbf{f}= 0\) to ensure the divergence-free property (\(\nabla \cdot \mathbf{f}= \mathbf{0}\)) and consistency with the adopted boundary conditions [23].

  2. Forward step – a prediction of the mean flow \(\hat{\overline{\mathbf{u}}}\) is obtained given the forcing term \(\mathbf{f}\), by solving Eq. 3 .

  3. Cost function evaluation – the distance between the reconstructed mean flow \(\hat{\overline{\mathbf{u}}}\) and the ground truth mean flow \(\overline{\mathbf{u}}\) is assessed using the cost function \(\mathcal{J}\) (Eq. 5 ).

  4. Adjoint step – the adjoint equations (Eq. 8 ), forced by the term in Eq. 9 , are solved to find \(\hat{\overline{\mathbf{u}}}^\dagger\). The term in Eq. 10 expresses the variation of the cost function \(\mathcal{J}\) with respect to the control variable \(\mathbf{f}\).

  5. Control variable update – using this gradient, corresponding to the adjoint variable \(\hat{\overline{\mathbf{u}}}^\dagger\), the forcing term \(\mathbf{f}\) is adjusted as: \[\mathbf{f}^{(n+1)} = \mathbf{f}^{(n)} +\beta\dfrac{\partial\mathcal{J}^{(n)}}{\partial \mathbf{f}^{(n)}} = \mathbf{f}^{(n)} +\beta\hat{\overline{\mathbf{u}}}^{\dagger (n)},\] where the superscript \(^{(n)}\) indicates the \(n\)-th iteration of the optimisation loop and \(\beta\) a coefficient related to the step along the gradient direction. Note that the variable \(\hat{\overline{\mathbf{u}}}^\dagger\) is physically divergence-free, thus justifying the initialisation of \(\mathbf{f}\) as a null field.

The direct-adjoint process is iteratively repeated until the cost function \(\mathcal{J}\) (Eq. 5 ) drops below a predefined threshold, based on the required accuracy.
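The five steps above can be sketched as a gradient-descent loop. The linear solvers below are toy stand-ins for the FEM solution of the direct (Eq. 3) and adjoint (Eq. 8) problems, so this is a schematic illustration under our own simplifying assumptions rather than the paper's implementation:

```python
import numpy as np

# Schematic direct-adjoint loop (steps 1-5 above). solve_rans and solve_adjoint
# are placeholders for the FEM solvers; here they are toy linear maps so the
# sketch runs end to end. All names and operators are ours, not the paper's.
rng = np.random.default_rng(3)
n = 50
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # stand-in for the RANS operator

def solve_rans(f):            # forward step: forcing f -> mean-flow estimate
    return np.linalg.solve(A, f + 1.0)

def solve_adjoint(residual):  # adjoint step: measurement mismatch -> u_dagger
    return np.linalg.solve(A.T, residual)

u_target = solve_rans(rng.standard_normal(n))  # synthetic "ground truth" mean flow

f = np.zeros(n)               # step 1: initial guess f = 0
beta, tol = 0.5, 1e-10
for it in range(500):
    u_hat = solve_rans(f)                      # step 2: forward step
    J = 0.5 * np.sum((u_target - u_hat) ** 2)  # step 3: cost function
    if J < tol:
        break
    u_dag = solve_adjoint(u_target - u_hat)    # step 4: adjoint step
    f = f + beta * u_dag                       # step 5: update along the gradient
```

In the paper's setting, the two solves are the FEM discretisations of the RANS and adjoint NS equations, and \(\beta\) would typically be chosen by a line search.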

3 Numerical Setup

Figure 2: Sketch of the computational domain geometry; the diameter of the circumscribed circle around the bluff body and the height and length of the domain are given in non-dimensional units.

The unsteady wake behind a bluff body is a well-established benchmark in CFD. As a reference case, the cylinder bluff body exhibits stable behaviour up to a critical Reynolds number, \(Re_{c} \cong 46.7\) [24], [48]. Beyond this threshold, irregular velocity fluctuations appear, accompanied by periodic vortex formation [49], and the unsteady flow evolves into a limit cycle known as the von Karman street. This phenomenon is observable up to \(Re = 150\) for 2D cases [49]. At these \(Re\), the dynamics can also settle in quasi-periodic regimes or show chaotic behaviour, see, e.g., the case of the fluidic pinball [50]. This study focuses on 2D scenarios exhibiting periodic or quasi-periodic behaviour within the range \(50 \le Re \le 150\). In our numerical setup, the characteristic dimension is the diameter \(D\) of the circumscribed circle around the bluff body. Based on this dimension, the computational domain extends \(L_x = 27\) units in the streamwise direction and \(L_y = 10\) units in the transverse direction. The system’s origin \(O(0, 0)\) is positioned \(\Delta x = 9\) units downstream from the inlet and \(\Delta y = 5\) units from the symmetry boundaries. A pictorial sketch of the geometric configuration of the computational domain is shown in Fig. 2.

The flow evolves from left to right with a dimensionless uniform velocity \(\mathbf{u}= (1, 0)^T\), normalised by the reference velocity \(U_{\infty}\) of the undisturbed flow. Boundary conditions follow the setup described by [23]. For the direct NS equations (Eq. 1), they are as follows: \[\begin{align} \begin{cases} \begin{array}{r l} u = 1,v = 0 &\text{at the inlet,} \\ u = 0,v = 0 &\text{on the cylinder surface,}\\ \partial_y u = 0,v = 0 &\text{on symmetry boundaries,}\\ \dfrac{1}{Re}\partial_x u - p = 0,\partial_x v = 0 &\text{at the outlet}.\\ \end{array} \end{cases} \end{align}\] For the adjoint NS equations (Eq. 8), the boundary conditions are homogeneous on the Dirichlet boundaries: \[\begin{align} \begin{cases} \begin{array}{r l} u^\dagger = 0,v^\dagger = 0 &\text{at the inlet,} \\ u^\dagger = 0,v^\dagger = 0 &\text{on the cylinder surface,}\\ \partial_y u^\dagger = 0,v^\dagger = 0 &\text{on symmetry boundaries,}\\ \dfrac{1}{Re}\partial_x u^\dagger + p^\dagger = -uu^\dagger,\dfrac{1}{Re}\partial_x v^\dagger = -uv^\dagger &\text{at the outlet}.\\ \end{array} \end{cases} \end{align}\] Simulations start with null flow fields at \(t=0\). Required statistics, such as the mean flow \(\overline{\mathbf{u}}\) and the forcing stress term \(\mathbf{f}\), are computed on the fly. The final simulation time \(T\) is determined by a convergence criterion, specifically when the \(L^2\)-norm difference between consecutive mean flows falls below \(10^{-8}\).

Spatial discretisation is achieved using a Finite Element Method (FEM) approach via the FEniCS Python library [51], with time integration handled by a second-order Backward Differentiation Formula (BDF). Meshes are refined near the obstacle and in the wake region to accurately capture the flow dynamics. Depending on the specific case, the meshes consist of around \(13,500\) nodes on average. Fig. 1\(a\) depicts the streamwise component of the mean flow \(\overline{\mathbf{u}}\) along with the vorticity isolines \(\omega = \nabla \times \mathbf{u}\), while Fig. 1\(b\) shows the streamwise component of the closure term \(\mathbf{f}\) for the cylinder bluff body reference case at \(Re = 150\).

4 Graph Neural Network overview

In this section, we present an overview of the GNN architecture used in this study. While a comprehensive description of the GNN architecture can be found in Hamilton et al. [38], our focus here is to describe the specific implementation used in this work, tailored to the reconstruction of mean flow fields in CFD scenarios. In particular, we describe how unstructured mesh data are encoded as graphs, how physical quantities such as mean velocity fields and geometrical distances are integrated into the model as node and edge features respectively, and how the GNN outputs the predicted forcing term required to solve the RANS equations. The implementation relies on the PyTorch Geometric library [52], which provides an efficient and scalable infrastructure for the Message Passing (MP) algorithms at the core of the GNN model.
Sec. 4.1 delves into the MP process adopted in our GNN, Sec. 4.2 outlines the specific structure of the input data used to encode CFD problems as graphs; the GNN training and inference algorithm is presented in Sec. 4.3. For readers already familiar with Quattromini et al. [44], we note that the GNN architecture described here is unchanged. However, it is employed in this work with a fundamentally different objective; those readers may wish to skip directly to Section 5, where the novel data assimilation strategy is introduced.

4.1 Message Passing Process

In GNNs, nodes iteratively exchange information with their neighbours to update their latent representations. This process, known as MP, allows GNNs to capture complex dependencies and patterns within data by considering the edges as important information relevant to the problem. Depending on the graph’s size, the MP process can be repeated an arbitrary number of times, defined by a hyperparameter \(k\). A detailed list of the hyperparameters used is provided in App. 8. The MP process is node-centered and consists of three fundamental steps:

  1. Message Creation: Each node \(i\) begins with an embedding state represented by a vector \(\mathbf{h}_i\). Initially set to zero, this vector accumulates and processes information during the MP iterations. The dimension \(d_h\) of \(\mathbf{h}_i\) is constant across all nodes and is a key model hyperparameter. This dimension determines the expressivity of the GNN, which reflects its ability to model complex functions [53], [54]. Note that the embedded state itself does not have a direct physical interpretation, while the distance matrix indeed captures some physical relation between the nodes.

  2. Message Propagation: Information is propagated between nodes. To capture the convective and diffusive dynamics of the underlying CFD system, messages are exchanged bidirectionally between connected nodes. For a pair of connected nodes \(i\) and \(j\), with a directed connection \(\mathbf{a}_{ij}\) from \(i\) to \(j\), the message generated is defined as: \[\boldsymbol{\phi}_{i,j}^{(k)} = \zeta^{(k)}(\mathbf{h}_i^{(k-1)}, \mathbf{a}_{ij}, \mathbf{h}_j^{(k-1)}), \label{eq:phi}\tag{11}\] where \(\mathbf{h}_i^{(k-1)}\) is the embedded state from the previous MP layer \((k-1)\), and \(\zeta^{(k)}\) is a differentiable operator, such as a Multi-Layer Perceptron (MLP) [55]. By swapping the indices \(i\) and \(j\) in Eq. 11, we obtain the message flowing from \(j\) to \(i\). For each node \(i\), the global outgoing message is then computed by aggregating the messages over the set \(\mathcal{N}_i\) of nodes \(j\) connected to \(i\): \[\boldsymbol{\phi}_{i,\rightarrow} = \bigoplus_{j\in\mathcal{N}_i} \boldsymbol{\phi}_{i,j}\] where \(\bigoplus\) represents a differentiable, permutation-invariant function (e.g., sum, mean, or max). In our study, we choose the mean function to preserve permutation invariance, which is crucial when working with unstructured meshes, where the neighbourhood size of each node may vary. This ensures that the model can generalise across different mesh configurations.

  3. Message Aggregation: Each node \(i\) aggregates the received information to update its embedded state \(\mathbf{h}_i^{(k)}\): \[\mathbf{h}_i^{(k)} = \mathbf{h}_i^{(k-1)} + \alpha \Psi^{(k)}(\mathbf{h}_i^{(k-1)}, \mathbf{G}_i, \boldsymbol{\phi}_{i,\rightarrow}^{(k)}, \boldsymbol{\phi}_{i,\leftarrow}^{(k)}, \boldsymbol{\phi}_{i,\circlearrowright}^{(k)}),\] where \(\mathbf{G}_i = \{\overline{\mathbf{u}}, Re\}\) represents external quantities injected into the GNN (mean flow \(\overline{\mathbf{u}}\) and Reynolds number \(Re\) in our case), provided at each update \(k\). The terms \(\boldsymbol{\phi}_{i,\rightarrow}^{(k)}\) and \(\boldsymbol{\phi}_{i,\leftarrow}^{(k)}\) represent, respectively, the messages sent to and received from the neighbouring nodes. The term \(\boldsymbol{\phi}_{i,\circlearrowright}^{(k)}\) is the self-message that node \(i\) sends to itself in order to retain its own information while aggregating messages from its neighbours. Their mathematical definition, with the appropriate change in notation, is expressed in Eq. 11. The term \(\Psi^{(k)}\) is a differentiable operator, typically an MLP, used to combine the gathered information. The coefficient \(\alpha\) is a relaxation hyperparameter controlling the update scale (App. 8).

At the end of the MP process, each node’s embedded state has been updated \(k\) times, incorporating information from its neighbours. The choice of \(k\) is crucial for the model’s generalisation across graphs of varying sizes. As shown by [56], \(k\) should be proportional to the graph’s diameter to ensure effective information propagation [57]. However, since the graphs in the dataset used in this study exhibit a relatively consistent diameter, we opted to optimise this hyperparameter using genetic algorithms (App. 8). Additionally, [56] explored recurrent or adaptive architectures, where \(k\) could dynamically adjust based on the graph’s structure. Once the MP process concludes, the latest \(k\)-updated embedded state of each node is projected back into the physical space to predict the desired target, in this case, the forcing stress term \(\mathbf{f}\). A differentiable operator such as an MLP (decoder \(D\)) handles this operation.
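A single simplified MP update with mean aggregation (the aggregation adopted here) can be sketched as follows. For brevity, only the outgoing messages \(\boldsymbol{\phi}_{i,\rightarrow}\) are aggregated, and a fixed linear map stands in for the trainable MLP \(\zeta\); the toy graph and all array names are our own illustrative choices:

```python
import numpy as np

# Minimal sketch of one MP update with mean aggregation (Sec. 4.1)
rng = np.random.default_rng(4)
n_nodes, d_h, d_e = 5, 8, 2
h = rng.standard_normal((n_nodes, d_h))             # embedded states h_i
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])  # directed connections i -> j
e_attr = rng.standard_normal((len(edges), d_e))     # edge features a_ij (distances)

W = rng.standard_normal((2 * d_h + d_e, d_h))       # linear stand-in for zeta
alpha = 0.1                                         # relaxation coefficient

def message(hi, a, hj):  # Eq. 11: phi_ij = zeta(h_i, a_ij, h_j)
    return np.concatenate([hi, a, hj]) @ W

# Mean-aggregate the outgoing messages of each node, then apply the relaxed update
phi_sum = np.zeros((n_nodes, d_h))
deg = np.zeros(n_nodes)
for (i, j), a in zip(edges, e_attr):
    phi_sum[i] += message(h[i], a, h[j])
    deg[i] += 1
phi_mean = phi_sum / np.maximum(deg, 1)[:, None]

h_new = h + alpha * phi_mean                        # simplified aggregation step
```

Nodes without outgoing edges are left unchanged by this update, mirroring the residual form of the aggregation rule.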

Figure 3: The overall framework of our GNN training process. \(\text{MP}^k\) denotes the message passing algorithms, \(D^{(k)}\) are the \(k\) decoders, which are trainable MLPs, \(\boldsymbol{A}^k\) are the \(k\) matrices containing the embedded states of each node, and \(\boldsymbol{G}\) is the vector containing the input injected into the GNN. The figure is inspired by [57].

4.2 Data Structuring

Applying GNNs to unstructured CFD data requires representing the CFD mesh as a graph. In order to obtain the CFD-GNN interface, we align each mesh node with a GNN node. To this end, we structure the data into tensors that maintain the mesh’s adjacency properties. Specifically, for each case in the ground truth dataset, we construct:

  • A matrix \(\mathbf{A} \in \mathbb{R}^{n_i \times d_h}\), \[\boldsymbol{A}_{n_i,d_h} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,d_h} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,d_h} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n_i,1} & a_{n_i,2} & \cdots & a_{n_i,d_h} \end{bmatrix},\] where \(n_i\) is the number of nodes in the mesh and \(d_h\) is the dimension of the embedded state. This matrix \(\mathbf{A}\) stacks together all the embedded arrays \(h_i\) defined on all the nodes.

  • A matrix \(\mathbf{C} \in \mathbb{R}^{c \times 2}\), where \(c\) is the number of edges in the mesh, defining the node connections. This matrix represents the adjacency scheme of the mesh.

  • A matrix \(\mathbf{E} \in \mathbb{R}^{c \times 2}\), containing the distances between connected nodes in the \(x\) and \(y\) directions. This matrix captures the geometric properties of the mesh’s adjacency scheme, expressed as the node distances. \[\boldsymbol{C}_{c,2} = \begin{bmatrix} i_1 & j_1\\ i_2 & j_2\\ \vdots & \vdots \\ i_c & j_c \end{bmatrix}, \qquad \boldsymbol{E}_{c,2} = \begin{bmatrix} x_{i_1} - x_{j_1} & y_{i_1} - y_{j_1}\\ x_{i_2} - x_{j_2} & y_{i_2} - y_{j_2}\\ \vdots & \vdots \\ x_{i_c} - x_{j_c} & y_{i_c} - y_{j_c}\\ \end{bmatrix}.\]
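A minimal sketch of how the three tensors could be assembled from a toy mesh is given below; the helper name and inputs are hypothetical, not the actual pipeline, which builds the same structures directly from the FEM mesh.

```python
# Sketch: assembling the graph tensors A, C, E from a toy 2D mesh given
# as node coordinates plus an edge list (hypothetical helper).

def build_graph_tensors(coords, edges, d_h):
    """coords: list of (x, y) per node; edges: list of (i, j) pairs;
    d_h: embedded-state dimension. Returns (A, C, E)."""
    n = len(coords)
    # A: zero-initialised embedded states, one row of size d_h per node
    A = [[0.0] * d_h for _ in range(n)]
    # C: adjacency scheme, one (i, j) row per edge
    C = [list(e) for e in edges]
    # E: geometric edge features, the (dx, dy) distances between nodes
    E = [[coords[i][0] - coords[j][0], coords[i][1] - coords[j][1]]
         for i, j in edges]
    return A, C, E
```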

Each column of \(\mathbf{A}\) serves as a feature vector for the neurons of the MLPs used in the GNN (\(\zeta\), \(\Psi\), and the decoder \(D\)). The architecture of these MLPs is fixed by the dimension \(d_h\) of the embedded state, while the number of nodes \(n_i\) only determines the feature count per neuron. The same MLP architectures can therefore be reused across different CFD simulations, regardless of geometry or node count, since the number of nodes does not affect the underlying structure of the MLP. This property makes GNNs particularly well-suited for interacting with unstructured meshes and for learning from various geometries and configurations.

4.3 GNN Training Algorithm↩︎

The GNN training framework is shown in Fig. 3. The process begins with \(\mathbf{A}^{0}\), a matrix of zero-initialised embedded states (see Sec. 4.2). This matrix, along with external inputs \(\mathbf{G}\) (mean flow \(\overline{\mathbf{u}}\) and Reynolds number \(Re\)), is fed into the first message passing layer \(\text{MP}^{1}\). The updated embedded state matrix \(\mathbf{A}^{1}\) is then passed through the decoder \(D^{1}\), an MLP responsible for reconstructing the physical state \(\hat{\mathbf{f}}^{1}\). The predicted forcing term \(\hat{\mathbf{f}}^{1}\) is compared to the DNS ground truth \(\mathbf{f}\) using a loss function: \[\ell^k = \frac{1}{n_i}\sum_{i=1}^{n_i}(\mathbf{f}_i - \hat{\mathbf{f}}^k_i)^2, \label{eq:loss95function}\tag{12}\] where \(n_i\) is the number of nodes and \(k\) is the layer index in the MP process. This procedure is repeated across the \(k\) layers of the GNN. Following [57], the intermediate losses from all layers of the MP process are combined into a global loss function \(L\) to make the learning process more robust: \[L = \sum_{k=1}^{\bar{k}} \gamma^{\bar{k}-k} \cdot \ell^k, \label{eq:global95cost95generic95GNN}\tag{13}\] where \(\bar{k}\) is the number of layers, and \(\gamma\) is a hyperparameter controlling the weight of each loss term (see App. 8). As the MP process progresses, each node accumulates information from an increasingly large neighbourhood. This exponential weighting ensures that later updates, which are richer in information, have greater influence on the learning process.
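The two losses of Eqs. 12 and 13 can be sketched as follows; this is a pure-Python stand-in with the forcing fields reduced to scalar node values for illustration.

```python
# Sketch of the per-layer MSE (Eq. 12) and the layer-weighted global loss
# L = sum_k gamma^(kbar - k) * l^k (Eq. 13). Scalar node values stand in
# for the vector forcing field.

def layer_loss(f_true, f_pred):
    # per-layer MSE between ground-truth and predicted forcing (Eq. 12)
    n = len(f_true)
    return sum((ft - fp) ** 2 for ft, fp in zip(f_true, f_pred)) / n

def global_loss(f_true, f_preds, gamma=0.1):
    # f_preds: per-layer predictions [f_hat^1, ..., f_hat^kbar]; later
    # layers (richer in information) get exponentially larger weight
    kbar = len(f_preds)
    return sum(gamma ** (kbar - k) * layer_loss(f_true, f_preds[k - 1])
               for k in range(1, kbar + 1))
```

With \(\gamma = 0.1\), an error at the last layer contributes ten times more to \(L\) than the same error one layer earlier.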

5 Methodology↩︎

Figure 4: End-to-end training loop: \overline{\mathbf{u}} is the GNN’s input mean flow, \hat{\mathbf{f}} is the GNN’s predicted forcing stress term, \boldsymbol{\theta} represents the GNN’s trainable parameters, and \mathcal{J}(\hat{\overline{\mathbf{u}}}) is the cost function to minimise. For simplicity of notation, we consider the case where the ground truth corresponds to the entire flow field.

This section describes the data assimilation scheme developed in this study, which combines the training process of a neural model based on GNN (Sec. 4) with the RANS equations (Sec. 2.1). The approach primarily focuses on using analytically derived gradients through the adjoint method (Sec. 2.2) to enhance the GNN learning process and ensure physical consistency in its predictions. This scheme will be referred to as the Physics-Constrained Graph Neural Network (PhyCo-GNN). The complete end-to-end training process is illustrated in Fig. 4, with technical details provided in Sec. 5.1 regarding the training process, in Sec. 5.2 for the GNN pre-training phase, and in Sec. 5.3 for the approach used to handle the transition from pre-training to the main training phase of the GNN.

5.1 Algorithm and training process↩︎

With reference to Fig. 4, the global training process can be ideally divided into two phases: the forward step and the backward step.

The forward step begins by associating an embedded state vector \(\mathbf{h}_0\), initially set to zero, as a node feature to each node of the GNN. These vectors, one for each mesh node, define the initial latent representation of the graph. Each node of the graph corresponds to a point in the computational mesh, and this mapping is preserved through the data structure described in Sec. 4.2. The Euclidean distances between nodes are used as edge features to define the graph connectivity and to encode the local geometric relationships. The constructed graph is passed through a pre-trained GNN (Sec. 5.2), and the state vector on each node is updated through a \(k\)-step message passing process, as outlined in Sec. 4.1. During each message passing iteration, nodes exchange and aggregate information from their neighbours, leveraging the geometric edge features. External physical quantities (the mean flow field \(\overline{\mathbf{u}}\) and the Reynolds number \(Re\)) are injected during the aggregation step at each layer, providing the GNN with the necessary context to learn physically meaningful patterns. After \(k\) message passing iterations, the GNN outputs a predicted forcing term \(\hat{\mathbf{f}}\), which represents an estimate of the Reynolds stress closure term. This predicted forcing term \(\hat{\mathbf{f}}\) is used as the closure term in the RANS equations (Eq. 3 ). These equations are numerically solved in a FEM framework using the Python library FEniCS [51], producing a reconstructed mean flow field \(\hat{\overline{\mathbf{u}}}\). This result is then compared with the mean flow ground truth \(\overline{\mathbf{u}}\), obtained from the DNS, to compute the loss function \(\mathcal{J}\), as expressed in Eq. 5 .
The second phase, the backward step, starts with the requirement to compute the derivative of the loss function \(\mathcal{J}\) with respect to the trainable \(\boldsymbol{\theta}\) parameters of the GNN. The gradient chain rule for this term can be mathematically expressed as: \[\frac{\partial \mathcal{J}}{\partial\boldsymbol{\theta}} = \frac{\partial \mathcal{J}}{\partial\hat{\overline{\mathbf{u}}}} \cdot \frac{\partial\hat{\overline{\mathbf{u}}}}{\partial\hat{\mathbf{f}}} \cdot \frac{\partial\hat{\mathbf{f}}}{\partial\boldsymbol{\theta}} = \frac{\partial \mathcal{J}}{\partial\hat{\mathbf{f}}} \cdot \frac{\partial\hat{\mathbf{f}}}{\partial\boldsymbol{\theta}}.\label{eq:chain95rule}\tag{14}\] The first term, \(\frac{\partial \mathcal{J}}{\partial \hat{\mathbf{f}}}\), on the right-hand side is obtained from Eq. 10 after solving the adjoint equations (Eq. 8 ) with the support of Eq. 9 . This yields an analytical gradient of the cost function \(\mathcal{J}\) with respect to \(\hat{\mathbf{f}}\). The second term, \(\frac{\partial \hat{\mathbf{f}}}{\partial \boldsymbol{\theta}}\), represents the gradient of the GNN’s output with respect to its \(\boldsymbol{\theta}\) trainable parameters, which can be computed efficiently using the PyTorch Geometric automatic differentiation.
These two gradients – one analytically derived and discretised using FEniCS, and the other numerically computed via automatic differentiation in PyTorch Geometric [52] – are combined to complete the chain rule (Eq. 14 ) and obtain the full derivative that is used to update the GNN learnable parameters via gradient descent. The full forward-backward process is repeated iteratively until convergence, i.e., when the cost function \(\mathcal{J}\) falls below a predefined threshold.
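The chain rule of Eq. 14 can be illustrated on a toy one-parameter model \(f_i(\theta) = \theta\, x_i\). In the actual framework the first factor comes from the FEniCS adjoint solve and the second from PyTorch Geometric autograd (where the adjoint gradient would be injected via `f_hat.backward(gradient=dJ_df)`); here, as a hedged sketch, both factors are written out by hand.

```python
# Sketch of combining the two gradient factors of Eq. 14. dJ_df plays
# the role of the adjoint-derived gradient dJ/df (one value per node);
# df_dtheta plays the role of the autograd factor df/dtheta, which for
# the toy model f_i(theta) = theta * x_i is simply x_i.

def dJ_dtheta(dJ_df, df_dtheta):
    # chain rule: sum over nodes of (dJ/df_i) * (df_i/dtheta)
    return sum(a * b for a, b in zip(dJ_df, df_dtheta))

def gradient_step(theta, grad, lr=1e-2):
    # plain gradient-descent update of the trainable parameter
    return theta - lr * grad
```

In practice the GNN has many parameters, so the product above is a vector-Jacobian product handled entirely by automatic differentiation once the adjoint gradient is supplied as the seed.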

5.2 On the pre-training step↩︎

A crucial step in the algorithm is the GNN model’s pre-training phase. We can motivate this choice from two different perspectives. From a theoretical Bayesian viewpoint, the objective is to infer the posterior distribution of the GNN parameters \(\theta\) given observations of the mean velocity field \(\bar{\mathbf{u}}\), i.e., \(p(\theta \mid \bar{\mathbf{u}})\). Since the forcing field \(\mathbf{f}\) is not observed directly, but acts as a latent variable linking \(\theta\) to \(\bar{\mathbf{u}}\) via the RANS equations, we introduce \(\mathbf{f}\) as an auxiliary variable and marginalise over it. This yields: \[p(\theta \mid \bar{\mathbf{u}}) \propto \int p(\bar{\mathbf{u}} \mid \mathbf{f}) \, p(\mathbf{f} \mid \theta) \, p(\theta) \, d\mathbf{f},\] where \(p(\bar{\mathbf{u}} \mid \mathbf{f})\) models the likelihood of the observed mean flow given a candidate forcing field; \(p(\mathbf{f} \mid \theta)\) is the GNN model predicting \(\mathbf{f}\) from input features and \(p(\theta)\) is the prior over the model parameters, which we assume to be uniform in the absence of other information. This decomposition highlights the role of \(\mathbf{f}\) as a latent intermediate quantity through which the model learns to match the observed \(\bar{\mathbf{u}}\). Pretraining the GNN on DNS-based samples of \(\mathbf{f}\) provides a meaningful initialisation for \(p(\mathbf{f} \mid \theta)\), thus guiding the posterior inference on \(\theta\) in a more stable and efficient way during the later reconstruction of \(\bar{\mathbf{u}}\). Moreover, the GNN’s weights and biases are set using a default initialisation [58], meaning that early predictions from the GNN are non-physical and cannot reliably be used in the forward step, where the forcing term is introduced into the RANS equations to compute the mean flow (Sec. 5.1). 
In fact, the solution to the RANS problem may not exist if the initial guess for the forcing term, \(\hat{\mathbf{f}}\), is too far from a physically meaningful value. Thus, from the numerical viewpoint, the pre-training phase stabilises the GNN’s output and mitigates this issue, making the prediction of the forcing stress term \(\hat{\mathbf{f}}\) suitable for integration into the RANS equations.
The pre-trained model is obtained through pure supervised learning [44], where the mean flow \(\overline{\mathbf{u}}\) (and Reynolds number \(Re\)) are used as input, and the forcing stress term \(\mathbf{f}\) from DNS data is the target. The loss function used in this phase is the Mean Squared Error (MSE) loss \(\mathcal{L}\) over the \(n_i\) nodes of the GNN, already discussed in Sec. 4.3. The number of epochs required to reach the desired closure term accuracy is not fixed a priori, but is determined based on the convergence behaviour of the GNN during the pre-training phase. In particular, the transition to the main training phase occurs only when the GNN has learned to produce physically plausible forcing terms \(\hat{\mathbf{f}}\), which allow the RANS equations to be solved via the FEM framework. To ensure consistency across test cases, a standard protocol is adopted: for simple reference scenarios, such as those presented in Sec. 6.1, the pre-training is fixed at 500 epochs. For more complex cases, which involve larger datasets, the pre-training phase is extended to 1000 epochs. This choice reflects the increased training effort required to reach comparable levels of accuracy in more challenging settings.
In this context, the closure term accuracy refers to the level of precision necessary for the GNN to generate predictions that allow the FEM solver to successfully solve the RANS equations. Throughout the pre-training phase, the GNN’s predictions are periodically evaluated by solving a test FEM step. If the solver converges and accurately resolves the RANS equations using the GNN-predicted closure term, the pre-training phase is considered complete. This ensures that the GNN has learned a reliable and accurate representation of the closure term, making it suitable for the full training process.
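The stopping logic described above can be sketched as follows; `train_epoch` and `fem_solver_converges` are hypothetical callables standing in for one supervised epoch and the periodic test FEM solve.

```python
# Sketch of the pre-training stopping criterion: run supervised epochs
# and periodically check whether the predicted closure term lets the
# FEM solver resolve the RANS equations. Both callables are stand-ins.

def pretrain(train_epoch, fem_solver_converges, max_epochs=500, check_every=50):
    for epoch in range(1, max_epochs + 1):
        f_hat = train_epoch()          # one supervised epoch; returns prediction
        if epoch % check_every == 0 and fem_solver_converges(f_hat):
            return epoch               # closure term accurate enough for FEM
    return max_epochs
```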

5.3 On the loss function↩︎

During the pre-training step (Sec. 5.2), the GNN model is updated using a loss function designed to align the model’s predictions with the available DNS data. As previously mentioned, this phase serves as a warm-up for the subsequent main training (Sec. 5.1). However, when pre-training ends and the main training begins, a different loss function is adopted, as can be seen by comparing Eq. 13 with Eq. 5 . This change may negatively affect convergence and destabilise the training process, as the two optimisation landscapes can be quite different. To mitigate this risk, both loss functions are retained during the main training phase and combined using a weight coefficient, \(\beta\). In particular, the loss function \(\mathcal{L}\) (Eq. 13 ), associated with supervised pre-training, continues to enforce a data-driven alignment and ensures "continuity" in the optimisation process. On the other hand, the term \(\mathcal{J}\) (Eq. 5 ), corresponding to the physics-constrained loss, minimises the mean flow reconstruction error. Thus, the global loss function for the main training loop (Sec. 5.1) reads: \[\mathcal{H} = (1-\beta)\mathcal{L} + \beta\mathcal{J} = (1-\beta)\left(\frac{1}{n_i} \sum_{i=1}^{n_i} (\mathbf{f}_i - \hat{\mathbf{f}}_i)^2\right) + \beta \left ( \frac{1}{2}\int_\Omega \left(\bar{\mathbf{m}} - \mathcal{M}(\hat{\overline{\mathbf{u}}})\right)^2 d\Omega \right ). \label{eq:loss95global}\tag{15}\] This strategy ensures a smooth transition between the two optimisation steps by adjusting the relative importance of the pre-training and main training loss functions.
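A discrete stand-in for the blended loss of Eq. 15 is sketched below, with the domain integral replaced by a plain sum over nodes for illustration.

```python
# Sketch of the blended loss H = (1 - beta) * L + beta * J of Eq. 15.
# The supervised term L is the nodal MSE on the forcing; the physics
# term J is a discrete stand-in for the integral of the mean-flow
# reconstruction error over the domain.

def blended_loss(f_true, f_pred, u_true, u_pred, beta):
    n = len(f_true)
    L = sum((a - b) ** 2 for a, b in zip(f_true, f_pred)) / n    # supervised
    J = 0.5 * sum((a - b) ** 2 for a, b in zip(u_true, u_pred))  # physics-based
    return (1.0 - beta) * L + beta * J
```

Setting \(\beta = 0\) recovers the pure pre-training loss and \(\beta = 1\) the pure physics-constrained loss, so \(\beta\) interpolates smoothly between the two optimisation problems.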

The next section discusses the results, showing that the converged GNN model effectively predicts a forcing term \(\hat{\mathbf{f}}\) that aligns with the ground truth \(\mathbf{f}\), allowing the learned model to accurately reconstruct the mean flow \(\overline{\mathbf{u}}\).

6 Results↩︎

Figure 5: Training Dataset: 1 mean flow-forcing pair (Re = 150); the GNN input of the mean flow from the ground truth is shown in (a); (b) Loss curves for the pure supervised approach (orange line), the proposed PhyCo-GNN method (blue line), and the pre-training phase (red line). Shaded regions indicate the standard deviation computed over 5 independent training runs with different parameter initialisations. The two horizontal dotted lines indicate the minimum values of the supervised and proposed methods. (c) Reconstructed mean flow obtained with the PhyCo-GNN approach. (d) Contour plot of the reconstruction error difference between the pure supervised approach and PhyCo-GNN.

In this section, we present the improvements achieved using the PhyCo-GNN data assimilation scheme for reconstructing the mean flow field \(\overline{\mathbf{u}}\). The tests are conducted across several scenarios, with a particular focus on reconstructing the mean flow from sparse measurements, noisy probes, and incomplete flow fields (inpainting). The models are compared with the pure supervised learning method, which serves as a baseline. The supervised learning approach is thoroughly discussed in [44]: the GNN model is trained solely by learning the forcing stress from the DNS data. Therefore, the cost function is chosen to minimise the discrepancy between the predicted forcing stress and the ground truth, without incorporating any constraints based on the system’s physics. The mean flow \(\overline{\mathbf{u}}\) is then computed by using the forcing stress modelled with the GNN as input to the RANS equations in Eq. 3 . In contrast, the hybrid data assimilation scheme introduced in this work incorporates a physics-based constraint during the training process of the GNN. To compare the two methods, we evaluate their training curves (after the pre-training phase) by identifying the minimum loss values achieved by each model during training. Unless otherwise stated, all test cases use the full velocity field from DNS as input; the sparse probe setting applies exclusively to the test in Sec. 6.3. As a metric for comparison, we consider the percentage improvement, defined as: \[\mathcal{I}(\%) = \dfrac{\min( \mathcal{J}_{\text{S}}) - \min( \mathcal{J}_{\text{PC}})}{\min( \mathcal{J}_{\text{S}})} \cdot 100, \label{eq:percentage95improvement}\tag{16}\] where \(\min( \mathcal{J}_{\text{S}})\) and \(\min( \mathcal{J}_{\text{PC}})\) represent the minimum values of the loss function for mean flow reconstruction (Eq. 5 ) in the supervised learning method and the adjoint-based PhyCo-GNN method, respectively.
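Computed from the two training curves (lists of per-epoch loss values), the metric of Eq. 16 amounts to:

```python
# Sketch of the percentage-improvement metric of Eq. 16: compare the
# minima of the supervised and physics-constrained training curves.

def improvement(J_supervised, J_physics_constrained):
    m_s = min(J_supervised)
    m_pc = min(J_physics_constrained)
    return (m_s - m_pc) / m_s * 100.0
```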
In the following, we introduce different learning tasks, focusing on the technical features of the method and discussing the improvements achieved in terms of mean flow reconstruction.

6.1 Reference cases↩︎

Figure 6: Training Dataset: 1 mean flow-forcing pair at Re = 90; the mean flow is shown in (a) and is used as GNN input; legend for (b), (c), and (d) as in Fig. 5

The first test case we consider is the reconstruction of the flow field when the input to the GNN is the complete mean flow \(\overline{\mathbf{u}}\) and Reynolds number \(Re\) over the entire computational domain \(\Omega\). This test case is introduced as a reference for the method. Throughout this section, the mapping operator \(\mathcal{M}\) used in the definition of the cost function is taken as the identity operator, implying that the comparison is performed over the full velocity field without any spatial filtering or subsampling.
We consider two cases of increasing complexity. The first case involves flow past a 2D cylinder at a Reynolds number of \(Re = 150\). This case is well-documented in the literature, and its mean flow is shown in Fig. 5\(a\). The training dataset includes only the mean flow \(\overline{\mathbf{u}}\) (used as input to the GNN) and its corresponding forcing term \(\mathbf{f}\) (the GNN target). The training curves in Fig. 5\(b\) show that, starting from the pre-training phase, the implementation of the approach described in this paper leads to a substantial improvement in mean flow reconstruction. Specifically, the improvement reaches a value of \(\mathcal{I}=58.59\%\).

The second case involves a configuration of two side-by-side cylinders, also known in the literature as the ‘flip-flop’ case, at a Reynolds number of \(Re = 90\) [59]. Its mean flow is shown in Fig. 6\(a\). The training curves for this case in Fig. 6\(b\) demonstrate an even more pronounced improvement, with a reduction of \(\mathcal{I}=82.90\%\) in the loss curve. These results not only indicate the broad adaptability of the PhyCo-GNN but also show how, in more complex scenarios, the underlying physics and governing equations play a crucial role in further enhancing the accuracy of the GNN model’s predictions.

6.2 Generalisation↩︎

In this section, we test the generalisation capabilities of the model obtained using the PhyCo-GNN scheme. The training dataset consists of three cases involving a 2D cylinder at Reynolds numbers of \(Re = [90, 110, 130]\). In contrast, the validation dataset includes data points not present in the training set, corresponding to simulations of the flow around a 2D cylinder at Reynolds number \(Re = 120\) for interpolation testing, and at \(Re = 150\) for testing the extrapolation capabilities. Throughout this generalisation test, the mapping operator \(\mathcal{M}\) used in the cost function remains the identity operator.

Figure 7: Generalisation test – training dataset: 3 mean flow-forcing pairs at Re = [90, 110, 130]; validation dataset: 2 mean flow-forcing pairs at Re = [120, 150]; (a) ground truth mean flow at Re = 120 from the validation dataset, used as GNN input; legend for (b), (c), and (d) as in Fig. 5

In Fig. 7\(a\), the ground truth mean flow \(\overline{\mathbf{u}}\) for the \(Re = 120\) case is shown. Based on the validation cases, we observe an improvement in the mean flow reconstruction, with an average improvement of \(\mathcal{I}=73.27\%\) across the entire validation dataset. Specifically, we achieve an improvement of \(\mathcal{I}=78.96\%\) for the interpolation case at \(Re = 120\), and \(\mathcal{I}=13.96\%\) for the extrapolation case at \(Re = 150\). The improvement on the training cases is \(\mathcal{I}=40.16\%\) on average across the entire training dataset.

The primary objective of this test case is to demonstrate that the presented approach enhances the generalisation capabilities of the GNN model. To ensure clarity in our analysis, this generalisation test case is intentionally separated from the others. This separation allows for a focused evaluation of each individual test case, targeting the specific goals of those tests without introducing confounding variables related to generalisation. It should be noted that a more extensive assessment of the GNN’s ability to generalise across a range of geometric configurations – such as variations in bluff body shapes or positions – is beyond the scope of this study. These aspects have been examined in detail by Quattromini et al. [44], in combination with active learning strategies. In the present work, we focus on a simplified setting with limited generalisation in order to clearly evaluate the role of the physics-constrained training scheme in predicting the mean flow.

6.3 Sparse Measurements↩︎

The learning task presented here involves the reconstruction of the mean flow over the entire computational domain using measurements from randomly distributed probes as input for the GNN. In this case, the mapping operator \(\mathcal{M}\) appearing in the cost function is defined as the projection onto the sparse set of probe locations, so that the comparison between predicted and reference fields is performed only at the measurement points. The numerical treatment of the forcing term of the adjoint equation – built from the sparse measurements – follows that of [23].

Figure 8: Sparse measurements – training dataset: 250 randomly distributed probes on 6 mean flow-forcing pairs at Re = [90, 110, 130] (two instances for each Re); (a) random probe positioning on the mean flow; legend for (b), (c), and (d) as in Fig. 5

The training dataset consists of two simulations of the flow past a cylinder for each Reynolds number in the range \(Re = [90, 110, 130]\), resulting in six cases. For each case, \(450\) probes are placed along the mean flow stream, uniformly distributed across the entire computational domain \(\Omega\). Subsequently, \(200\) of these probes are randomly removed, leaving a sparse set of \(250\) probes. This sparse set of measurements of the mean flow \(\overline{\mathbf{u}}\) is used as input to the GNN, while the output prediction is compared with the corresponding forcing stress tensor from the DNS ground truth. Fig. 8\(a\) shows the positioning of the random probes on the mean flow, while Fig. 8\(b\) shows the average training curves across the training dataset.
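The probe-selection step can be sketched as follows (stdlib-only, with a fixed seed for reproducibility; the actual probe layout follows the mesh, which is not modelled here):

```python
# Sketch of the probe selection: 450 uniformly placed probes, of which
# 200 are randomly removed, leaving a sparse set of 250 measurement
# locations. A fixed seed makes the sparse pattern reproducible.

import random

def select_probes(n_total=450, n_removed=200, seed=0):
    rng = random.Random(seed)
    probes = list(range(n_total))
    removed = set(rng.sample(probes, n_removed))
    return [p for p in probes if p not in removed]
```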

In this case, we demonstrate an improvement in mean flow reconstruction across all the training cases, with an average improvement of \(\mathcal{I} = 55.09\%\). This result highlights the robustness of the proposed approach in scenarios with sparse and randomly distributed measurements.

6.4 Denoising↩︎

In this test case, the input mean flow field is perturbed with Gaussian noise. The probability density function of the Gaussian distribution used to generate the noise is given by: \[\psi\left(z\right) = \frac{1}{\sigma \sqrt{2\pi}}e^{\frac{-\left(z-\mu\right)^2}{2\sigma^2}} \label{eq:pdf95gaussian}\tag{17}\] where \(z\) is the random variable, \(\mu\) is the mean value of the normal distribution, and \(\sigma\) represents its standard deviation. In this case, we assume \(\mu = 0\), i.e., zero-mean noise. The mapping operator \(\mathcal{M}\), in this case, remains the identity operator. The training dataset consists of three cases of cylinder flows at Reynolds numbers \(Re = [90, 110, 130]\), each perturbed with Gaussian noise of fixed standard deviation \(\sigma = 0.4\). Fig. 9\(a\) shows the effect of \(\sigma = 0.4\) Gaussian noise on the mean flow (at \(Re = 130\)), while Fig. 9\(b\) presents the accuracy of the mean flow reconstruction. The goal is to remove the Gaussian noise and accurately reconstruct the denoised mean flow field. Our approach demonstrates an improvement of \(\mathcal{I} = 47.52\%\) on the training dataset, on average, across all cases.
Although not shown here, we also conducted additional tests at fixed Reynolds number (\(Re = 130\)), with varying noise levels \(\sigma = [0.2, 0.4, 0.6]\). The results confirmed superior performance of the proposed approach compared to purely supervised learning, highlighting its robustness to different noise intensities.
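The noisy inputs can be generated as in the following sketch; this is a stdlib stand-in in which a scalar list replaces the two-component velocity field on the mesh.

```python
# Sketch of the denoising input construction: perturb the mean-flow
# samples with zero-mean Gaussian noise of standard deviation sigma
# (Eq. 17), via the standard library's random.gauss.

import random

def add_noise(field, sigma=0.4, seed=0):
    rng = random.Random(seed)
    return [u + rng.gauss(0.0, sigma) for u in field]
```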

Figure 9: Denoising – training dataset: 3 mean flow-forcing pairs at Re = [90, 110, 130], perturbed with Gaussian noise; (a) mean flow at Re = 130; legend for (b), (c), and (d) as in Fig. 5
Figure 10: Inpainting – training dataset: 3 mean flow-forcing pairs at Re = [90, 110, 130], with randomly located patching masks; (a) mean flow at Re = 110; legend for (b), (c), and (d) as in Fig. 5

6.5 Inpainting↩︎

In this test, masking patches are randomly applied to the input mean flow field. The training dataset consists of three cases of a cylinder obstacle at Reynolds numbers \(Re = [90, 110, 130]\), each with different patch locations (Fig. 10\(a\)). In this setup, the mapping operator \(\mathcal{M}\) used in the cost function excludes the masked regions, meaning that the comparison between the predicted and reference fields is performed only over the unmasked areas of the domain. The goal is to reconstruct the mean flow field by filling in the missing patches. The approach improves the reconstruction on the training cases by \(\mathcal{I}=41.73\%\) on average, successfully restoring the missing portions of the field and enhancing the overall accuracy.
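A sketch of the masked comparison implied by this choice of \(\mathcal{M}\) is given below, with a scalar list standing in for the velocity field.

```python
# Sketch of the masked error used for inpainting: the operator M
# excludes the patched regions, so the error is accumulated only over
# unmasked nodes. mask[i] is True where data are available.

def masked_mse(u_true, u_pred, mask):
    pairs = [(t, p) for t, p, m in zip(u_true, u_pred, mask) if m]
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)
```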

A summary of the results is shown in Tab. 1, where the relative errors between the reconstructed mean flow and the ground truth are reported both for the purely supervised technique based on GNN models (see also [44]) and for the PhyCo-GNN data assimilation scheme. The latter outperforms vanilla supervised learning in all cases, as shown in the previous paragraphs.

Table 1: Summary of the reconstruction errors for each test considered in Sec. 6. The cases listed correspond to those presented in Figures 5–10. For each case, a comparison is made between the baseline method based on supervised learning and the performance of PhyCo-GNN, with the errors reported as relative errors in the \(2\)-norm.

Cases                                 Supervised   PhyCo-GNN
Reference case, Re=150                0.0190       0.0126
Reference case, Re=90 (flip-flop)     0.0910       0.0843
Generalisation, Re=120                0.0087       0.0069
Sparse measurements, Re=110           0.0134       0.0092
Gaussian noise, Re=130                0.0200       0.0103
Inpainting, Re=110                    0.0056       0.0026

7 Conclusion↩︎

In this work, we introduced a direct-adjoint data assimilation scheme based on the Reynolds-Averaged Navier-Stokes (RANS) equations, where the closure model is based on Graph Neural Networks (GNNs). The resulting scheme, named Physics-Constrained Graph Neural Network (PhyCo-GNN), demonstrated the potential of combining machine learning techniques with well-established physical principles to improve the accuracy and reliability of mean flow reconstruction. More specifically, the forcing term of the RANS equations is modelled using a GNN model informed by gradients obtained through auto-differentiation of the neural network and gradients computed analytically based on the adjoint equations associated with the iterative process. On one hand, these gradients ensure that the learned model adheres to the underlying physics. On the other hand, the closure model benefits from the flexibility of GNNs in handling unstructured meshes, making the framework particularly well-suited for complex geometries often encountered in Computational Fluid Dynamics (CFD). Finally, the combination of GNN and adjoint-based methods mitigates the dependency on large datasets, as the inclusion of physics reduces the need for extensive training data to achieve reliable predictions.

We tested the PhyCo-GNN framework across several scenarios, with a particular focus on reconstructing the mean flow from sparse measurements, noisy probes, and incomplete flow fields (inpainting). The models are compared with a supervised learning method, which serves as a benchmark, where the learning process of the GNN model is not constrained by the adjoint but solely relies on numerical data. In particular, we consider as the reference case the mean flow reconstruction of flows past a 2D cylinder at a Reynolds number of \(Re = 150\) and a two-cylinder configuration at \(Re = 90\). Other tests include generalisation by considering interpolation and extrapolation with respect to unseen Reynolds numbers during the training process, reconstruction from noisy inputs, and inpainting, which involves the reconstruction of the mean flow field by filling in the missing patches. For all test cases, the PhyCo-GNN approach showed substantial improvements compared to the reference method based on pure supervised learning.

Future work will explore the extension of the presented framework to more complex scenarios, including higher Reynolds number regimes, three-dimensional flows, and experimental datasets, which are often characterised by sparse, noisy, or incomplete measurements. While the methodological structure of the PhyCo-GNN framework is, in principle, compatible with such configurations, the transition to large-scale 3D domains poses significant computational challenges. These include the increasing cost of mesh resolution, memory usage during GNN training, and the resource demands of adjoint-based PDE solvers. Addressing these limitations will require adopting scalable strategies such as domain decomposition for graph-based learning (a review can be found in Dolean et al. [60]), modular and multi-scale GNN architectures, and multi-GPU distributed training pipelines. In parallel, the adoption of more efficient FEM backends and the exploration of alternative network architectures, such as physics-informed transformers or recurrent graph networks, may further enhance the expressiveness and applicability of the framework.

8 Hyperparameters↩︎

The performance of a neural network is determined by its hyperparameters, which are set before training and influence both the model’s architecture and the training process. These parameters define the network’s ability to approximate complex functions, balance computational efficiency, and ensure stable training.

To identify the optimal set of hyperparameters, we used the open-source optimisation library Optuna [61]. Optuna combines efficient parameter space exploration with advanced pruning strategies, dynamically pruning unpromising trials to reduce computational costs while maximising performance. The final hyperparameters selected for the learning tasks in this study are:

  • Embedded dimension: \(d_h = 35\). The size of the hidden feature space for each node in the GNN. A higher dimension enables richer representations of node features, but excessively large values increase computational cost without necessarily improving performance.

  • Number of GNN layers: \(k = 40\). This depth ensures global information can propagate across the graph, while avoiding over-smoothing.

  • Update relaxation weight: \(\alpha = 6 \times 10^{-1}\). This parameter balances the influence of new and old information during feature updates.

  • Loss function weight: \(\gamma = 0.1\). Controls the weighting of loss terms from different layers in the GNN during training. Higher weights on later layers emphasise long-range dependencies in the training process.

  • Learning rate: \(LR = 3 \times 10^{-3}\). This optimises the step size for weight updates, balancing convergence speed and stability.

These hyperparameters were optimised to achieve a balance between model accuracy and computational efficiency. It is important to note that these hyperparameters are task-dependent and, in theory, should be re-optimised for each specific learning task to ensure optimal performance. However, re-optimisation for the different learning tasks in this study showed that the hyperparameters remained largely consistent. This suggests that the GNN architecture exhibits inherent robustness to variations in hyperparameters within the range of learning tasks addressed here.
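The study uses Optuna for this search; as a stdlib-only illustration of the same loop structure (sample hyperparameters, score them, keep the best), a random search over ranges loosely chosen around the reported optima might look like the sketch below. The ranges and the `score` callable are hypothetical.

```python
# Stdlib stand-in for a hyperparameter search: random sampling over
# hypothetical ranges around the reported optima; Optuna adds smarter
# sampling and pruning on top of the same basic loop.

import random

def random_search(score, n_trials=20, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "d_h": rng.randint(16, 64),          # embedded dimension
            "k": rng.randint(10, 60),            # number of GNN layers
            "gamma": rng.uniform(0.05, 0.5),     # loss-weighting factor
            "lr": 10 ** rng.uniform(-4, -2),     # learning rate (log scale)
        }
        s = score(params)                        # lower is better
        if best is None or s < best[0]:
            best = (s, params)
    return best[1]
```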

References

[1]
S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine learning for fluid mechanics. Annu. Rev. Fluid Mech., 52:477–508, 2020.
[2]
R. Vinuesa and S. L. Brunton. Enhancing computational fluid dynamics with machine learning. Nat. Comput. Sci., 2(6):358–366, 2022.
[3]
C. J. Lapeyre, A. Misdariis, N. Cazard, D. Veynante, and T. Poinsot. Training convolutional neural networks to estimate turbulent sub-grid scale reaction rates. Combust. Flame, 203:255–264, 2019.
[4]
Z. Zhou, G. He, and X. Yang. Wall model based on neural networks for LES of turbulent flows over periodic hills. Phys. Rev. Fluids, 6(5):054610, 2021.
[5]
H. J. Bae and P. Koumoutsakos. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat. Commun., 13(1):1443, 2022.
[6]
J. Ling and J. Templeton. Evaluation of machine learning algorithms for prediction of regions of high Reynolds-averaged Navier Stokes uncertainty. Phys. Fluids, 27(8):085103, 2015.
[7]
A. P. Singh and K. Duraisamy. Using field inversion to quantify functional errors in turbulence closures. Phys. Fluids, 28(4):045110, 2016.
[8]
H. Eivazi, M. Tahani, P. Schlatter, and R. Vinuesa. Physics-informed neural networks for solving Reynolds-Averaged Navier–Stokes equations. Phys. Fluids, 34(7):075117, 2022.
[9]
M. Schmelzer, R. P. Dwight, and P. Cinnella. Discovery of algebraic Reynolds-stress models using sparse symbolic regression. Flow Turbul. Combust., 104:579–603, 2020.
[10]
K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51:357–377, 2019.
[11]
R. McConkey, E. Yee, and F.-S. Lien. On the generalizability of machine-learning-assisted anisotropy mappings for predictive turbulence modelling. Int. J. Comput. Fluid Dyn., 36(7):555–577, 2022.
[12]
J. Ling, A. Kurzawski, and J. Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech., 807:155–166, 2016.
[13]
J. Cai, P.-E. Angeli, J.-M. Martinez, G. Damblin, and D. Lucor. Revisiting tensor basis neural network for Reynolds stress modeling: Application to plane channel and square duct flows. Comput. Fluids, page 106246, 2024.
[14]
J. Weatheritt and R. Sandberg. A novel evolutionary algorithm applied to algebraic modifications of the RANS stress–strain relationship. J. Comput. Phys., 325:22–37, 2016.
[15]
Y. Zhao, H. D. Akolekar, J. Weatheritt, V. Michelassi, and R. D. Sandberg. RANS turbulence model development using CFD-driven machine learning. J. Comput. Phys., 411:109413, 2020.
[16]
A. Beck and M. Kurz. A perspective on machine learning methods in turbulence modeling. GAMM Mitt., 44(1):e202100002, 2021.
[17]
P. Cinnella. Data-driven turbulence modeling. arXiv preprint arXiv:2404.09074, 2024.
[18]
O. Titaud, A. Vidard, and I. Souopgui. Assimilation of image sequences in numerical models. Tellus A, 62(1):30–47, 2010.
[19]
I. M. Navon. Data assimilation for numerical weather prediction: a review. In Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications, pages 21–65. Springer, 2009.
[20]
M. Meldi and A. Poux. A reduced order model based on Kalman filtering for sequential data assimilation of turbulent flows. J. Comput. Phys., 347:207–234, 2017.
[21]
S. Cheng, J.-P. Argaud, B. Iooss, D. Lucor, and A. Ponçot. Background error covariance iterative updating with invariant observation measures for data assimilation. Stoch. Environ. Res. Risk Assess., 33(11):2033–2051, 2019.
[22]
L. Cordier, B. R. Noack, G. Tissot, G. Lehnasch, J. Delville, M. Balajewicz, G. Daviller, and R. K. Niven. Identification strategies for model-based control. Exp. Fluids, 54:1–21, 2013.
[23]
D. P. G. Foures, N. Dovetta, D. Sipp, and P. J. Schmid. A data-assimilation method for Reynolds-averaged Navier–Stokes-driven mean flow reconstruction. J. Fluid Mech., 759:404–431, 2014.
[24]
F. Giannetti and P. Luchini. Structural sensitivity of the first instability of the cylinder wake. J. Fluid Mech., 581:167–197, 2007.
[25]
O. Marquet, D. Sipp, and L. Jacquin. Sensitivity analysis and passive control of cylinder flow. J. Fluid Mech., 615:221–252, 2008.
[26]
P. Luchini, F. Giannetti, and J. O. Pralits. Structural sensitivity of the finite-amplitude vortex shedding behind a circular cylinder. In IUTAM Symp. Unsteady Separated Flows Their Control, pages 151–160. Springer, 2009.
[27]
P. J. Schmid and D. S. Henningson. Optimal energy density growth in Hagen–Poiseuille flow. J. Fluid Mech., 277:197–225, 1994.
[28]
S. Cherubini, J.-C. Robinet, A. Bottaro, and P. De Palma. Optimal wave packets in a boundary layer and initial phases of a turbulent spot. J. Fluid Mech., 656:231–259, 2010.
[29]
L. Magri. Adjoint methods as design tools in thermoacoustics. Appl. Mech. Rev., 71(2):020801, 03 2019.
[30]
A. Giannotta, S. Cherubini, and P. De Palma. Minimal energy thresholds for triggering in the Rijke tube: the effect of the time delay. J. Fluid Mech., 938:A23, 2022.
[31]
F. Giannetti and P. Luchini. Leading-edge receptivity by adjoint methods. J. Fluid Mech., 547:21–53, 2006.
[32]
S. Cherubini, J.-C. Robinet, and P. De Palma. Nonlinear control of unsteady finite-amplitude perturbations in the Blasius boundary-layer flow. J. Fluid Mech., 737:440–465, 2013.
[33]
O. Semeraro, J. O. Pralits, C. W. Rowley, and D. S. Henningson. Riccati-less approach for optimal control and estimation: an application to two-dimensional boundary layers. J. Fluid Mech., 731:394–417, 2013.
[34]
P. Luchini and A. Bottaro. Adjoint equations in stability analysis. Annu. Rev. Fluid Mech., 46(1):493–517, 2014.
[35]
P. S. Volpiani, M. Meyer, L. Franceschini, J. Dandois, F. Renac, E. Martin, O. Marquet, and D. Sipp. Machine learning-augmented turbulence modeling for RANS simulations of massively separated flows. Phys. Rev. Fluids, 6(6):064607, 2021.
[36]
C. A. Ströfer and H. Xiao. End-to-end differentiable learning of turbulence models from indirect observations. Theor. Appl. Mech. Lett., 11(4):100280, 2021.
[37]
Y. Patel, V. Mons, O. Marquet, and G. Rigas. Turbulence model augmented physics-informed neural networks for mean-flow reconstruction. Phys. Rev. Fluids, 9(3):034605, 2024.
[38]
W. L. Hamilton. Graph representation learning. Synth. Lect. Artif. Intell. Mach. Learn., 14(3):1–159, 2020.
[39]
A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia. Graph networks as learnable physics engines for inference and control. In Int. Conf. Mach. Learn., pages 4470–4479, 2018.
[40]
K. Shukla, M. Xu, N. Trask, and G. E. Karniadakis. Scalable algorithms for physics-informed neural and graph networks. Data-Centric Eng., 3:e24, 2022.
[41]
M. Lino, S. Fotiadis, A. A. Bharath, and C. D. Cantwell. Current and emerging deep-learning methods for the simulation of fluid dynamics. Proc. R. Soc. A, 479(2275):20230058, 2023.
[42]
A. P. Toshev, G. Galletti, J. Brandstetter, S. Adami, and N. A. Adams. Learning lagrangian fluid mechanics with E(3)-equivariant graph neural networks. In Int. Conf. Geom. Sci. Inf., pages 332–341. Springer, 2023.
[43]
D. Dupuy, N. Odier, C. J. Lapeyre, and D. Papadogiannis. Modeling the wall shear stress in large-eddy simulation using graph neural networks. Data-Centric Eng., 4:e7, 2023.
[44]
M. Quattromini, M. A. Bucci, S. Cherubini, and O. Semeraro. Active learning of data-assimilation closures using graph neural networks. Theor. Comput. Fluid Dyn., 2025.
[45]
D. C. Wilcox. Turbulence modeling for CFD, volume 2. DCW Industries La Canada, CA, 1998.
[46]
S. Wallin and A. V. Johansson. An explicit algebraic Reynolds stress model for incompressible and compressible turbulent flows. J. Fluid Mech., 403:89–132, 2000.
[47]
R. D. Cécora, R. Radespiel, B. Eisfeld, and A. Probst. Differential Reynolds-stress modeling for aeronautics. AIAA J., 53(3):739–755, 2015.
[48]
M. Provansal, C. Mathis, and L. Boyer. Bénard–von Kármán instability: transient and forced regimes. J. Fluid Mech., 182:1–22, 1987.
[49]
A. Roshko. On the drag and shedding frequency of two-dimensional bluff bodies. Technical Note 3169, NACA, 1954.
[50]
N. Deng, B. R. Noack, M. Morzyński, and L. R. Pastur. Low-order model for successive bifurcations of the fluidic pinball. J. Fluid Mech., 884:A37, 2020.
[51]
M. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells. The FEniCS project version 1.5. Arch. Numer. Softw., 3(100), 2015.
[52]
M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[53]
I. Gühring, M. Raslan, and G. Kutyniok. Expressivity of deep neural networks. arXiv preprint arXiv:2007.04759, 2020.
[54]
K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[55]
I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[56]
M. Nastorg. Scalable GNN Solutions for CFD Simulations. Thesis, Université Paris-Saclay, Apr. 2024.
[57]
B. Donon, Z. Liu, W. Liu, I. Guyon, A. Marot, and M. Schoenauer. Deep statistical solvers. Adv. Neural Inf. Process. Syst., 33:7910–7921, 2020.
[58]
K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proc. IEEE Int. Conf. Comput. Vis., pages 1026–1034, 2015.
[59]
M. Carini, F. Giannetti, and F. Auteri. On the origin of the flip–flop instability of two side-by-side cylinder wakes. J. Fluid Mech., 742:552–576, 2014.
[60]
V. Dolean, A. Heinlein, S. Mishra, and B. Moseley. Multilevel domain decomposition-based architectures for physics-informed neural networks. Comput. Methods Appl. Mech. Eng., 429:117116, 2024.
[61]
T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proc. of the 25th ACM SIGKDD Int. Conf. on Knowl. Discov. & Data Mining, pages 2623–2631, 2019.