We present a bifidelity Karhunen–Loève expansion (KLE) surrogate model for field-valued quantities of interest (QoIs) under uncertain inputs. The approach combines the spectral efficiency of the KLE with polynomial chaos expansions (PCEs) to preserve an explicit mapping between input uncertainties and output fields. By coupling inexpensive low-fidelity (LF) simulations that capture dominant response trends with a limited number of high-fidelity (HF) simulations that correct for systematic bias, the proposed method enables accurate and computationally affordable surrogate construction. To further improve surrogate accuracy, we develop an active learning strategy that adaptively selects new HF evaluations based on the surrogate’s generalization error, estimated via cross-validation and modeled using Gaussian process regression. New HF samples are then acquired by maximizing an expected improvement criterion, targeting regions of high surrogate error. The resulting BF-KLE-AL framework is demonstrated on three examples of increasing complexity: a one-dimensional analytical benchmark, a two-dimensional convection-diffusion system, and a three-dimensional turbulent round jet simulation based on Reynolds-averaged Navier–Stokes (RANS) and enhanced delayed detached-eddy simulation (EDDES) solvers. Across these cases, the method achieves consistent improvements in predictive accuracy and sample efficiency relative to single-fidelity and random-sampling approaches.
Computational models are now indispensable tools for studying complex physical phenomena and engineering systems. Such models often involve solving systems of governing equations (e.g., partial differential equations) that capture detailed physics at high spatial and temporal resolutions. When these models resolve the underlying processes with high accuracy, they are referred to as high-fidelity (HF) models. HF simulations can provide predictive accuracy suitable for decision-making and design; however, they are often prohibitively expensive for many-query tasks such as design optimization, inverse problems, or uncertainty quantification (UQ) under random inputs.
In contrast, low-fidelity (LF) models are computationally inexpensive and approximate the true system behavior, typically through simplifying assumptions or reduced numerical resolution. Relying solely on LF models can lead to systematic biases in the quantities of interest (QoIs), while exclusive reliance on HF models is computationally infeasible. Bifidelity modeling seeks a compromise by leveraging both: inexpensive LF evaluations to capture the bulk structure of the response and a limited number of HF simulations to correct the residual bias. This idea extends naturally to multi-fidelity hierarchies, but we focus here on the bifidelity case. Examples of multi-fidelity surrogate modeling include [1]–[4], with broader surveys available in [5], [6].
In this work, we propose a bifidelity surrogate modeling framework for field quantities (e.g., spatially and/or temporally varying outputs such as velocity fields) based on the Karhunen–Loève expansion (KLE) (see, e.g., [7]). The KLE provides a spectral decomposition of stochastic processes into orthogonal spatial modes and random coefficients, allowing a compact representation of correlated fields. To explicitly represent inputs in terms of their physical sources of uncertainty, we couple the KLE with polynomial chaos expansions (PCEs) (see, e.g., [7]–[12]), which express the random coefficients as polynomial functions of independent random variables. The combined KLE-PCE structure enables efficient representation, propagation, and correction of uncertainty across fidelity levels.
We then introduce an active learning framework [13], [14] that automatically identifies where new HF simulations should be conducted to best improve the surrogate. Specifically, we estimate the surrogate’s generalization error through cross-validation and fit these errors using Gaussian process (GP) regression [15], [16]. The GP provides a smooth uncertainty field over the input space, from which new HF samples are selected by maximizing an expected improvement acquisition function [17], [18]. This approach enables data-efficient surrogate refinement in a greedy manner; unlike variance-minimization approaches such as multi-level, multi-fidelity Monte Carlo and approximate control variate methods [19]–[23], however, it does not allocate resources between the HF and LF models.
The resulting framework—we call it BF-KLE-AL (Bifidelity Karhunen–Loève Expansion with Active Learning)—is demonstrated on three progressively complex examples: (a) a one-dimensional (1D) analytical benchmark with known truth; (b) a two-dimensional (2D) convection-diffusion problem; and (c) a three-dimensional (3D) turbulent jet flow simulation using Reynolds-averaged Navier–Stokes and enhanced delayed detached-eddy simulation solvers. Across these examples, we evaluate the framework’s accuracy, efficiency, and capability for forward UQ.
Our main contributions are as follows.
We design a new bifidelity surrogate model based on the combination of KLE and PCE for representing spatially or temporally distributed QoIs.
We develop an active learning algorithm that iteratively refines the bifidelity surrogate by selecting informative HF samples, guided by cross-validated error modeling through GP regression.
We demonstrate the framework’s ability to achieve sample-efficient surrogate construction and improved predictive performance on both synthetic benchmarks and realistic turbulent flow simulations.
All related code, scripts, and data supporting this study are openly available at https://github.com/aniketjivani/KLE_UQ_Final.
The paper is organized as follows. Section 2 reviews the theoretical background on KLEs and PCEs. Section 3 presents the proposed bifidelity KLE formulation and active learning strategy. Section 4 reports numerical experiments and analyses on the benchmark and jet-flow test cases. Section 5 concludes with key findings and discusses directions for future work.
Consider a probability space \((\Omega,\mathcal{F},\mathbb{P})\), where \(\Omega\) is the sample space, \(\mathcal{F}\) is a \(\sigma\)-algebra, and \(\mathbb{P}\) is a probability measure. Let \(\theta : \Omega \to \mathbb{R}^{n_{\theta}}\) be a random vector representing all uncertain input parameters of a model. Suppose the model output is a real-valued stochastic process \(y : \mathcal{X}\times \mathbb{R}^{n_\theta} \to \mathbb{R}\) indexed by \({x}\in\mathcal{X}\subseteq \mathbb{R}^{n}\) (e.g., spatial or temporal location). Then, \[\begin{align} y(x,\theta) = \mu(x) + y_0(x,\theta), \label{e:centering} \end{align}\tag{1}\] where \(\mu(x)=\mathbb{E}[y(x,\cdot)]\) is the mean function, and \(y_0(x,\theta)\) is the zero-mean component after centering. In practice, we estimate and subtract the empirical mean from data when modeling the process, and add it back when making predictions. All theoretical development below pertains to the centered process \(y_0\), which we focus on without loss of generality.
If \(\mathcal{X}\) is bounded and \(y_0\) is square-integrable and mean-square continuous, it admits a spectral expansion known as the KLE (see, e.g., [7]): \[\begin{align} \label{e:inf95sum95kle} y_0(x, \theta) = \sum_{k=1}^{\infty} \sqrt{\lambda_k} q_k(x) \zeta_k(\theta), \end{align}\tag{2}\] where \(\zeta_k(\theta)\) are mutually uncorrelated random variables with zero mean and unit variance: \[\begin{align} \mathbb{E}[\zeta_k] = 0, \qquad \mathbb{E}[\zeta_k \zeta_l] = \delta_{kl}. \end{align}\] The eigenvalues \(\lambda_1\geq \lambda_2 \geq \ldots \rightarrow 0\) and orthonormal eigenfunctions \(q_k(x)\) are obtained by solving the homogeneous Fredholm equation of the second kind: \[\begin{align} \int_{\mathcal{X}}C(x,x') q_k(x') \,\text{d}x' = \lambda_k q_k(x),\label{e:eigenvalue} \end{align}\tag{3}\] with \[\begin{align} C(x,x') =\mathbb{E}\left[y_0(x, \cdot) y_0(x', \cdot)\right] \end{align}\] as the covariance function. The modal coefficients \(\zeta_k(\theta)\) are obtained by projection, leveraging the orthonormality of \(q_k(x)\): \[\begin{align} \zeta_k(\theta) = \frac{1}{\sqrt{\lambda_k}} \int_{\mathcal{X}} y_0(x,\theta) q_k(x)\, \text{d}x. \label{e:zeta95exact} \end{align}\tag{4}\] The KLE is optimal in the mean-squared error sense: truncating after \(k_t\) terms yields the best possible rank-\(k_t\) approximation.
To compute a KLE numerically, we start with a training set \(\{y(x_j,\theta^{(n)})\}\) of model simulations, where \(n=1,\ldots,N\) indexes the sampled parameter realizations, and \(j=1,\ldots,n_g\) indexes spatial or temporal grid points. We assume a common grid across all samples (or otherwise interpolate onto a common grid).
First, we estimate the mean from the training set: \[\begin{align} {\mu}(x_j) \approx \frac{1}{N} \sum_{n=1}^{N} y(x_j,\theta^{(n)}), \end{align}\] and center the dataset to obtain \(y_0(x_j,\theta^{(n)})=y(x_j,\theta^{(n)})-\mu({x_j})\). The covariance is empirically estimated using a Monte Carlo approximation: \[\begin{align} C(x_i,x_j) \approx C_{ij} = \frac{1}{N-1} \sum_{n=1}^{N} y_0(x_i,\theta^{(n)})y_0(x_j,\theta^{(n)}). \label{eq:32K95approx} \end{align}\tag{5}\]
To accommodate non-uniform grids, we introduce integration weights \(w_j>0\) reflecting the local volume or area associated with grid point \(x_j\). The discrete eigenproblem becomes: \[\begin{align} \sum_{j=1}^{n_{g}} w_j C_{ij} q_{k,j} = \lambda_k q_{k,i}. \label{e:discrete95eigenvalue} \end{align}\tag{6}\] Letting \(S=[y_0(x, \theta^{(1)}), \cdots, y_0(x, \theta^{(N)})]\) be the \(n_g \times N\) snapshot matrix of centered data, we have \(C=\frac{1}{N - 1} SS^{\top}\). In matrix form, 6 becomes: \[\begin{align} (CW)Q = Q\Lambda,\label{e:eig95matrix} \end{align}\tag{7}\] where \(W=\mathrm{diag}(w_1,\ldots,w_{n_g})\), \(Q\) contains the right-eigenvectors, and \(\Lambda\) contains the eigenvalues. The number of non-zero eigenvalues is at most \(\min(n_g,N)\).
To reduce memory and computational cost when \(n_g\) is large, we perform singular value decomposition (SVD) on \[\begin{align} B = \frac{1}{\sqrt{N-1}}W^{\frac{1}{2}}S, \end{align}\] yielding \(B=U \Sigma V^\top\). The eigenvalues of \(C\) are the squared singular values \(\Sigma^2\), and the eigenvectors are recovered as \(Q = W^{-\frac{1}{2}} U\). This way, only the \(n_g\times N\) matrix \(B\) needs to be formed instead of the \(n_g \times n_g\) matrix \(C\).
We truncate the expansion to the first \(k_t\) terms: \[\begin{align} y_0(x, \theta) \approx \tilde{y}_0(x, \theta) = \sum_{k=1}^{k_t} \sqrt{\lambda_k} q_k(x) \zeta_k(\theta).\label{e:KLE95truncate} \end{align}\tag{8}\] The truncation level \(k_t\) is typically selected based on the decay of \(\lambda_{k}\); for example, one may truncate when \(\lambda_{k}/\lambda_{1}\leq 0.1\). Alternative criteria include cumulative energy thresholds (e.g., retaining 99% of total variance) or cross-validation-based model selection.
Realizations of the KLE modes’ coefficients are computed as: \[\begin{align} \zeta_k^{(n)} = \frac{1}{\sqrt{\lambda_k}} \sum_{j=1}^{n_{g}} w_j y_0(x_j,\theta^{(n)}) q_{k,j}. \label{e:zeta95samples} \end{align}\tag{9}\] One may model each \(\zeta_k\) from these samples using a Gaussian approximation or kernel density estimation, enabling sample generation for UQ. However, this severs the explicit dependence on \(\theta\), making it difficult to condition on a specific parameter value or to merge KLEs across fidelity levels. To overcome this, we will introduce the polynomial chaos expansion (PCE) to construct a surrogate mapping from \(\theta\) to \(\zeta_k\), enabling conditional inference and multi-fidelity fusion.
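To make the discrete construction concrete, the following Python sketch (a minimal illustration of our own, not the released implementation; the function name and interface are assumptions) carries out the centering, weighted SVD, truncation, and projection steps described above:

```python
# Minimal sketch of the discrete KLE pipeline, assuming snapshots Y of shape
# (n_g, N) on a common grid and integration weights w of shape (n_g,).
import numpy as np

def fit_kle(Y, w, energy=0.99):
    n_g, N = Y.shape
    mu = Y.mean(axis=1)                        # empirical mean
    Y0 = Y - mu[:, None]                       # centered snapshots S
    sqw = np.sqrt(w)
    B = sqw[:, None] * Y0 / np.sqrt(N - 1)     # B = W^{1/2} S / sqrt(N-1)
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = s**2                                 # eigenvalues, descending
    k_t = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy)) + 1
    Q = U[:, :k_t] / sqw[:, None]              # modes Q = W^{-1/2} U
    zeta = (Q.T * w) @ Y0 / np.sqrt(lam[:k_t])[:, None]   # projections (Eq. 9)
    return mu, lam[:k_t], Q, zeta
```

Here the truncation retains a fixed fraction of the total variance; the eigenvalue-ratio rule mentioned above is an easy swap.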
Each modal coefficient \(\zeta_k\) is a scalar random variable induced by uncertainty in the model parameters \(\theta\). We treat \(\zeta_k\) as a deterministic function of \(\theta\), thereby preserving its dependence on the original physical sources of uncertainty. In other words, instead of modeling \(\zeta_k\) as an unstructured random output, we seek to represent it explicitly as a function of \(\theta\). One standard approach is to express each \(\zeta_k\) using a truncated PCE (see, e.g., [7]–[11]): \[\begin{align} \zeta_k(\theta) = \sum_{\beta \in \mathcal{I}} b_{k,\beta} \Psi_{\beta}(\xi_1(\theta), \ldots, \xi_{n_s}(\theta)),\label{e:zeta95PCE} \end{align}\tag{10}\] where \(b_{k,\beta}\) are the expansion coefficients, \(\beta=(\beta_1,\ldots,\beta_{n_s}),\,\forall \beta_j\in\mathbb{N}_0\), is a multi-index, \(\mathcal{I}\) is a finite index set, \(n_s\) is the number of stochastic input dimensions, \(\xi_i\) are reference random variables, and \(\Psi_{\beta}(\xi_1, \ldots, \xi_{n_s})\) are multivariate orthonormal basis polynomials. Each multivariate basis function \(\Psi_{\beta}\) is expressed as a product of univariate polynomials: \[\begin{align} \Psi_{\beta}(\xi_1(\theta), \ldots, \xi_{n_s}(\theta)) = \prod_{i=1}^{n_s} \psi_{\beta_i}(\xi_i(\theta)), \end{align}\] where \(\psi_{\beta_i}\) denotes the degree-\(\beta_i\) univariate orthonormal polynomial corresponding to the distribution of \(\xi_i\).
In our implementation, we adopt several common design choices.
We set \(n_s\) equal to the number of uncertain input parameters \(\theta\) and associate each \(\xi_i\) with the \(i\)th \(\theta\) dimension. This provides a natural indexing and simplifies interpretation [11].
For our applications, where each uncertain input \(\theta_i\) follows a uniform distribution, we use Legendre polynomials with \(\xi_i\sim\mathcal{U}(-1,1)\). The transformation from each physical parameter \(\theta_i\) to the reference variable \(\xi_i\) is then a simple affine scaling. Other options of \(\xi_i\) and \(\psi_{\beta_i}\) are also possible [24]; for example, Gaussian-distributed inputs naturally pair with Hermite polynomials.
We adopt a total-order expansion of polynomial degree \(p\), yielding the index set \(\mathcal{I}=\{\beta : ||\beta||_{1} \leq p \}\), with total number of terms \(n_t = (n_s+p)!\big/(n_s!p!)\). The convergence rate depends on the regularity of \(\zeta_k(\theta)\); smoother functions admit more rapid decay in expansion coefficients and hence require fewer terms.
For simplicity, we use the same basis for all \(\zeta_k\), though it is possible to vary the basis or polynomial degree across modes.
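As an illustration of these choices, the sketch below (helper names are ours) constructs the total-order index set and evaluates the orthonormal Legendre feature matrix for inputs already mapped to \(\xi_i\in[-1,1]\):

```python
# Hedged sketch of the total-order Legendre PCE basis of Eq. (10).
import itertools
import numpy as np
from numpy.polynomial import Legendre

def total_order_indices(n_s, p):
    """All multi-indices beta with ||beta||_1 <= p."""
    return [beta for beta in itertools.product(range(p + 1), repeat=n_s)
            if sum(beta) <= p]

def psi_matrix(Xi, indices):
    """Feature matrix with entries Psi_{beta^{(j)}}(xi^{(n)})."""
    N, n_s = Xi.shape
    p_max = max(max(beta) for beta in indices)
    # sqrt(2d+1) makes the Legendre polynomials orthonormal under U(-1, 1)
    uni = [[np.sqrt(2 * d + 1) * Legendre.basis(d)(Xi[:, i])
            for d in range(p_max + 1)] for i in range(n_s)]
    A = np.ones((N, len(indices)))
    for j, beta in enumerate(indices):
        for i, d in enumerate(beta):
            A[:, j] *= uni[i][d]   # product of univariate polynomials
    return A
```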
Given the \(N\) realizations of each \(\zeta_k\), we solve for the coefficients \(b_{k,\beta}\) using non-intrusive regression: \[\begin{align} \underbrace{\begin{bmatrix} \Psi_{\beta^{(1)}}(\xi(\theta^{(1)})) & \cdots & \Psi_{\beta^{(n_t)}}(\xi(\theta^{(1)})) \\ \vdots & & \vdots \\ \Psi_{\beta^{(1)}}(\xi(\theta^{(N)})) & \cdots & \Psi_{\beta^{(n_t)}}(\xi(\theta^{(N)})) \end{bmatrix}}_{A_k} \underbrace{\begin{bmatrix} b_{k,\beta^{(1)}} \\ \vdots \\ b_{k,\beta^{(n_t)}} \end{bmatrix}}_{b_k} = \underbrace{\begin{bmatrix} \zeta_k^{(1)} \\ \vdots \\ \zeta_k^{(N)} \end{bmatrix}}_{c_k},\label{e:sparse95regression} \end{align}\tag{11}\] where \(\beta^{(j)}\) denotes the \(j\)th basis term following a particular ordering in the set \(\mathcal{I}\), \(A_k \in \mathbb{R}^{N\times n_t}\) is the feature matrix, \(b_k \in \mathbb{R}^{n_t}\) is the vector of unknown coefficients, and \(c_k \in\mathbb{R}^{N}\) contains the known values \(\zeta_k^{(n)}\).
Since the number of available simulations \(N\) can often be small relative to the number of basis terms \(n_t\), regularization is useful to prevent overfitting; penalties that promote sparsity in the coefficient vector \(b_k\) (e.g., \(\ell_1\)) are another common choice. We utilize Tikhonov regularization (\(\ell_2\)-regularization): \[\begin{align} b_k^{\ast} = \arg \min_{b_k} \|A_k b_k - c_k\|_2^2 + \tau \|\Gamma b_k\|^2_2, \label{e:Tikhonov} \end{align}\tag{12}\] where \(\tau \geq 0\) is a regularization parameter and \(\Gamma\) is chosen to be the first-order differencing weight matrix. The parameter \(\tau\) can typically be selected via cross-validation [25].
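In code, the solve in 12 reduces to an augmented least-squares problem; a minimal sketch (with \(\Gamma\) formed explicitly, and \(\tau\) assumed given) is:

```python
# Sketch of the Tikhonov-regularized solve in Eq. (12); in practice tau
# would be selected by cross-validation.
import numpy as np

def solve_tikhonov(A, c, tau):
    n_t = A.shape[1]
    # first-order differencing weight matrix: rows of the form (-1, 1)
    Gamma = np.eye(n_t - 1, n_t, k=1) - np.eye(n_t - 1, n_t)
    lhs = A.T @ A + tau * Gamma.T @ Gamma     # regularized normal equations
    return np.linalg.solve(lhs, A.T @ c)
```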
This procedure enables fast evaluation of \(\zeta_k(\theta)\) for new inputs \(\theta\), thereby allowing efficient reconstruction of \(y_0(x, \theta)\) via the truncated KLE.
In this section, we first describe the construction of the additive bifidelity KLE surrogate with the incorporation of PCEs, followed by the active learning procedure used to improve the surrogate model under budget constraints. These steps are then summarized in an overall algorithm for BF-KLE-AL.
Suppose we have access to both a HF model and a LF model, each taking in the same input variables and predicting the same QoIs. Then we can write: \[\begin{align} y_{\mathrm{HF}}({x}, \theta) = y_{\mathrm{LF}}({x}, \theta) + \lbrack y_{\mathrm{HF}}({x}, \theta)-y_{\mathrm{LF}}({x}, \theta)\rbrack . \end{align}\] We approximate the right-hand side using surrogates: \[\begin{align} {y}_{\mathrm{HF}}({x}, \theta) \approx \underbrace{\tilde{y}_{\mathrm{LF}}({x}, \theta)}_{\text{LF-surrogate}} + \underbrace{\tilde{y}_{\mathrm{\Delta}}({x}, \theta)}_{\text{Discrepancy surrogate}}, \end{align}\] where \(\tilde{y}_{\Delta}({x}, \theta)\) approximates the discrepancy between the HF and LF models. This approach allows us to leverage the efficiency of the LF model while correcting for its bias using a small number of HF evaluations, yielding a bifidelity approximation of the HF model at reduced cost. Expanding both terms into their respective mean and zero-mean components: \[\begin{align} {y}_{\mathrm{HF}}({x}, \theta) \approx \lbrack \tilde{\mu}_{\mathrm{LF}}({x}) + \tilde{y}_{0,\mathrm{LF}}({x}, \theta)\rbrack + \lbrack \tilde{\mu}_{\mathrm{\Delta}}({x}) + \tilde{y}_{0,\Delta}({x}, \theta)\rbrack, \end{align}\] we define the right-hand side to be the bifidelity surrogate model: \[\begin{align} \tilde{y}_{\mathrm{BF}}({x}, \theta) = \tilde{\mu}_{\mathrm{BF}}({x})+ \tilde{y}_{0,\mathrm{BF}}({x}, \theta) \label{e:MF95KLE95sum} \end{align}\tag{13}\] where \(\tilde{\mu}_{\mathrm{BF}}({x}) = \lbrack \tilde{\mu}_{\mathrm{LF}}({x}) + \tilde{\mu}_{\mathrm{\Delta}}({x})\rbrack\) and \(\tilde{y}_{0,\mathrm{BF}}({x}, \theta) = \lbrack \tilde{y}_{0,\mathrm{LF}}({x}, \theta) + \tilde{y}_{0,\Delta}({x}, \theta)\rbrack\).
To construct \(\tilde{y}_{0,\Delta}({x}, \theta)\), we require paired HF and LF evaluations at the same \(\theta\) values, and ideally on a common spatial grid. If the HF and LF solvers use different meshes, interpolation or restriction operations may be needed to align them. If \(N_1\) LF runs are used to construct \(\tilde{y}_{0,\mathrm{LF}}\), and \(N_{\Delta}\) HF-LF paired runs are used to construct \(\tilde{y}_{0,\Delta}\), then the total computational cost is \((N_1+N_{\Delta})\) LF evaluations and \(N_{\Delta}\) HF evaluations. In practice, we aim for \(N_{\Delta} \ll N_1\), relying on the LF model to capture most of the response.
Substituting the truncated KLE and the PCE representations for the zero-mean components gives: \[\begin{align} \tilde{y}_{0,\mathrm{BF}}({x}, \theta) &=\tilde{y}_{0,\mathrm{LF}}({x}, \theta) + \tilde{y}_{0,\Delta}({x}, \theta) \nonumber\\ &=\sum_{k=1}^{k_{t}}\sqrt{\lambda_k}q_k({x}) \left \lbrack \sum_{j=1}^{n_{t}}b_{k,{\beta^{(j)}}}\Psi_{\beta^{(j)}}(\xi(\theta))\right \rbrack + \sum_{k'=1}^{k'_{t}}\sqrt{\lambda'_{k'}}q'_{k'}({x}) \left \lbrack \sum_{j=1}^{n_{t}}b_{k',{\beta^{(j)}}}\Psi_{\beta^{(j)}}(\xi(\theta))\right \rbrack \nonumber\\ &= \sum_{j=1}^{n_{t}} \left \lbrack\sum_{k=1}^{k_{t}}\sqrt{\lambda_k}q_k({x})b_{k,{\beta^{(j)}}} + \sum_{k'=1}^{k'_{t}}\sqrt{\lambda'_{k'}}q'_{k'}({x})b_{k',{\beta^{(j)}}} \right \rbrack\Psi_{\beta^{(j)}}(\xi(\theta)).\label{e:MF95KLE} \end{align}\tag{14}\] Here, we have substituted the PCEs for each modal coefficient \(\zeta_k\) and \(\zeta_{k'}\), interchanged the order of summations, and combined the two components under the assumption that the same PCE basis is used for both LF and discrepancy terms. Finally, we obtain the full bifidelity prediction \(\tilde{y}_{\mathrm{BF}}\) by adding \(\tilde{\mu}_{\mathrm{BF}}\) back to \(\tilde{y}_{0,\mathrm{BF}}\) via 13 .
We make a few observations about 14 . First, although we assume a common set of PCE basis functions for all \(\zeta_k\) and \(\zeta_{k'}\), this assumption can be relaxed. In more general settings, different polynomial families and degrees can be used for different components, and the resulting expansions can still be combined by taking the union of all basis terms. Second, the eigenvalues and eigenvectors associated with the LF and discrepancy terms are distinct—that is, \((\lambda_k, q_k)\) from \(\tilde{y}_{0,\mathrm{LF}}\) are not the same as \((\lambda'_{k'}, q'_{k'})\) from \(\tilde{y}_{0,\Delta}\). As a result, the two KLEs cannot be merged into a single orthogonal decomposition, and the expression in 14 should not be interpreted as a KLE of the HF model. Rather, it represents a bifidelity surrogate: a global PCE of the bifidelity field, with spatially varying coefficients constructed from two independently built surrogates. In our numerical experiments, we compare this bifidelity model against a KLE built solely from HF data—and, for simpler synthetic cases, against true HF simulations—evaluating both its accuracy and computational efficiency.
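For concreteness, evaluating the bifidelity prediction amounts to summing the two KLE-PCE components; the sketch below assumes each component is stored as a small dictionary (layout and names are ours) and reuses the psi_matrix helper sketched earlier:

```python
# Illustrative evaluation of Eqs. (13)-(14) from two trained components:
# lf and dl each hold mean mu (n_g,), eigenvalues lam (k_t,), modes Q
# (n_g, k_t), and PCE coefficients b (k_t, n_t).
import numpy as np

def predict_bf(lf, dl, xi, indices):
    Psi = psi_matrix(np.atleast_2d(xi), indices)[0]   # basis values (n_t,)
    y = lf["mu"] + dl["mu"]                           # bifidelity mean
    for m in (lf, dl):                                # both zero-mean terms
        zeta = m["b"] @ Psi                           # PCE modal coefficients
        y = y + m["Q"] @ (np.sqrt(m["lam"]) * zeta)   # KLE reconstruction
    return y
```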
The above derivation assumes that both models share the same input parameterization. When this assumption does not hold, alternative strategies are needed. One possible approach is to map the distinct parameter spaces to a common latent space. For example, PCEs can be constructed not only for outputs but also for inputs, allowing samples drawn in the latent space to be mapped to both models’ inputs and outputs. Multifidelity methods that address dissimilar parameterizations can be found in [26]–[28].
Having established the bifidelity KLE framework, we now focus on actively selecting new HF samples to improve the surrogate model. This is achieved through an active learning strategy [14] that iteratively refines the surrogate in regions where its predictions are expected to be least accurate. To quantify and guide improvement, we employ cross-validation to estimate the surrogate’s generalization error.
At the initial pilot stage, the training dataset consists of \(N_{\mathrm{LF}}^P\) LF samples used to construct the LF term, and a smaller subset of \(N_{\Delta}^P\) paired HF-LF samples used to model the discrepancy term. The initial sample design for both levels can be generated using space-filling strategies (e.g., Latin Hypercube or maximin designs) or pseudo-/quasi-random sequences.
We adopt a \(k\)-fold cross-validation procedure to estimate the local surrogate error. The data are partitioned into \(k\) folds; in each iteration, one fold serves as the validation set, while the remaining \((k-1)\) folds are used to train the bifidelity surrogate. For each sample \(\theta_i\), we compute the relative prediction error between the HF output and the surrogate prediction constructed when that sample is part of the held-out fold. Since the prediction QoIs are field-valued, this comparison yields an error field over the spatial or temporal domain, which is then summarized into a scalar error metric: \[\begin{align} \varepsilon^{(i)} = \frac{||y_{\mathrm{HF}}({x}, \theta_i) -\tilde{y}_{\mathrm{BF}}^{-\mathtt{s}(i)}({x}, \theta_i)||}{||y_{\mathrm{HF}}({x}, \theta_i)||}, \label{eq:kfold95err} \end{align}\tag{15}\] where \(\mathtt{s}(i)\) denotes the fold index containing sample \(\theta_i\), and \(\tilde{y}_{\text{BF}}^{-\mathtt{s}(i)}\) is the surrogate trained without that fold’s data.
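A minimal sketch of this procedure follows; the train_bf callback, which rebuilds the bifidelity surrogate on the retained pairs, is a stand-in we assume here:

```python
# k-fold cross-validation errors of Eq. (15) over M paired HF-LF samples;
# Y_hf has shape (n_g, M) and train_bf returns a callable surrogate.
import numpy as np

def kfold_errors(Theta, Y_hf, train_bf, k=5, seed=0):
    M = Theta.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(M), k)
    eps = np.empty(M)
    for fold in folds:
        keep = np.setdiff1d(np.arange(M), fold)
        surrogate = train_bf(Theta[keep], Y_hf[:, keep])  # leave fold out
        for i in fold:
            r = Y_hf[:, i] - surrogate(Theta[i])          # error field
            eps[i] = np.linalg.norm(r) / np.linalg.norm(Y_hf[:, i])
    return eps
```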
The key idea is to use these cross-validation errors as proxies for the surrogate’s generalization performance. Regions exhibiting large errors are prioritized for additional HF sampling to improve model accuracy. However, since cross-validation errors are only available at evaluated points, we model them using a GP [15], [16] to infer a smooth approximation of the generalization error across the parameter space.
We define a GP prior over the scalar-valued cross-validation error as \[\begin{align} {\varepsilon} \sim \mathcal{G P}( {m_0}(\cdot), \mathtt{k}(\cdot, \cdot)), \label{e:32gp1} \end{align}\tag{16}\] where \(m_0(\cdot)=0\) is the mean function and \(\mathtt{k}(\cdot,\cdot)\) is the covariance kernel. The kernel choice controls the GP’s expressivity. Here we employ a stationary Matérn kernel with \(\nu=5/2\) for moderate smoothness; its other hyperparameters (length scale and variance) are optimized through maximum likelihood estimation. Other kernels may also be used.
Given \(M\) cross-validation errors computed at stage \(\ell\) of active learning, \(\varepsilon_{\ell} = [\varepsilon^{(1)}_\ell, \ldots, \varepsilon^{(M)}_\ell ]^\top\), with associated parameter samples \(\Theta_{\ell}=[\theta_{\ell}^{(1)}, \cdots, \theta_{\ell}^{(M)}]^\top\), the GP posterior mean and variance at a new point \(\theta^\ast\) are given by: \[\begin{align} m_{{\varepsilon}}(\theta^\ast) &= \mathtt{k}^{\top}\left(\Theta_{\ell},\theta^\ast\right)K\left(\Theta_{\ell}, \Theta_{\ell}\right)^{-1} \varepsilon_\ell, \tag{17} \\ \sigma_{{\varepsilon}}^{2}(\theta^\ast)&= \mathtt{k}(\theta^\ast, \theta^\ast)-\mathtt{k}^{\top}\left(\Theta_{\ell},\theta^{\ast}\right)K\left(\Theta_{\ell}, \Theta_{\ell}\right)^{-1}\mathtt{k}\left(\Theta_{\ell},\theta^\ast\right). \tag{18} \end{align}\] To determine the next HF sample, we maximize the expected improvement (EI) acquisition function [17], [18], which targets the locations most likely to exceed the largest observed value (here, the estimated generalization error). For a GP, the EI of a single point is defined as: \[\begin{align} \textrm{EI}(\theta) = \mathbb{E}[\max(\varepsilon - \varepsilon^{\ast}, 0)\; | \;\varepsilon \sim \mathcal{N}(m_{\varepsilon}(\theta), \sigma_{\varepsilon}^2(\theta))], \label{eq:EI95defn} \end{align}\tag{19}\] where \(\varepsilon^{\ast}\) denotes the largest observed value thus far. The EI admits a closed-form expression, whose maximization yields the next sample: \[\begin{align} \theta_{\textrm{new}} &= \operatornamewithlimits{arg\,max}_{\theta} \textrm{EI}(\theta) = \operatornamewithlimits{arg\,max}_{\theta} \,\left(m_{{\varepsilon}}(\theta)-\varepsilon^{\ast}\right) {\Phi}\left(\frac{m_{{\varepsilon}}(\theta)-\varepsilon^{\ast}}{\sigma_{{\varepsilon}}(\theta)}\right)+\sigma_{{\varepsilon}}(\theta){\mathit{\phi}}\left(\frac{m_{{\varepsilon}}(\theta)-\varepsilon^{\ast}}{\sigma_{\varepsilon}(\theta)}\right),\label{eq:al95ei95analytic} \end{align}\tag{20}\] where \(\Phi(\cdot)\) and \(\mathit{\phi}(\cdot)\) are the standard normal cumulative distribution and probability density functions, respectively.
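The closed-form EI is simple to evaluate on a candidate set once the GP posterior is available (in our experiments the GP and EI optimization are handled through a library; the numpy version below is just an illustration):

```python
# Closed-form EI of Eqs. (19)-(20) for maximizing the error GP.
import numpy as np
from scipy.stats import norm

def expected_improvement(m, s, eps_best):
    """m, s: GP posterior mean/std at candidate points; eps_best: largest
    observed cross-validation error so far."""
    s = np.maximum(s, 1e-12)            # guard against zero variance
    z = (m - eps_best) / s
    return (m - eps_best) * norm.cdf(z) + s * norm.pdf(z)

# e.g.: theta_new = candidates[np.argmax(expected_improvement(m, s, eps.max()))]
```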
Each time a new HF sample is selected, the LF model is also evaluated at the same input. These paired evaluations are immediately incorporated into the training set, and both components of the bifidelity surrogate are retrained. Cross-validation errors are then recomputed across all available points, and a new GP is fitted to the updated error estimates. Because surrogates are rebuilt at each active learning stage, the cross-validation errors—and thus the underlying error landscape—change even at previously evaluated points. Consequently, the GP must be refitted from scratch rather than incrementally updated. This iterative procedure continues until a stopping criterion is met, such as reaching a target error tolerance or exhausting the HF computational budget.
A limitation of the current formulation is that the sampling budget accounts only for the cost of HF simulations, neglecting the (typically smaller) cost of corresponding LF evaluations. Future extensions will explore cost-aware acquisition strategies that explicitly balance performance gain against the relative costs of LF and HF evaluations, potentially enabling adaptive fidelity selection during learning.
A further practical consideration concerns the number of HF samples acquired per stage. Sequential (single-point) acquisition may be inefficient in high-dimensional problems or when parallel computational resources are available. In such cases, batch acquisition offers a more scalable alternative. While analytical formulations of batch EI (\(q\)-EI) exist for small batch sizes [29], the optimization rapidly becomes intractable as batch size grows. Consequently, several approximate methods have been proposed [29]–[32]. In this work, we adopt the “Kriging Believer” heuristic [29] for batch selection. Once the first acquisition point is chosen, its predicted mean value is treated as a pseudo-observation, and the GP is refitted iteratively until a batch of \(q\) points is identified. This strategy enables efficient parallel HF sampling while preserving the exploratory behavior of single-point acquisition.
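A hedged sketch of the Kriging Believer loop, reusing the EI helper above and assuming a generic fit_gp interface (not any specific library's API), is:

```python
# Batch selection via the Kriging Believer heuristic [29]: each selected
# point's GP mean is appended as a pseudo-observation before refitting.
import numpy as np

def kriging_believer_batch(Theta, eps, candidates, fit_gp, q=5):
    X, y, batch = Theta.copy(), eps.copy(), []
    for _ in range(q):
        gp = fit_gp(X, y)                      # refit on (pseudo-)data
        m, s = gp.predict(candidates)          # posterior mean and std
        j = int(np.argmax(expected_improvement(m, s, y.max())))
        batch.append(candidates[j])
        X = np.vstack([X, candidates[j]])      # "believe" the GP mean
        y = np.append(y, m[j])
    return np.array(batch)
```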
The complete procedure for constructing the bifidelity KLE surrogate with active learning-based selection of new HF evaluations is summarized in Algorithm 1. The bifidelity surrogate is initially built from the pilot datasets, and the relative \(k\)-fold cross-validation errors are computed. The AL procedure then iterates, acquiring a new batch of \(q\) points until the total computational budget \(B\) is exhausted: in particular, GP regression and EI maximization, surrogate model reconstruction, and cross-validation error update.
In this section, we evaluate our bifidelity KLE surrogate framework and active learning strategy across three test problems of increasing complexity:
Example 1: 1D pulse function. A simple test case with a known analytical form of the HF model, enabling direct comparison against ground truth.
Example 2: 2D convection-diffusion. A parametric PDE problem solved using different discretizations, with uncertainty in source strength and location.
Example 3: 3D turbulent jet flow. QoIs extracted from computational fluid dynamics (CFD) simulations of a turbulent round jet.
We begin with a benchmark example where the HF model is defined analytically as the product of a decaying exponential and a sine function: \[\begin{align} y_{\textrm{HF}}({x}, \theta) = \exp(-a{x}) \sin(b{x}), \label{e:yhf95eqn} \end{align}\tag{21}\] where \(\theta=[a,b]\) are uncertain input parameters. We adopt \(a\sim\mathcal{U}(40,60)\) in all cases, and define two different LF models based on distinct approximations of the sine term.
Case 1 (C1): The sine function is approximated using a truncated Taylor series expansion: \[\begin{align} y_{\textrm{LF},1}(x, \theta) = \exp (-ax) \left( bx - \frac{b^3 x^3}{3!} + \frac{b^5 x^5}{5!}\right), \label{e:ylf95case195eqn} \end{align}\tag{22}\] with \(b\sim\mathcal{U}(60,80)\) for both the HF and LF models. This approximation closely matches the true sine behavior for small values of \(x\), but diverges significantly as \(x\) increases.
Case 2 (C2): To induce smoother and more consistent correlation with the HF model, we use a modified empirical approximation inspired by [33]: \[\begin{align} y_{\textrm{LF},2}(x,\theta) = \exp(-ax)\left[\frac{3.5 bx\frac{180}{\pi}\left(180 - bx\frac{180}{\pi}\right)}{15000 - bx\frac{180}{\pi}\left(180 - bx\frac{180}{\pi}\right)}\right], \label{e:ylf95case295eqn} \end{align}\tag{23}\] with \(b\sim \mathcal{U}(30,50)\) for both the HF and LF models.
Figure 2 shows representative realizations of the HF model alongside both LF models for selected values of \(a\) and \(b\). As expected, the C1 LF model yields good agreement with HF for small \(x\), with visible divergence at larger values. In contrast, the C2 LF model exhibits greater absolute error.
Figure 3 presents the correlations between each LF model and the HF reference. The left panel illustrates that the C1 LF model maintains near-perfect positive correlation for small \(x\), followed by a sharp decline around the midpoint and partial recovery near \(x=0.1\). On the other hand, the C2 LF model (right panel) shows stable correlation close to 1 throughout the domain.
We construct the initial bifidelity KLE surrogate using \(N_{\mathrm{LF}}^P=200\) pilot LF simulations generated via a Latin Hypercube space-filling design. From these, a subset of \(N_{\Delta}^P=5\) paired HF-LF samples, also selected to ensure good space-filling properties, are used as the pilot set for modeling the discrepancy field. The design sets \(\Theta_{\mathrm{LF}}^P\) and \(\Theta_{\Delta}^P\) in the non-dimensional space are shown in Figure 4.
Each KLE is truncated to retain 99% of the total variance, corresponding to approximately 1–3 modes, and the associated PCE employs a total-order expansion of degree 3. The computational budget for active learning is set to \(B=65\) total HF evaluations. The GP model for generalization error is constructed using \(k=5\)-fold cross-validation, and the EI optimization is carried out in Python using routines from the BoTorch Bayesian optimization library [34]. To account for stochasticity in \(k\)-fold partitioning and EI optimization, all experiments are repeated over 20 independent replicates. For comparison, in each replicate, an additional bifidelity KLE is constructed using the same total number of HF points selected via pseudo-random sampling—we call this BF-KLE-RS.
To assess the effectiveness of the proposed BF-KLE-AL strategy, we evaluate the surrogate’s predictive accuracy against ground-truth HF solutions \(y_{\mathrm{HF}}\) at multiple acquisition stages. Specifically, we compute an integrated relative error for each acquisition stage \(\ell\): \[\begin{align} \mu_{\varepsilon,\ell} = \int \frac{||y_{\mathrm{HF}}({x}, \theta) -\tilde{y}_{\mathrm{BF}, \ell}({x}, \theta)||}{||y_{\mathrm{HF}}({x}, \theta)||}\, p(\theta) \, \mathrm{d}\theta, \label{eq:oracle95err95defn} \end{align}\tag{24}\] which we estimate by discretizing the integral using a \(200 \times 200\) uniform grid over \(\theta\).
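Concretely, with hypothetical callables y_hf and y_bf returning the field at given parameter values, the grid estimate is simply an average of relative errors:

```python
# Grid-based estimate of the integrated relative error in Eq. (24);
# p(theta) is uniform here, so the integral reduces to a plain average.
import numpy as np

def integrated_relative_error(y_hf, y_bf, a_grid, b_grid):
    errs = [np.linalg.norm(y_hf(a, b) - y_bf(a, b))
            / np.linalg.norm(y_hf(a, b))
            for a in a_grid for b in b_grid]
    return float(np.mean(errs))
```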
Figure 5 summarizes the average results over all 20 replicates. All methods share identical hyperparameter settings for KLE truncation, PCE mapping, and active learning parameters. In addition to BF-KLE-AL and the random-sampling baseline, we also test an intentionally anti-informative sampling policy that prioritizes low-error regions (i.e., minimizes EI). This provides a lower-bound reference (and sanity check) for the effect of poor sampling decisions.
The results reveal several clear trends. First, selecting points in regions of low estimated error predictably leads to the poorest performance, even worse than random sampling. Second, while surrogate accuracy generally improves as more HF points are added, the relative benefit of active learning depends strongly on the LF-HF correlation structure. In C1, where the correlation varies sharply across the domain, active learning and random sampling perform comparably up to a point, with little to no advantage for EI-based myopic selection; beyond approximately 45 HF points, random sampling attains slightly better accuracy. In contrast, for C2, where the LF and HF response fields exhibit stronger and more spatially uniform correlation, active learning yields more noticeable (though still modest) accuracy gains once approximately 30 HF samples have been acquired. Figure 6 illustrates the batchwise acquisition locations for C2, aggregated over five replicates. The relatively small performance gap between BF-KLE-AL and BF-KLE-RS in this example may stem from the fact that the active learning heuristic, despite targeting regions of higher estimated error (e.g., with some clustering at the corners), still produces a broadly dispersed sampling pattern. As a result, the benefit over random sampling is diminished, especially given the already strong LF signal.
To further examine how new acquisitions influence model performance, Figure 7 presents the evolution of the relative errors for a representative replicate for both BF-KLE-AL and BF-KLE-RS. Two types of error trends are analyzed:
errors at the training points used to construct the bifidelity surrogate, both before and after each point is incorporated into the training set; and
errors on a separate test set derived from the competing strategy (e.g., BF-KLE-RS samples held out for the BF-KLE-AL surrogate and vice versa), providing a complementary generalization check.
Each subfigure in Figure 7 can be interpreted as follows. Each pixel row corresponds to a specific data point, and each column represents a data acquisition stage (either through active learning or random sampling). Pixels below the black line denote points that are currently serving as test samples, while pixels above correspond to points included in the training set. As the algorithm proceeds from left to right, the black line shifts downward, indicating that more points have been incorporated into the training set. Points below the red line represent a separate test set from the competing strategy (per item 2 above) that is not used for training that particular surrogate model.
The histogram insets at the final stage summarize the distribution of relative errors across these test sets. Consistent with the observations in Figure 5, the average relative error on the random-sampling test set is lower for the BF-KLE-AL surrogate in C2, indicating improved generalization from active learning. In contrast, the BF-KLE-RS surrogate exhibits noticeably higher test errors. For C1, both strategies yield comparable performance, though the BF-KLE-RS surrogate performs marginally worse for this specific replicate.
We now compare forward UQ results obtained using the bifidelity KLE surrogates. Figure 8 compares the reference UQ results for the HF model, namely the mean and \(\pm1\) standard deviation bounds propagated directly from the true HF model \(y_{\mathrm{HF}}\), against the same predictive statistics from the bifidelity and LF-KLE surrogates. The LF-KLE surrogate is constructed solely from the LF data obtained in the bifidelity case.
Several observations can be made. First, due to the bias in the pilot LF snapshots, the LF-KLE surrogate fails to accurately reproduce the mean and variability of the HF model. In C1, the divergence of the LF approximation with increasing \(x\) leads to inflated uncertainty in the LF-KLE predictions, a trend that also persists in the surrogate constructed via random sampling. In contrast, the BF-KLE-AL surrogate more accurately captures the mean response and successfully corrects for the systematic bias present in the LF approximations. Notably, BF-KLE-AL somewhat underpredicts the uncertainty in each case as seen from the inset plots. On the other hand, BF-KLE-RS uncertainty estimates are generally inflated, particularly in regions of high LF approximation bias.
Next, we demonstrate the proposed method on a more complex problem: a two-dimensional scalar convection-diffusion equation. Let \(\phi(\mathbf{x}, t)\) denote a scalar field defined on the spatial domain \(\mathbf{x}=(x, y)\in[0,1]^2\). The governing PDE is: \[\begin{align} \frac{\partial\phi(\mathbf{x}, t)}{\partial t}+\nabla\cdot(\mathbf{u}\phi(\mathbf{x}, t))=\nabla\cdot(\alpha\nabla\phi(\mathbf{x}, t))+\dot{\omega}_\phi, \qquad t > 0, \label{eq:pde952d} \end{align}\tag{25}\] where \(\mathbf{u}=(u,v)\) is the spatially varying advection velocity, \(\alpha=0.01\) is the constant diffusion coefficient, and \(\dot{\omega}_\phi(x, y)\) is a source (reaction) term. The initial condition is \(\phi(\mathbf{x}, t=0) = 0\) and boundary conditions are periodic.
The components of the advection velocity are defined as: \[\begin{align} u(x,y)&=\frac{1}{10}-\sin^2{(\pi x)}\Big[\sin{(\pi(y-0.05))}\cos{(\pi(y-0.05))} \nonumber\\ & -\sin{(\pi(y+0.05))}\cos{(\pi(y+0.05))}\Big],\tag{26} \\ v(x,y)&=\sin{(\pi x)}\cos{(\pi x)}\left[\sin^2{(\pi(y-0.05))}-\sin^2{(\pi(y+0.05))}\right].\tag{27} \end{align}\] The source term is: \[\begin{align} \dot{\omega}_\phi(x,y)&= \frac{\theta_s}{2\pi \theta_h^2} \Bigg[\exp\left(-\frac{(x-\theta_x)^2+(y-\theta_y)^2}{2\theta_h^2}\right)\nonumber\\ & -\exp\left(-\frac{(x-\theta_x + 0.05)^2+(y-\theta_y + 0.05)^2}{2 \theta_h^2}\right)\Bigg], \label{eq:source95fn} \end{align}\tag{28}\] where \(\theta = [\theta_s, \theta_h, \theta_x, \theta_y]\) comprises four uncertain parameters representing the source strength, width, and spatial coordinates. The independent distributions are: \(\theta_s \sim \mathcal{U}(0.01, 0.05)\), \(\theta_h \sim \mathcal{U}(0.05, 0.08)\), \(\theta_x \sim \mathcal{U}(0.3, 0.7)\), and \(\theta_y \sim \mathcal{U}(0.55, 0.85)\).
Two model fidelities are defined using different grid resolutions: a \(128 \times 128\) uniform grid for HF and a \(32 \times 32\) uniform grid for LF. The HF field solutions are restricted to the LF grid for constructing the bifidelity surrogate. The PDE is integrated in time up to \(t=2.5\) using first-order explicit time stepping. Convection terms are discretized using a first-order upwind scheme, and diffusion terms using second-order central differencing. The Courant–Friedrichs–Lewy (CFL) condition determines the time step for each fidelity, resulting in \(\Delta t_{\mathrm{LF}}=0.02\) and \(\Delta t_{\mathrm{HF}}=0.0012\) for \(\mathrm{CFL}=0.8\). The QoI is defined as \(y = \phi(\mathbf{x}, t=2.5)\). Representative LF and HF realizations (Figure 9) show strong correlation across the spatial domain.
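For illustration, one explicit update of this scheme on a uniform periodic grid can be sketched as follows; this is a simplified rendition of the discretization described above, not the solver used in our experiments:

```python
# One explicit step of Eq. (25): first-order upwind convection plus
# second-order central diffusion, with periodic boundaries and dx = dy.
import numpy as np

def step(phi, u, v, src, dx, dt, alpha=0.01):
    fx, fy = u * phi, v * phi                  # convective fluxes
    dfx = np.where(u > 0, fx - np.roll(fx, 1, axis=0),
                   np.roll(fx, -1, axis=0) - fx) / dx
    dfy = np.where(v > 0, fy - np.roll(fy, 1, axis=1),
                   np.roll(fy, -1, axis=1) - fy) / dx
    lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) + np.roll(phi, 1, 1)
           + np.roll(phi, -1, 1) - 4.0 * phi) / dx**2
    return phi + dt * (alpha * lap + src - dfx - dfy)
```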
Pilot designs of size \((N_{\mathrm{LF}}^P, N_{\Delta}^P) = (300, 10)\) are generated using the same space-filling design procedure as in Example 1. Each KLE is truncated to retain 99% of the total variance, corresponding to approximately 4–12 modes, and the associated PCE employs a total-order expansion of degree 3.
The computational budget for active learning is set to \(B=50\) total HF evaluations. The GP model for the generalization error is constructed using 10-fold cross-validation. All experiments are repeated over 20 independent replicates. Relative errors are computed using a 1000-sample Monte Carlo estimate of the integral in 24.
Figure 10 presents the evolution of the relative error as a function of data acquisition stage. Although the overall reduction in error is moderate (primarily because the discrepancy term contributes less to the total surrogate variance in this problem) the BF-KLE-AL surrogate consistently outperforms BF-KLE-RS on average.
Figure 11 illustrates the detailed progression of relative errors for a representative replicate, following the same visualization format as Figure 7. For this 2D problem, the performance difference between the two strategies is evident across all stages: surrogates trained via active learning exhibit markedly lower relative errors on the random sampling test set compared with those trained by random selection.
In this example, we adopt a 3D turbulent jet flow from our earlier study [35] focusing on the effects of parametric uncertainty on round jet behavior. The system involves an isothermal, axisymmetric jet exhausting into quiescent air at a jet Mach number \(M_j = 0.9\). In the earlier study, uncertainty was introduced through the stagnation pressure \(p_0\) and the modified eddy viscosity ratio (MEVR) \(\tilde{\nu}/\nu\), while a uniform velocity \(u_e\) was imposed at the nozzle exit. In this work, we adopt a more physically realistic velocity profile and expand the set of uncertain parameters.
Specifically, we model the nozzle exit velocity using a top-hat profile that peaks at the centerline and decays toward the nozzle wall: \[\begin{align} u_e(r) = U_c \tanh \left(\frac{D / 2-r}{\kappa D}\right), \label{e:32topHat} \end{align}\tag{29}\] where \(u_e(r)\) is the axisymmetric velocity profile at radial location \(r\), \(U_c\) is the centerline velocity, \(D\) is the nozzle diameter, and \(\kappa D\) represents the momentum thickness as a fraction \(\kappa\) of the diameter.
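Evaluating 29 is direct; a small sketch (with the nozzle diameter \(D\) normalized to one as an assumption) is:

```python
# Top-hat exit velocity profile of Eq. (29).
import numpy as np

def u_exit(r, U_c, kappa, D=1.0):
    # peaks at U_c on the centerline (r = 0), decays toward the wall
    return U_c * np.tanh((D / 2 - r) / (kappa * D))
```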
We treat the following three input parameters as uncertain: \(\theta = [U_c, \kappa, \log(\tilde{\nu}/\nu)]\), where \(U_c\sim \mathcal{U}(293.24, 312.94)\) m/s, selected to keep the flow subsonic near \(M_j= 0.9\); \(\kappa\sim \mathcal{U}(0.1,0.3)\); and \(\log(\tilde{\nu}/\nu)\sim \mathcal{U}(1.53,4.6)\). These parameters shape the velocity profile \(u_e(r)\), which in turn influences the local Mach number, stagnation conditions, and downstream jet evolution.
All simulations are performed using the SU2 open-source multi-physics solver [36]. HF simulations are conducted using the enhanced delayed detached-eddy simulation (EDDES) approach, which resolves large-scale turbulent structures. LF simulations use the Reynolds-averaged Navier–Stokes (RANS) formulation. Both solvers employ the Spalart–Allmaras (SA) turbulence model for consistency. All runs are performed on the G1 mesh, which consists of approximately 1.6 million cells and is shared across partners of the EU Go4Hybrid (G4H) project consortium [37].
This HF-LF pairing introduces a notable computational cost difference: each EDDES simulation requires approximately 6.25 CPU hours, compared to about 1.5 hours for a corresponding RANS run—a roughly \(4\times\) increase. The added cost stems from the temporal resolution needed to capture unsteady turbulent dynamics in EDDES, in contrast to the steady-state nature of RANS simulations.
Figure 12 illustrates complex turbulent structures in a representative HF simulation using a 3D Q-criterion isosurface (colored by vorticity magnitude) and a meridional slice contouring vorticity magnitude. The Q-criterion is defined as \(Q = \frac{1}{2} \left( \lVert \Omega \rVert^2 - \lVert S \rVert^2 \right)\), where \(\Omega = \tfrac{1}{2}(\nabla \mathbf{u} - \nabla \mathbf{u}^{\top})\) and \(S = \tfrac{1}{2}(\nabla \mathbf{u} + \nabla \mathbf{u}^{\top})\) are the antisymmetric (rotation-rate or vorticity) and symmetric (strain-rate) components of the velocity-gradient tensor \(\nabla \mathbf{u}\), respectively, and \(\mathbf{u}\) denotes the velocity field.
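For completeness, the Q-criterion field can be computed from a velocity snapshot on a uniform grid as in the sketch below (the array layout is our assumption):

```python
# Q-criterion from a velocity field U of shape (3, nx, ny, nz) with
# uniform grid spacing dx; grad[i, j] holds du_i/dx_j.
import numpy as np

def q_criterion(U, dx):
    grad = np.stack([np.stack(np.gradient(U[i], dx)) for i in range(3)])
    S = 0.5 * (grad + grad.transpose(1, 0, 2, 3, 4))  # strain-rate tensor
    W = 0.5 * (grad - grad.transpose(1, 0, 2, 3, 4))  # rotation-rate tensor
    # Q = (||Omega||^2 - ||S||^2) / 2, Frobenius norms over tensor indices
    return 0.5 * (np.sum(W**2, axis=(0, 1)) - np.sum(S**2, axis=(0, 1)))
```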
From these 3D solutions, we extract three key QoI profiles along the nozzle lipline:
\(\bar{v}\): the mean axial velocity;
\(\overline{u'u'}\): the normal component of Reynolds stress; and
\(\overline{u'w'}\): the shear component of Reynolds stress.
These QoIs are 1D fields, plotted in Figure 13 for a pilot dataset comprising 15 HF (EDDES) and 200 LF (RANS) simulations. They are extracted from the 3D flow solutions using ParaView, an open source data analysis and visualization tool [38].
Pilot samples of size \((N_{\mathrm{LF}}^P=200, N_{\Delta}^P=15)\) are used to construct bifidelity KLE surrogates for each QoI. Following the same procedure as in the previous examples, each KLE is truncated, here retaining 95% of the total variance, corresponding to approximately 1–2 modes for the LF term and 5–20 modes for the discrepancy term, and the associated PCE employs a total-order expansion of degree 3.
The computational budget for active learning is set to \(B = 50\) total HF evaluations for BF-KLE-AL. Due to computational constraints, the BF-KLE-RS baseline uses a smaller budget of \(B = 40\). To guide the active learning process, the GP model for the generalization error is constructed using leave-one-out (LOO) cross-validation, averaged over all three QoIs, in place of 15 : \[\begin{align} \varepsilon^{(i)} &= \frac{1}{3} \left(\frac{||\bar{v}_{\mathrm{HF}}({x}, \theta_i) -\tilde{\bar{v}}_{\mathrm{BF}}^{-(i)}({x}, \theta_i)||}{||\bar{v}_{\mathrm{HF}}({x}, \theta_i)||} + \frac{||\overline{u'u'}_{\mathrm{HF}}({x}, \theta_i) -\widetilde{\overline{u'u'}}^{-(i)}_{\mathrm{BF}}({x}, \theta_i)||}{||\overline{u'u'}_{\mathrm{HF}}({x}, \theta_i)||} \right. \nonumber\\ &\left. + \frac{||\overline{u'w'}_{\mathrm{HF}}({x}, \theta_i) -\widetilde{\overline{u'w'}}^{-(i)}_{\mathrm{BF}}({x}, \theta_i)||}{||\overline{u'w'}_{\mathrm{HF}}({x}, \theta_i)||}\right). \label{eq:loo95err95metric} \end{align}\tag{30}\] To leverage available computational resources and enable parallel HF simulations, we employ batch active learning with a batch size of \(q = 5\). This allows up to five HF evaluations to be executed simultaneously during each acquisition stage.
Figure 14 shows the pilot set and the additional HF-LF simulation pairs acquired during successive active learning batches, color-coded by acquisition stage. The accompanying plot tracks the history of QoI-averaged LOO cross-validation error. As the surrogate quality improves, the newly selected points exhibit progressively lower and less variable cross-validation error, indicating that the active learning procedure successfully focuses sampling in regions of high model discrepancy.
To assess the evolution of surrogate performance, we compare results of BF-KLE-AL against two reference surrogates:
BF-KLE-RS: built using the pilot set plus an additional 25 random HF-LF pairs.
HF-KLE: built from the 50 HF simulations from BF-KLE-AL.
Figure 15 presents the batch-wise comparison of the BF-KLE-AL and BF-KLE-RS surrogates on the test sets from the competing strategy—that is, BF-KLE-AL is tested on the 25 random sampling runs, while BF-KLE-RS is tested on the 40 active learning runs. The figure follows the same visualization format used in earlier examples. The BF-KLE-AL surrogate exhibits progressively improved accuracy across batches, as reflected in the decreasing relative errors over successive acquisitions. In contrast, the BF-KLE-RS surrogate displays inconsistent performance, with relative errors occasionally increasing as new samples are added, since random augmentation lacks a criterion for targeting informative regions. Consequently, the acquisition step index has no meaningful correlation with model improvement for the BF-KLE-RS case, unlike the monotonic trend observed under BF-KLE-AL.
In the bottom part of Figure 15, we further visualize the QoI predictions from the BF-KLE-AL surrogate for a representative run. Among the three QoIs, the average error driving the EI criterion is most strongly influenced by the Reynolds stress components, particularly \(\overline{u'u'}\). This dominance is reflected in the results: BF-KLE-AL rapidly converges to the true HF field for \(\overline{u'u'}\) and \(\overline{u'w'}\), while improvements in the mean axial velocity \(\bar{v}\) are comparatively smaller.
To further evaluate convergence behavior, Figure 16 depicts the evolution of the relative difference between the BF-KLE-AL surrogate and the HF-KLE reference for the normal Reynolds stress component \(\overline{u'u'}\), evaluated at 500 random points in the parameter space. As expected, the discrepancy steadily decreases as additional HF samples are incorporated, demonstrating that the bifidelity surrogate asymptotically approaches the accuracy of the HF-only model as \(N_{\Delta} \to N_{\mathrm{HF}}\).
Figure 17 compares the BF-KLE-AL and BF-KLE-RS surrogates, at \(N_{\textrm{HF}}=40\), using the HF-KLE predictions as a common reference. Although the absolute differences are computed against surrogate predictions rather than true HF data, the trends mirror those observed in smaller tests based on the true HF simulations. The active learning-trained surrogates yield consistently lower relative errors, particularly in the top-left and bottom-right regions of the domain, while random sampling produces larger spatially correlated discrepancies. The distribution of mean relative errors across all three QoIs confirms this finding: active learning leads to more informative HF sample selection and more accurate, sample-efficient surrogate construction.
Finally, we show the forward UQ results. Figure 18 presents the comparison between the predictive statistics—mean and \(\pm 1\) standard deviation bounds—from the BF-KLE, LF-KLE, and HF-KLE surrogates built via active learning and random sampling (the LF-KLE and HF-KLE surrogates are constructed solely from the respective single-fidelity simulations obtained when applying each strategy to BF-KLE). As expected from the bias observed in the pilot LF snapshots, especially for \(\overline{u'u'}\), the LF-KLE surrogate does not accurately reproduce the mean and variability of the HF-KLE predictions. On the other hand, the mean response and variability from BF-KLE-AL and BF-KLE-RS closely overlap with the respective statistics for HF-KLE-AL and HF-KLE-RS. The BF-KLE-AL uncertainty for \(\bar{v}\) is slightly inflated compared to the HF-KLE-AL result, while the two coincide for the Reynolds stress QoIs that dominate the EI criterion.
In this work, we introduced a bifidelity KLE surrogate model for field-valued QoIs. The approach combines the expressive power of the KLE—commonly used to represent spatially or temporally correlated fields—with PCEs to maintain an explicit mapping between input uncertainties and output responses. By leveraging inexpensive LF simulations to capture dominant trends and a limited number of HF evaluations to correct systematic bias, the method achieves computationally efficient surrogate construction with strong predictive accuracy.
To further enhance performance, we developed an active learning strategy that adaptively selects new HF simulations based on the surrogate’s estimated generalization error. This error is quantified through cross-validation, modeled using GP regression, and used to drive an EI acquisition function that targets regions of poor surrogate performance. The resulting BF-KLE-AL method was demonstrated on a sequence of test problems of increasing complexity, from an analytical benchmark and a parametric convection-diffusion equation to a realistic turbulent jet flow simulation, showing consistent gains in accuracy and sample efficiency.
Several limitations and opportunities for future work remain. First, our present investigation focuses on the KLE surrogate structure, and it would be of interest to explore bifidelity behavior under alternative architectures such as neural networks or reduced order models. Second, the current framework has been demonstrated for low- to moderate-dimensional uncertainty spaces; extending it to higher dimensions will benefit from dimensionality-reduction techniques such as active subspaces, sensitivity-based screening, or sparse PCEs to identify and prioritize the most influential parameters.
Finally, the proposed active learning algorithm is myopic, selecting points greedily without accounting for the long-term impact of each acquisition. Developing non-myopic or multi-step lookahead acquisition strategies, potentially informed by Bayesian optimal experimental design principles, could further improve efficiency. Moreover, extending the framework to multi-fidelity hierarchies with more than two levels would enable adaptive selection of both the sampling location and the fidelity level, thereby balancing information gain against computational cost in a principled, cost-aware manner.
AJ and XH were partially supported by the National Science Foundation under Grant 2027555. CS was supported by the Scientific Discovery through Advanced Computing (SciDAC) program through the FASTMath Institute.
The computational resources provided by the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk) are gratefully acknowledged.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
This article is an expanded version of a paper [35] presented virtually at AIAA SciTech on January 10, 2021. The conference paper is available at https://doi.org/10.2514/6.2021-1367.
Address all correspondence to: Aniket Jivani, ajivani@umich.edu