October 02, 2025
All inverse problems rely on data to recover unknown parameters, yet not all data are equally informative. This raises the central question of data selection. A distinctive challenge in PDE-based inverse problems is their inherently infinite-dimensional nature: both the parameter space and the design space are infinite, which greatly complicates the selection process. Somewhat unexpectedly, randomized numerical linear algebra (RNLA), originally developed in very different contexts, has provided powerful tools for addressing this challenge. These methods are inherently probabilistic, with guarantees typically stating that information is preserved with probability at least \(1-p\) when using \(N\) randomly selected, weighted samples. Here, the notion of “information” can take different mathematical forms depending on the setting. In this review, we survey the problem of data selection in PDE-based inverse problems, emphasize its unique infinite-dimensional aspects, and highlight how RNLA strategies have been adapted and applied in this context.
Inverse problems based on partial differential equations (PDEs) arise in important, physically motivated applications such as materials science, medical imaging, and geophysics. The fundamental task of the inverse problem is to reconstruct certain parameters (typically functions) in the PDEs that govern the dynamics of the system, using data gathered from the system. These parameters often encode critical information for downstream tasks. For example, recovering the optical properties of biological tissue may reveal abnormalities, recovering the refractive index of subsurface structures is essential in petroleum exploration, and recovering the band structure of a quantum system can help with material design.
PDE-based inverse problems have been studied extensively, from both analytical and algorithmic perspectives, across a wide range of PDE models (see, e.g., [1]–[3]). Most of this work assumes that data is given, and focuses on the quality and stability of reconstruction. In practice, reconstruction is often preceded by a critical step: experimental design. Experimental design refers to the process of planning experiments so as to maximize the information content of the data collected for a given task. In PDE-based inverse problems, we usually cannot alter the governing equations themselves, but we do have control over experimental settings such as the source profile (which triggers the PDE dynamics) or the placement of detectors. This perspective is closely related to data selection: given the possibility of generating a large collection of data, the goal is to identify the most informative subset of experiments that yields the best possible reconstruction of the unknown parameters.
Classical experimental design is a rich and well-developed area of statistics, offering tools such as leverage scores, Fisher information, and optimality criteria including A-, D-, and E-optimality, which have been successfully applied across many scientific domains (see [4], [5]). In recent years, many of these techniques have been brought into the setting of PDE-based inverse problems, particularly within the framework of optimal design (see the reviews [6], [7]). Yet, the direct transfer of experimental design principles to PDE-based problems is not straightforward; it introduces several distinctive challenges.
Structure induced by PDEs. Classical design methods are largely generic and independent of specific models, but PDE-constrained inverse problems possess rich structural features that impose constraints while also offering opportunities. Effective experimental design must respect and, if possible, exploit these structures.
Efficiency over optimality. Many design strategies seek an “optimal” subset of experiments. In PDE-based settings, efficiency often takes precedence: Given the high computational and experimental costs, the aim is to obtain informative data quickly rather than achieve exact optimality.
Cost of data acquisition. Traditional experimental design often assumes that all candidate data are already available for evaluation. By contrast, in PDE-based inverse problems, each data point may be expensive to generate, whether computationally or experimentally. This raises issues of sample complexity and calls for design strategies that account for acquisition costs.
These considerations indicate that classical methods for experimental design must be adapted in order to be effective for PDE-based inverse problems.
The apparently unrelated research direction of randomized numerical linear algebra (RNLA) has developed rapidly in recent years [8], [9]. Originally motivated by large-scale data problems in machine learning and internet applications such as recommendation systems, RNLA methods emphasize efficiency, trading some accuracy for scalability through randomization. The RNLA philosophy resonates with the goals of experimental design for PDE inverse problems, where efficiency is a more pressing concern than exact optimality.
One of the classical contributions of RNLA to experimental design is the fast computation of leverage scores in linear regression [9], [10], which serves as a cornerstone for randomized data selection. Over the years, these techniques have been adapted and extended, and are now used for experimental design in PDE-based inverse problems.
This review summarizes these recent advances at the interface of data selection for PDE-based inverse problems and RNLA. The field is still developing: Many algorithms are being actively refined, and nonlinear extensions remain largely open. Our aim is to provide a coherent overview of what has been achieved so far, while highlighting key challenges and promising directions for future research.
The remainder of the paper is structured as follows. Section 2 gives background on PDE-based inverse problems, highlighting structural features that are particularly relevant for experimental design. While reconstruction solvers are typically viewed as downstream tasks, the way they are formulated determines how much information can be extracted from data and thus directly guides data selection. Section 3 reviews two classical approaches: PDE-constrained optimization and Bayesian formulations. Both solvers rely critically on linearization; we also provide technical preparation in this context. Section 4, the main pillar of this review, surveys recent advances in RNLA methods and their applications to experimental design for PDE inverse problems. Section 5 concludes with a discussion of nonlinear extensions and directions for future research.
Among inverse problems, those governed by PDEs form a distinct and important subclass, marked by unique structural and analytical challenges. While different PDEs give rise to diverse inversion properties, they also share fundamental traits that permit a unified treatment. A particularly notable feature of PDE-based inverse problems is their inherently infinite-dimensional nature, which contrasts with the finite-dimensional representations required for computation. It is not trivial to bridge this disparity. Theoretical analysis is typically carried out in infinite-dimensional function spaces, whereas numerical algorithms must ultimately operate in finite dimensions. This bridge lies at the core of the present review and directly motivates our focus on data selection and experimental design.
In the subsections that follow, we introduce a generic PDE-based inverse problem in its infinite-dimensional form and describe the hierarchy of steps leading to the finite-dimensional numerical treatment. We discuss issues of well-posedness and the implications for experimental design, and we distinguish between quantitative and qualitative design goals. Finally, we conclude with representative examples of PDE-based inverse problems, grounding the abstract framework in concrete scientific applications.
Every PDE-based inverse problem begins with an underlying partial differential equation of the form: \[\label{eqn:PDE} \mathcal{L}_{\pmb{\sigma}}[u] = S_{\theta_1}(x)\quad\Rightarrow\quad \mathrm{d} = {M}_{\theta_2}[u]\,.\tag{1}\] Here, \(\mathcal{L}_{\pmb{\sigma}}\) denotes the PDE operator that models the physical or biological phenomenon under investigation, defined over a Euclidean domain with spatial variable \(x \in \mathbb{R}^d\). The operator \(\mathcal{L}_{\pmb{\sigma}}\) depends on a parameter \(\pmb{\sigma}\), which encodes certain characteristics of the system. The input term \(S_{\theta_1}(x)\) may represent a source term, boundary condition, or initial condition. The system is fully specified by fixing \(\pmb{\sigma}\), and a given input \(S_{\theta_1}(x)\) leads to a solution \(u\) of the PDE.
The state \(u\) is typically not observable in full. Instead, measurements are obtained through an observation operator \(M_{\theta_2}\), producing ground-truth data \(\mathrm{d}(\theta_1,\theta_2) = {M}_{\theta_2}[u]\). The pair \(\theta = (\theta_1, \theta_2) \in \Theta\) specifies the experimental design configuration, with \(\theta_1\) and \(\theta_2\) respectively describing the design of source and detector. Varying these configurations provides information about the unknown parameter \(\pmb{\sigma}\), and the collection of all such configurations \(\Theta\) is referred to as the design space.
This entire process defines what we refer to as the Input-to-Output (ItO) operator: \[\label{eqn:Lambda} \Lambda_{\pmb{\sigma}}:\; (S_{\theta_1}, M_{\theta_2}) \mapsto \mathrm{d}(\theta_1, \theta_2)\,.\tag{2}\]
Throughout this paper, we assume that for all admissible \(\pmb{\sigma}\in \Sigma\), the PDE 1 is well-posed under a prescribed metric, and that the solution is measurable using the observation operator \({M}_{\theta_2}\) for all values of \(\theta_2\) under consideration. These assumptions guarantee that the ItO operator 2 is well-defined for each \(\pmb{\sigma}\). Moreover, we equip the domain and codomain of \(\Lambda_{\pmb{\sigma}}\) with metrics inherited from the input space \(S_{\theta_1}(x)\) and the data space \(\mathrm{d}(\theta_1,\cdot)\), respectively. Within this framework, the forward problem can be summarized as follows: for a fixed system parameter \(\pmb{\sigma}\), the operator \(\Lambda_{\pmb{\sigma}}\) maps each input \(S_{\theta_1}(x)\) to a corresponding data function \(\mathrm{d}(\theta_1,\cdot)\).
The inverse problem is to recover the parameter \(\pmb{\sigma}\) from knowledge of the operator \(\Lambda_{\pmb{\sigma}}\), or equivalently, from the full collection of data values \(\mathrm{d}(\theta_1,\theta_2)\) over \(\theta = (\theta_1,\theta_2) \in \Theta\). To formalize this, we define the data map: \[\label{eqn:data} \pmb{\mathcal{M}}(\theta; \pmb{\sigma}): \Theta \otimes \Sigma \to \mathbb{R},\;\quad \text{with} \quad \pmb{\mathcal{M}}(\theta, \pmb{\sigma}) = \Lambda_{\pmb{\sigma}}[S_{\theta_1}, M_{\theta_2}] = \mathrm{d}(\theta_1, \theta_2)\,.\tag{3}\] The task of the inverse problem is then to infer \(\pmb{\sigma}\) from complete knowledge of \(\pmb{\mathcal{M}}(\cdot\;;\pmb{\sigma})\).
A defining characteristic of this class of problems is their infinite-dimensional nature. The unknown parameter \(\pmb{\sigma}\) is typically a function and thus infinite-dimensional, while the data function \(\mathrm{d}\) is indexed continuously over the design space \(\Theta\). Consequently, both the parameter space \(\Sigma\) and the design space \(\Theta\) reside in infinite-dimensional settings, distinguishing these problems from standard finite-dimensional inverse problems.
While the natural formulation of PDE-based inverse problems is often infinite-dimensional, we need to reduce to finite dimensions for experimental and numerical reasons. This reduction arises from two primary sources:
Limited data: Only finitely many experiments can be carried out in practice, which restricts the design space to a finite subset \(\Theta_c \subset \Theta\). If \(c\) experiments are conducted, each corresponding to a specific value of \(\theta=(\theta_1,\theta_2)\), then \(|\Theta_c| = c\).
Reduced parameterization: \(\pmb{\sigma}\) is typically represented with a finite number of degrees of freedom, yielding a finite-dimensional approximation \(\Sigma_n\) of dimension \(n\). Any \(\pmb{\sigma}\) living in this space \(\Sigma_n\) is then automatically a finite-dimensional object.
Under these approximations, the data map 3 becomes \[\label{eqn:data95finite} \pmb{\mathcal{M}}(\theta; \pmb{\sigma}):\; \Theta_c \otimes \Sigma_n \to \mathbb{R}\,.\tag{4}\]
It is tempting to view the finite-dimensional formulation as simply an approximation of the infinite-dimensional one. Indeed, as \(n,c\to\infty\), the two settings coincide. However, theoretical results established in the infinite-dimensional framework do not in general translate directly to the finite \(n,c\) setting. Bridging these regimes requires a hierarchy of intermediate analyses, which we summarize as follows.
Well-posedness at the PDE level: In the infinite-dimensional setting, well-posedness concerns uniqueness and stability of the inverse map. We seek an estimate of the form \[\label{eqn:well95posed} \|\pmb{\sigma}_1 - \pmb{\sigma}_2\| \leq w\!\left(\|\pmb{\mathcal{M}}(\cdot, \pmb{\sigma}_1) - \pmb{\mathcal{M}}(\cdot, \pmb{\sigma}_2)\|\right)\,, \quad \pmb{\sigma}\in \Sigma \cap B_R(\pmb{\sigma}_\ast)\,,\tag{5}\] where \(w\) quantifies the stability of the inverse map and \(B_R(\pmb{\sigma}_\ast)\) denotes the neighborhood of radius \(R\) around the ground truth \(\pmb{\sigma}_\ast\). Here \(R\) may be infinite, or may encode problem-specific constraints such as positivity. The following two properties of \(w\) capture the essence of well-posedness.
Uniqueness: \[\label{eqn:uniqueness} w(0) = 0\,,\tag{6}\] implying that complete knowledge of \(\pmb{\mathcal{M}}(\cdot,\pmb{\sigma})\) uniquely determines \(\pmb{\sigma}\). Uniqueness has been established for many PDE-based inverse problems, the most celebrated being the Calderón problem [11], with pioneering results in [12], [13] and subsequent extensions in [14]–[17].
Stability: We seek constants \(C > 0\) and \(\beta > 0\) such that \[\label{eqn:stability} w(t) \leq C t^\beta\,,\tag{7}\] ensuring that small perturbations in data lead to controlled (e.g., Hölder/Lipschitz continuous) perturbations in the reconstruction. It is also possible for the upper bound to take on a logarithmic or exponential form. Foundational results in this direction include inverse conductivity problems [18] and inverse scattering problems [19]–[21].
These results are highly problem-specific, relying on PDE-dependent techniques. For instance, complex geometrical optics (CGO) solutions are central in Schrödinger-type and elliptic problems [22], Carleman estimates play a key role in transport and hyperbolic equations [23]–[25], and singular decomposition or averaging-lemma techniques are employed for kinetic equations [26], [27].
Well-posedness in the finite-dimensional setting: After discretization, we ask analogous questions of uniqueness and stability with \(\Sigma\) replaced by \(\Sigma_n\) and \(\Theta\) by \(\Theta_c\). The relevant estimate becomes \[\label{eqn:well95posed95finite} \|\pmb{\sigma}_1 - \pmb{\sigma}_2\| \leq w_{n,c}\!\left(\|\pmb{\mathcal{M}}(\cdot|_{\Theta_c}, \pmb{\sigma}_1) - \pmb{\mathcal{M}}(\cdot|_{\Theta_c}, \pmb{\sigma}_2)\|\right)\,, \quad \pmb{\sigma}\in \Sigma_n \cap B_R(\pmb{\sigma}_\ast)\,.\tag{8}\] That is:
If \(w_{n,c}(0) = 0\), then \(\pmb{\mathcal{M}}(\cdot|_{\Theta_c}, \pmb{\sigma})\) uniquely determines \(\pmb{\sigma}\) within \(\Sigma_n\).
If an inequality of the form 7 holds for \(w_{n,c}\), then small data perturbations lead to bounded reconstruction error.
It is worth noting that well-posedness in the infinite-dimensional setting does not directly translate to the finite-dimensional case. The properties of \(w\) in 5 are only suggestive; they do not guarantee corresponding properties for \(w_{n,c}\) in 8 .
Numerical implementation: Ultimately, we need to implement a computational scheme for the reconstruction. This task requires algorithmic pipelines that recover \(\pmb{\sigma}\) from observed data. Two broad approaches dominate: (i) PDE-constrained optimization methods [28], [29], and (ii) Bayesian inference frameworks [30], [31]. When prior information is available, regularization methods are widely used to stabilize reconstruction, including classical Tikhonov regularization [32], [33], total variation (TV) [34], and \(\ell_1\)-based sparsity methods [35], [36].
Throughout this paper, we assume that PDE-level well-posedness results 5 –7 are available. Our focus is on transferring these theoretical guarantees to the finite-dimensional setting, establishing finite-dimensional counterparts such as 8 .
It is tempting to transfer the infinite-dimensional well-posedness estimate 5 directly to its finite-dimensional counterpart 8 . The passage, however, is not straightforward. A conceptual illustration appears in Figure 1.
At the origin \((n,c) = (\infty,\infty)\), the problem is posed in the infinite-dimensional setting, where well-posedness is assumed to hold. The most natural route to the fully finite-dimensional case proceeds first along the green arrow, followed by the red arrow.
The green arrow. This step reduces the problem to a semi-infinite setting: the unknown parameter is finite-dimensional, while data remains abundant. This reduction is essentially free. Once well-posedness 5 is established in the infinite-dimensional formulation, uniqueness readily follows in the semi-infinite case, with \[\label{eqn:well95posed95semifinite} \|\pmb{\sigma}_1 - \pmb{\sigma}_2\| \leq w_n\left(\|\pmb{\mathcal{M}}(\cdot, \pmb{\sigma}_1) - \pmb{\mathcal{M}}(\cdot, \pmb{\sigma}_2)\|\right), \quad \text{for } \pmb{\sigma}\in \Sigma_n \cap B_{R}(\pmb{\sigma}_\ast),\tag{9}\] for some function \(w_n\). Since the amount of data used is the same, while the to-be-reconstructed parameter \(\pmb{\sigma}\) now lives in a smaller space \(\Sigma_n\subset\Sigma\), the stability function \(w_n\) inherits \(w_n(0)=0\) from \(w\). Moreover, because \(\Sigma_n\) is finite dimensional and the analysis is restricted to a neighborhood of the ground truth, the domain under consideration is essentially compact, and the mapping is automatically Lipschitz [37].
Well-posedness in this semi-infinite regime has been studied extensively. For example, under the a priori assumption of piecewise-constant parameters, Alessandrini and Vessella [38] established Lipschitz stability for the Calderón problem [11]. Subsequent work has extended these results to more general settings [39]–[45], as well as to Schrödinger and Helmholtz equations [46]–[48]. However, the Lipschitz stability constants may grow exponentially with the parameter dimension \(n\) [49], [50].
The red arrow. In this (far more challenging) step, the available data is reduced from unlimited (\(c=\infty\)) to a finite \(c\), while one seeks to maintain a comparable quality of reconstruction. This reduction raises two fundamental questions.
What is the minimal number of experiments \(c\) required to guarantee identifiability?
How should the subset \(\Theta_c\subset\Theta\) be chosen?
These questions lie at the heart of data selection, which is essential for bridging continuous formulations with practical, finite data. Addressing them is central to PDE-based inverse problems and is the primary focus of this review.
Finally, one may also define a particular structure for \(\Theta_c\) and examine only the scaling of \(c\) in terms of \(n\). This approach was taken in a recent paper [51], where \(\Theta_c\) is assumed to be a \(c\)-dimensional projection under a generic assumption.
The task of selecting data from a potentially infinite pool of experimental configurations is classical, and many approaches have been developed to address it. Broadly speaking, these approaches can be grouped into two categories.
Quantitative design. This perspective is rooted in the Bayesian framework of optimal experimental design. Here, “design” refers to selecting, from all possible configurations, the experiments that most effectively extract information about the unknown parameter \(\pmb{\sigma}\). A quantitative design requires specifying a criterion \(\Phi\) that measures the information content of a chosen subset \(\Theta_c\subset\Theta\), then choosing the design that optimizes \(\Phi\), that is \[\Theta_c^* = \arg \min_{\Theta_c\subset \Theta} \Phi(\Theta_c)\,.\] Over the past decades, a wide variety of criteria \(\Phi\) and associated computational strategies have been developed. Among them, criteria based on minimizing uncertainty are particularly prominent: one models perturbations in the data and studies how they propagate into the reconstruction of \(\pmb{\sigma}\), then selects experiments that minimize the resulting uncertainty. See Section 3.2 for further details. For broader treatments of experimental design we refer to classical solvers [52]–[55] and to the summaries in [4], [5], [56], [57]. Experimental design in the context of PDE-based inverse problems is reviewed in [6], [7], [58].
Qualitative design. More recent approaches move away from strict optimality criteria. Instead, the goal is to secure reconstructions of reasonable quality [59], [60]. This relaxed perspective broadens the toolbox: techniques from randomized numerical linear algebra, gradient flows, and related areas that are traditionally viewed as separate from optimal design can now be applied effectively.
While quantitative design has long been the dominant paradigm, qualitative design has recently gained traction, particularly in connection with randomized numerical linear algebra (RNLA). As we discuss in Section 4, RNLA prioritizes efficiency while tolerating modest losses in precision. This philosophy aligns naturally with qualitative design, where “good enough” data selection suffices for practical purposes. For this reason, we are chiefly interested in qualitative approaches in this review.
Several inverse problems arising from physical and biological applications have become canonical in the study of PDE-based inverse problems. We briefly summarize five such examples here. They are the inverse Lorenz 63 system, the inverse elliptic equation (also known as electrical impedance tomography, or the Calderón problem), the inverse Schrödinger equation, the inverse transport equation (as in optical tomography), and the inverse Helmholtz equation (arising in acoustic imaging).
Inverse Lorenz 63 system. The Lorenz 63 system [61], originally introduced as a simplified model of atmospheric convection, describes the temporal evolution of three state variables: \[\begin{cases} \dot{x} = \sigma(y - x)\,, \\ \dot{y} = x(\rho - z) - y\,, \\ \dot{z} = xy - \beta z\,. \end{cases}\] Here, \((x,y,z)\) correspond to convective flow rate, horizontal temperature difference, and vertical deviation of the temperature profile, while the parameter vector \(\pmb{\sigma}=(\sigma,\rho,\beta)\) encodes the Prandtl number, the Rayleigh number, and a geometric factor. Owing to its chaotic dynamics and bifurcation structure, the Lorenz system is a classical model in nonlinear dynamics.
In the inverse setting, given partial or full time-series observations of \((x,y,z)\), the task is to reconstruct the parameter vector \(\pmb{\sigma}\). The associated ItO map is: \[\Lambda_{\pmb{\sigma}} : (x, y, z)(t=0) = \theta_1 \mapsto (x, y, z)(t = \theta_2)\,.\] Parameter identification for the Lorenz system has been widely studied; see, for instance, [62]–[64].
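As a minimal illustration, the sketch below simulates the forward ItO map of the Lorenz 63 system and evaluates the least-squares misfit that the reconstruction pipelines of Section 3 would minimize. The parameter values, initial state, and observation times are illustrative choices, not prescriptions from the literature.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Forward ItO map for Lorenz 63: theta_1 is the initial state, theta_2 the
# observation time, and sigma = (sigma, rho, beta) the unknown parameters.

def lorenz_rhs(t, xyz, sig, rho, beta):
    x, y, z = xyz
    return [sig * (y - x), x * (rho - z) - y, x * y - beta * z]

def ito_map(params, theta1, theta2):
    """Map (initial state theta1, observation time theta2) to the state."""
    sol = solve_ivp(lorenz_rhs, (0.0, theta2), theta1, args=tuple(params),
                    dense_output=True, rtol=1e-8, atol=1e-10)
    return sol.sol(theta2)

params_true = (10.0, 28.0, 8.0 / 3.0)          # ground-truth sigma
theta1 = [1.0, 1.0, 1.0]                       # source: initial condition
obs_times = np.linspace(0.1, 1.0, 10)          # detectors: observation times

data = np.array([ito_map(params_true, theta1, t) for t in obs_times])

def misfit(params):
    """Least-squares mismatch between simulated and observed trajectories."""
    pred = np.array([ito_map(params, theta1, t) for t in obs_times])
    return 0.5 * np.sum((pred - data) ** 2)

print(misfit(params_true), misfit((9.0, 27.0, 2.5)))   # ~0 at the ground truth
```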
Inverse elliptic equation. Elliptic equations provide the foundation of steady-state modeling. A typical formulation is: \[\label{eqn:elliptic} \nabla \cdot (\pmb{\sigma}(x) \nabla u) = 0\,, \quad \text{with} \quad u|_{\partial \Omega} = \phi\,.\tag{10}\] The forward problem maps a prescribed Dirichlet boundary condition \(\phi\) to the solution \(u\). A canonical inverse problem is electrical impedance tomography (EIT) [65], a noninvasive medical imaging technique in which one seeks to infer the conductivity distribution \(\pmb{\sigma}\) of tissue from boundary voltage–current measurements. In this case, the measured Neumann data \(\pmb{\sigma}\partial_n u\) defines the Dirichlet-to-Neumann (DtN) map: \[\Lambda_{\pmb{\sigma}} : \phi = \phi_{\theta_1} \mapsto \pmb{\sigma}\, \partial_n u(\theta_2)\,.\] Physically, this corresponds to the voltage-to-current relation. The EIT problem has a rich history; see surveys [22], [66]–[68]. Foundational works established uniqueness and stability results [12], [13], [17], [18], [69].
Inverse Schrödinger equation. The Schrödinger equation is a fundamental model for describing wave fields in quantum systems. Its steady-state form is a linear elliptic equation: \[\label{eqn:schroedinger} \Delta_x u + \pmb{\sigma}(x)u = 0\,, \quad \text{with} \quad u|_{\partial \Omega} = \phi\,,\tag{11}\] where \(u\) is the wave field and \(\pmb{\sigma}\) the potential. The inverse problem seeks to reconstruct \(\pmb{\sigma}\) from boundary measurements of \(u\). Applications include material characterization and design for achieving prescribed properties [70]–[72]. The corresponding boundary map is: \[\Lambda_{\pmb{\sigma}}: u|_{\partial\Omega} \to \partial_n u\big|_{\partial\Omega}\,.\] From a mathematical perspective, the inverse Schrödinger problem can be related to EIT: Under smoothness assumptions on the medium, the Schrödinger potential (\(\pmb{\sigma}\) in 11 ) can be written as \(\tfrac{\Delta \sqrt{\pmb{\sigma}}}{\sqrt{\pmb{\sigma}}}\) in terms of the conductivity \(\pmb{\sigma}\) from 10 , linking the two formulations [17], [22].
Inverse transport equation. Transport equations are widely used in optical and radar imaging. The underlying principle is to infer the optical properties of a medium from measurements of incoming and outgoing photon fluxes [73]–[76]. These material properties are encoded in the coefficient \(\pmb{\sigma}\). The governing PDE is \[v \cdot \nabla_x f = \pmb{\sigma}(x) \, \mathcal{L}[f]\,, \quad \text{with} \quad f|_{\Gamma_-} = \phi(x, v)\,,\] where \(f(x, v)\) denotes the photon distribution at spatial location \(x\) and velocity \(v\), and \(\mathcal{L}\) is a prescribed scattering operator. The domain is \(x \in \Omega\), with \(\Gamma_\pm = \{ (x, v) : x \in \partial \Omega, \pm v \cdot n_x > 0 \}\) representing inflow and outflow boundaries. The associated inverse problem seeks to reconstruct \(\pmb{\sigma}\) from boundary measurements, which defines the ItO map: \[\Lambda_{\pmb{\sigma}} : \phi_{\theta_1} \mapsto f|_{\Gamma_+}(\theta_2)\,.\] Inverse transport problems are fundamental in optical tomography and have been studied extensively from both analytical and computational perspectives [26], [77]–[81].
Inverse Helmholtz equation. The Helmholtz equation models time-harmonic wave propagation and forms the foundation of acoustic imaging, with applications in geophysics [82]–[84] and medical imaging [85]. The PDE takes the form \[\Delta_x u + \omega^2 \pmb{\sigma}(x) u = 0\,,\] where \(\omega\) is the prescribed frequency. The incoming wave is specified as a plane wave \(u^{\text{in}}_{\theta_1} = e^{i \omega \theta_1 \cdot x}\). Measurements are collected on the boundary of a large domain (typically modeled as a ball of radius \(R\), denoted \(B_R\)). The inverse problem consists of reconstructing the coefficient \(\pmb{\sigma}\) using the ItO map: \[\Lambda_{\pmb{\sigma}} : u^{\text{in}}_{\theta_1} \mapsto u|_{\partial B_R}(\theta_2)\,.\] The inverse Helmholtz scattering problem has been the subject of extensive research from mathematical, statistical, and numerical perspectives. Works that summarize the progress and highlight key advances in theory and algorithm design include the books [1], [3], [86]–[89] and landmark works [89]–[95].
Experimental design is an upstream task that precedes the implementation and execution of inverse algorithms. The value of the collected data ultimately depends on how effectively it can be exploited by the chosen solver. Two major algorithmic pipelines have been particularly influential: PDE-constrained optimization and the Bayesian formulation, both of which we summarize in the following subsections. A recurring theme in both frameworks is the central role of linearization that requires gradient computation. We therefore devote Section 3.3 to surveying the linearization techniques.
When data is abundant, PDE-constrained optimization is one of the most widely used methods for inferring unknown parameters in PDEs. The formulation is straightforward: Subject to the PDE being satisfied, we seek the parameter \(\pmb{\sigma}\) that produces the best match to the measurement data: \[\label{eqn:PDE95constrained95opt} \min_{\pmb{\sigma}}\sum_{\theta_1,\theta_2}\frac{1}{2}|\mathrm{d}(\theta)-M_{\theta_2}[u_{\theta_1}]|^2 \,,\quad\text{s.t.} \quad \mathcal{L}_{\pmb{\sigma}}[u_{\theta_1}]=S_{\theta_1}\,.\tag{12}\] If the data \(\mathrm{d}\) is noiseless (as when it is generated by the PDE solution using a ground-truth parameter \(\pmb{\sigma}^*\)) and the problem is well-posed, then the optimizer is unique and yields a zero objective value. Since the PDE itself is well-posed, \(u_{\theta_1}\) is uniquely determined; we therefore write it as \(u(\pmb{\sigma};\theta_1)\) to emphasize its dependence on \(\pmb{\sigma}\). The optimization problem can then be recast in an unconstrained form: \[\min_{\pmb{\sigma}}L(\pmb{\sigma})=\frac{1}{2}\sum_{\theta_1,\theta_2}|\mathrm{d}(\theta)-M_{\theta_2}[u(\pmb{\sigma};\theta_1)]|^2=\frac{1}{2}\sum_{\theta_1,\theta_2}|\mathrm{d}(\theta)-\mathrm{pde}(\pmb{\sigma},\theta)|^2\,,\] where we adopt the shorthand \(\mathrm{pde}(\pmb{\sigma},\theta) = M_{\theta_2}[u(\pmb{\sigma};\theta_1)]\).
In practice, gradient-based solvers are most commonly used to minimize \(L(\pmb{\sigma})\), updating \(\pmb{\sigma}\) by making use of the gradient \(\frac{d}{d\pmb{\sigma}}L(\pmb{\sigma})\). The conditioning of the problem, which governs both stability and convergence properties, depends critically on the Hessian \(\mathrm{Hess}_{\pmb{\sigma}}L\). Calculation of the gradient and Hessian can be summarized as follows.
Gradient \(\frac{d}{d\pmb{\sigma}}L\). By the chain rule, we have \[\frac{d}{d\pmb{\sigma}}L = \sum_\theta \left(\mathrm{pde}(\pmb{\sigma},\theta)-\mathrm{d}(\theta)\right)\frac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta)\,.\] Since the dependence on \(\pmb{\sigma}\) appears only through \(\mathrm{pde}(\pmb{\sigma},\theta)\), computing the gradient amounts to evaluating \(\tfrac{d}{d\pmb{\sigma}}\mathrm{pde}\).
Hessian \(\mathrm{Hess}_{\pmb{\sigma}}L\). Differentiating once more gives \[\label{eqn:hessian95L} \mathrm{Hess}_{\pmb{\sigma}}L \;=\; \underbrace{\sum_{\theta}\frac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta)\otimes \frac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta)}_{\text{Fisher Information Matrix}} + \sum_{\theta}\bigl(\mathrm{pde}(\pmb{\sigma},\theta) - \mathrm{d}(\theta)\bigr)\,\mathrm{Hess}_{\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta)\,.\tag{13}\] The Hessian thus consists of two components. The first is the outer product of the pde gradients, usually referred to as the Fisher Information Matrix (FIM) or the Gauss-Newton matrix. Near the ground truth, where \(\mathrm{pde}(\pmb{\sigma},\theta)\approx \mathrm{d}(\theta)\), the second term vanishes, leaving the Hessian dominated by the FIM. Note that when \(\pmb{\sigma}\in \Sigma\), a function space, the Hessian is an operator, whereas for \(\pmb{\sigma}\in \Sigma_n\) it reduces to a finite-dimensional matrix in \(\mathbb{R}^{n\times n}\).
Both the gradient and the Fisher Information Matrix (FIM) will reappear as key quantities in experimental design, since they provide the sensitivity information that guides the selection of data so as to maximize its impact on parameter inference.
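To make 13 concrete, the following sketch assembles the gradient of \(L\) and the FIM for a hypothetical scalar model \(\mathrm{pde}(\pmb{\sigma},\theta)=\sigma_1 e^{-\sigma_2\theta}\) and takes one Gauss-Newton step. The model, design points, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy illustration of (13): gradient, Fisher Information Matrix, and one
# Gauss-Newton step for a hypothetical model pde(sigma, theta) = s1*exp(-s2*theta).

thetas = np.linspace(0.0, 2.0, 21)          # design points theta
sigma_true = np.array([1.5, 0.8])           # ground truth
sigma = np.array([1.0, 1.0])                # current iterate

def pde(sig, th):
    return sig[0] * np.exp(-sig[1] * th)

def dpde(sig, th):
    """Gradient of pde with respect to sigma (length-2 vector)."""
    e = np.exp(-sig[1] * th)
    return np.array([e, -sig[0] * th * e])

data = pde(sigma_true, thetas)              # noiseless data d(theta)

residual = pde(sigma, thetas) - data                      # pde - d
J = np.stack([dpde(sigma, th) for th in thetas])          # rows: d(pde)/d(sigma)
grad_L = J.T @ residual                                   # gradient of L
FIM = J.T @ J                                             # sum of outer products in (13)

# Near the ground truth the FIM dominates the Hessian, so a Gauss-Newton
# step uses the FIM in place of the full Hessian.
sigma_new = sigma - np.linalg.solve(FIM, grad_L)
print(sigma_new)    # the iterate should move toward sigma_true
```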
Bayes’ formula provides a widely used alternative to PDE-constrained optimization, particularly when we need to quantify uncertainty of reconstruction. Instead of seeking a single best-fit parameter, the Bayesian perspective treats a range of parameters as plausible, each weighted by its posterior probability.
In this framework, prior knowledge is blended with observed data. For PDE-based inverse problems, we model the data as \[\mathrm{d}(\theta)=\mathrm{pde}(\pmb{\sigma},\theta)+\eta(\theta)\,,\] where \(\eta\) represents measurement error. The posterior distribution of \(\pmb{\sigma}\) — which is also the conditional distribution of \(\pmb{\sigma}\) given the data \(\mathrm{d}\) — takes the form \[p^\mathrm{d}(\pmb{\sigma}) = p(\pmb{\sigma}|\mathrm{d})\propto p(\mathrm{d}|\pmb{\sigma})p(\pmb{\sigma}) = p_\eta\left(\mathrm{d}(:)-\mathrm{pde}(\pmb{\sigma},:)\right)p(\pmb{\sigma})\,,\] where \(p(\pmb{\sigma})\) is the prior (marginal) distribution and \(p_\eta\) is the noise distribution.
As in optimization, the posterior distribution has a few key quantities, one of which is the variance, which measures the uncertainty of the reconstruction. In general, the variance is difficult to compute for nonlinear PDE operators. However, in the linearized setting, the posterior distribution is well understood. The following assumptions are standard.
The PDE operator is linear, e.g. \(\mathrm{pde}(\pmb{\sigma},\theta)=\mathbf{A}_{\theta,:}\pmb{\sigma}\), or locally linearizable around \(\pmb{\sigma}^*\) as \(\mathrm{pde}(\pmb{\sigma},\theta)\approx \mathbf{A}_{\theta,:}\left(\pmb{\sigma}-\pmb{\sigma}^*\right)+\mathrm{pde}(\pmb{\sigma}^*,\theta)\), with data adjusted accordingly.
Noise is Gaussian: \(\eta\sim\mathcal{N}(0,\Gamma)\);
The prior is Gaussian: \(\pmb{\sigma}\sim\mathcal{N}(\pmb{\sigma}^\dagger,\Gamma_0)\).
Under these assumptions, the posterior remains Gaussian. When the data \(\mathrm{d}(\theta)\) is assigned weights \(w_\theta\), the posterior distribution is \[\label{eq:Bayesian32inversion} p^\mathrm{d}(\pmb{\sigma})\sim\mathcal{N}(\hat{\pmb{\sigma}}_W, \hat{\Gamma}_W)\, \quad \text{with}\quad \hat{\Gamma}_W =(\mathbf{A}^\top W^{1/2}\Gamma^{-1}W^{1/2}\mathbf{A}+\Gamma_0^{-1})^{-1}\,,\tag{14}\] and the mean is \(\hat{\pmb{\sigma}}_W = \hat{\Gamma}_W\left(\mathbf{A}^\top W^{1/2} \Gamma^{-1} W^{1/2} \mathrm{d} + \Gamma_0^{-1} \pmb{\sigma}^\dagger\right)\). Here \(W=\text{diag}\{w_{\theta}\}\). Smaller posterior variance \(\hat{\Gamma}_W\) corresponds to greater certainty and hence more accurate reconstruction. Thus, experimental design is done in a way that minimizes some scalar measure of \(\hat{\Gamma}_W\).
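The posterior mean and covariance in 14 can be evaluated directly; in the sketch below the linearized map \(\mathbf{A}\), the covariances, the weights, and the data are synthetic placeholders chosen only for illustration.

```python
import numpy as np

# Direct evaluation of the weighted Gaussian posterior (14) on synthetic inputs.

rng = np.random.default_rng(0)
C, n = 50, 8                               # candidate data points / parameter dimension
A = rng.standard_normal((C, n))            # stand-in for the linearized PDE map
Gamma_inv = np.eye(C) / 0.01               # inverse noise covariance (noise std 0.1)
Gamma0_inv = np.eye(n)                     # inverse prior covariance
sigma_dagger = np.zeros(n)                 # prior mean

sigma_true = rng.standard_normal(n)
d = A @ sigma_true + 0.1 * rng.standard_normal(C)

w = rng.uniform(0.0, 1.0, C)               # design weights w_theta
W_half = np.diag(np.sqrt(w))

# Posterior covariance and mean, following (14).
Gamma_hat = np.linalg.inv(A.T @ W_half @ Gamma_inv @ W_half @ A + Gamma0_inv)
sigma_hat = Gamma_hat @ (A.T @ W_half @ Gamma_inv @ W_half @ d
                         + Gamma0_inv @ sigma_dagger)

print(np.linalg.norm(sigma_hat - sigma_true))   # small when the data are informative
```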
Because \(\hat{\Gamma}_W\) is a matrix, different scalarizations yield different design criteria. The following are most prominent.
E-optimal design: \(\Phi_E = \lambda_{\max}(\hat{\Gamma}_W)\).
D-optimal design: \(\Phi_D = \det(\hat{\Gamma}_W)\).
A-optimal design: \(\Phi_A = \trace(\hat{\Gamma}_W)\).
The experimental design problem is to select weights \(\{w_\theta\}\) that minimize one of these criteria: \[\label{eqn:optimal95design} \min_{\{w_\theta\}} \;\Phi_{E,D,A}\,,\tag{15}\] assigning higher weights to data most effective at reducing uncertainty. In the semi-infinite case, where \(\pmb{\sigma}\in \Sigma_n\) and \(\hat{\Gamma}_W \in \mathbb{R}^{n\times n}\), the sequence \(\{w_\theta\}\) may still be infinite in size. Hence, the optimization is conducted over the entire design function space \(\Theta\).
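A small numerical comparison of the three criteria for two candidate weight vectors is sketched below; the helper gamma_hat reproduces the posterior covariance in 14 , and all matrices and weight choices are synthetic assumptions.

```python
import numpy as np

# Evaluate the E-, D-, and A-optimality criteria for two candidate designs.

rng = np.random.default_rng(1)
C, n = 40, 6
A = rng.standard_normal((C, n))            # stand-in for the linearized map
Gamma_inv = np.eye(C) / 0.01               # inverse noise covariance
Gamma0_inv = np.eye(n)                     # inverse prior covariance

def gamma_hat(w):
    """Posterior covariance (14) for design weights w."""
    Wh = np.diag(np.sqrt(w))
    return np.linalg.inv(A.T @ Wh @ Gamma_inv @ Wh @ A + Gamma0_inv)

def criteria(w):
    eigs = np.linalg.eigvalsh(gamma_hat(w))
    return {"E": eigs[-1], "D": np.prod(eigs), "A": eigs.sum()}

w_uniform = np.full(C, 0.25)                               # spread weight over all data
w_concentrated = np.zeros(C); w_concentrated[:10] = 1.0    # same total weight on 10 rows

print(criteria(w_uniform))
print(criteria(w_concentrated))
```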
The Bayesian formulation, together with the associated design principle 15 based on uncertainty reduction, is one of the leading approaches to data selection. As in the optimization framework, the matrix \(\mathbf{A}\), which captures the linear component of \(\mathrm{pde}(\pmb{\sigma},\theta)\), plays a central role in quantifying the information content of the data indexed by \(\theta\).
Linearization plays a central role in PDE-based inverse problems. As seen above, derivatives such as \(\tfrac{d}{d\pmb{\sigma}}\mathrm{pde}\) arise repeatedly in both optimization and Bayesian formulations. Theoretically, linearization is also tied to the neighborhood \(B_R(\pmb{\sigma}_\ast)\) appearing in the well-posedness estimates 5 , 9 , and 8 . In many cases, well-posedness can be guaranteed only within a neighborhood of the ground truth \(\pmb{\sigma}_\ast\). This locality is especially relevant in the finite-dimensional setting: Near two different underlying truths \(\pmb{\sigma}_{*,1}\) and \(\pmb{\sigma}_{*,2}\), the same measurement \(\mathrm{d}(\theta_1,\theta_2)\) may exhibit very different sensitivities to parameter variations. Consequently, the type and amount of data needed for stable reconstruction depend on the specific ground truth \(\pmb{\sigma}_\ast\).
To address this issue, one must examine how the data \(\mathrm{d}(\theta_1,\theta_2)\) varies with perturbations of the parameter \(\pmb{\sigma}\) near a given \(\pmb{\sigma}_\ast\). Linearization and functional gradient computations provide precisely this information [96]. For PDE-based inverse problems, these tools can be formulated systematically using the calculus of variations. Following the notation of 1 , the linearized problem typically reduces to a Fredholm integral equation of the first kind: \[\label{eqn:linearization95general} \left\langle\frac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta),\delta\pmb{\sigma}\right\rangle=\left\langle\frac{d}{d\pmb{\sigma}}M_{\theta_2}[u(\pmb{\sigma};\theta_1)],\delta\pmb{\sigma}\right\rangle = \delta \mathrm{d}_{\theta}\,,\tag{16}\] where \[\delta\pmb{\sigma}=\pmb{\sigma}-\pmb{\sigma}_\ast \quad\text{and}\quad \delta \mathrm{d} = \mathrm{d}(\theta)-M_{\theta_2}[u_{\theta_1}(\pmb{\sigma}_\ast)]=\mathrm{d}(\theta)-\mathrm{pde}(\pmb{\sigma}_*,\theta)\,.\] The inverse problem in this linearized form is to use pairs of \(\big(\tfrac{d}{d\pmb{\sigma}}M[u(\pmb{\sigma};\theta)]\,, \delta \mathrm{d}\big)\), collected over many \(\theta\), to reconstruct \(\delta\pmb{\sigma}\).
While the computation of \(\delta \mathrm{d}\) is straightforward, the central challenge lies in evaluating the functional gradient \(\tfrac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta) = \tfrac{d}{d\pmb{\sigma}}M_{\theta_2}[u(\pmb{\sigma};\theta_1)]\), subject to the PDE constraint \(\mathcal{L}_{\pmb{\sigma}}[u]=S_{\theta_1}\). Different PDEs yield different functional gradients, but a common structure emerges: \[\frac{d}{d\pmb{\sigma}}M_{\theta_2}[u(\pmb{\sigma};\theta_1)]=f_{\theta_1}\cdot g_{\theta_2}\,,\] where \(f_{\theta_1}\) is associated with the forward PDE solution \(u\) (driven by source \(\theta_1\)), and \(g_{\theta_2}\) is associated with the adjoint PDE solution that is driven by detector \(\theta_2\). By substituting this form into 16 , renaming \(\mathbf{b}:= \delta \mathrm d\), and writing \(\pmb{\sigma}\) in place of \(\delta \pmb{\sigma}\), we obtain the following compact expression: \[\label{eq:summary95linearized} \left\langle f_{\theta_1}g_{\theta_2},\, \pmb{\sigma}\right\rangle = \mathbf{b}(\theta_1,\theta_2)\,.\tag{17}\] This tensorized structure in the Fredholm integral is a distinctive feature of PDE-based inverse problems, and it will serve as the foundation for the derivations that follow. While linearization itself is often treated as a black-box solver and may be of secondary importance for the purposes of this review, the tensorized structure plays a central role in data selection. In particular, effective designs must be compatible with this structure, a requirement that distinguishes PDE-based inverse problems from other experimental design settings.
We now describe the main techniques for computing functional gradients. (Readers already familiar with adjoint-based methods may skip this discussion.) For simplicity, we fix \(\theta=(\theta_1,\theta_2)\) throughout this subsection and omit the subscript.
Linearized PDE operator via calculus of variations. The general recipe is straightforward. Given the PDE constraint \(\mathcal{L}_{\pmb{\sigma}}[u]=S\), we regard \(u=u(\pmb{\sigma})\) as a function of the parameter \(\pmb{\sigma}\). Differentiation of the observation functional then yields \[\label{eqn:derivative95sigma} \frac{d}{d\pmb{\sigma}} M[u(\pmb{\sigma})] = \left\langle \frac{\delta M}{\delta u} \,, \frac{du}{d\pmb{\sigma}} \right\rangle.\tag{18}\] Typically, the variational derivative \(\frac{\delta M}{\delta u}\) is explicit; the main difficulty lies in evaluating \(\frac{du}{d\pmb{\sigma}}\).
To compute this term, we differentiate the PDE constraint. Since \(\mathcal{L}_{\pmb{\sigma}}[u(\pmb{\sigma})]=S\) holds identically, we have \[\left.\frac{d\mathcal{L}_{\pmb{\sigma}}}{d\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})} = \left.\frac{\partial \mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})} + \left.D_u\mathcal{L}_{\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})} \cdot \frac{du}{d\pmb{\sigma}} = 0\,,\] or equivalently, \[\left.D_u\mathcal{L}_{\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})} \cdot \frac{du}{d\pmb{\sigma}} = -\left.\frac{\partial \mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})}.\] Here both \(D_u\mathcal{L}_{\pmb{\sigma}}\) and \(\tfrac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\) are explicit from the PDE. Direct inversion of \(D_u\mathcal{L}_{\pmb{\sigma}}\), however, is often prohibitively expensive.
A standard alternative is the adjoint method. Returning to 18 , we introduce an adjoint variable \(\lambda\) defined by \[\label{eqn:adjoint} \left(D_u\mathcal{L}_{\pmb{\sigma}}\right)^\ast \lambda = \frac{\delta M}{\delta u}\,.\tag{19}\] Multiplying the differentiated PDE constraint by \(\lambda\) and integrating by parts yields: \[\label{eqn:functional95derivative95final} \begin{align} \frac{d}{d\pmb{\sigma}} M[u(\pmb{\sigma})] &= \left\langle \frac{\delta M}{\delta u}, \frac{du}{d\pmb{\sigma}} \right\rangle = \left\langle \left(D_u\mathcal{L}_{\pmb{\sigma}}\right)^\ast \lambda, \frac{du}{d\pmb{\sigma}} \right\rangle \\ &= \left\langle \lambda, D_u\mathcal{L}_{\pmb{\sigma}} \cdot \frac{du}{d\pmb{\sigma}} \right\rangle = -\left\langle \lambda, \left.\frac{\partial \mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\right|_{\pmb{\sigma}, u(\pmb{\sigma})} \right\rangle. \end{align}\tag{20}\] This adjoint formulation is highly efficient: only one forward PDE solve and one adjoint PDE solve 19 are required, independent of the dimension of \(\pmb{\sigma}\).
Since \(M[u(\pmb{\sigma})]\) maps a function \(\pmb{\sigma}\) to a scalar, its derivative with respect to \(\pmb{\sigma}\) is itself a function. Consequently, the terms \(\frac{du}{d\pmb{\sigma}}\) and \(\frac{\partial \mathcal{L}_{\pmb{\sigma}}}{\partial \pmb{\sigma}}\) in 20 should be interpreted as operators, with the inner product taken in the operator–function sense. This viewpoint naturally motivates a projected formulation, which is often more convenient in practice. We have \[\label{eqn:functional95proj} \Big\langle \tfrac{d}{d\pmb{\sigma}} M[u(\pmb{\sigma})], \delta\pmb{\sigma}\Big\rangle = -\left\langle \lambda, \tfrac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}[\delta\pmb{\sigma}] \right\rangle,\tag{21}\] where \(\tfrac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}[\delta\pmb{\sigma}]\) denotes the action of \(\tfrac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\) on \(\delta\pmb{\sigma}\). This projected formulation avoids explicit construction of the full functional derivative and directly yields directional gradients, making it highly practical.
Example 1 (Linearized EIT). Consider the elliptic PDE \(\mathcal{L}_{\pmb{\sigma}}[u] = \nabla \cdot \big(\pmb{\sigma}(x) \nabla u\big)\) corresponding to 10 . The ingredients of the functional derivative are as follows.
Operator derivative \(\frac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\). By definition, we have \[\left.\frac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\right|_{\pmb{\sigma},u}[\delta\pmb{\sigma}] = \left.\frac{d}{d\epsilon} \mathcal{L}_{\pmb{\sigma}+\epsilon\delta\pmb{\sigma}}[u]\right|_{\epsilon=0},\] where the expression is evaluated at \((\pmb{\sigma}, u)\) and applied in the direction \(\delta\pmb{\sigma}\). Observing that \[\mathcal{L}_{\pmb{\sigma}+\epsilon\delta\pmb{\sigma}}[u] = \nabla\cdot\big((\pmb{\sigma}+\epsilon\delta\pmb{\sigma})\nabla u\big) = \nabla\cdot(\pmb{\sigma}\nabla u) + \epsilon\,\nabla\cdot(\delta\pmb{\sigma}\nabla u),\] we obtain \[\left.\frac{\partial\mathcal{L}_{\pmb{\sigma}}}{\partial\pmb{\sigma}}\right|_{\pmb{\sigma},u}[\delta\pmb{\sigma}] = \nabla\cdot(\delta\pmb{\sigma}\nabla u).\]
Fréchet derivative in \(u\), \(D_u\mathcal{L}_{\pmb{\sigma}}\). We know that \[D_u\mathcal{L}_{\pmb{\sigma}}[\delta u] = \nabla\cdot(\pmb{\sigma}\nabla \delta u).\] This operator is self-adjoint, so we have \(\left(D_u\mathcal{L}_{\pmb{\sigma}}\right)^\ast \lambda = \nabla\cdot(\pmb{\sigma}\nabla \lambda)\).
By combining these results with 21 , we obtain \[\left\langle\frac{d}{d\pmb{\sigma}}M[u(\pmb{\sigma})]\,,\delta\pmb{\sigma}\right\rangle = -\left\langle \lambda,\, \nabla\cdot(\delta\pmb{\sigma}\nabla u) \right\rangle= \left\langle \nabla\lambda \cdot \nabla u,\, \delta\pmb{\sigma}\right\rangle\,,\] where we used integration by parts. By reinstating the dependence on \(\theta=(\theta_1,\theta_2)\), we have \[\label{eq:linear32eit32cont} \tfrac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta) = \nabla \lambda_{\theta_2} \cdot \nabla u_{\theta_1}\,,\tag{22}\] where \(u_{\theta_1}\) and \(\lambda_{\theta_2}\) are solutions of the forward and adjoint problems, respectively. The linearized inverse problem becomes \[\label{eq:EIT32linearized} \Big\langle \nabla\lambda_{\theta_2} \cdot \nabla u(\pmb{\sigma}^\ast;\theta_1), \delta\pmb{\sigma}\Big\rangle = \mathrm{d}(\theta) - M[u(\pmb{\sigma}^\ast;\theta_1)]\,.\tag{23}\]
This example highlights the decoupling of \(u\) and \(\lambda\) associated with source \(\theta_1\) and detector \(\theta_2\), respectively, consistent with the tensorized structure discussed in 17 .
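To make the adjoint recipe 19 –21 and the EIT formula 22 concrete, the sketch below treats a one-dimensional analogue of 10 with a point-value measurement in place of boundary Neumann data; the discretization, grid size, detector location, and conductivity profile are all assumptions of this toy example. The discrete gradient it computes is the derivative of the measurement with respect to each interface value of \(\pmb{\sigma}\), which up to mesh scaling corresponds to \(\nabla\lambda\cdot\nabla u\) in 22 , and it is checked against a finite-difference perturbation.

```python
import numpy as np

# Toy 1D analogue of the linearized EIT example: one forward solve, one
# adjoint solve, then the gradient of M[u] = u(x_obs) with respect to sigma.

N = 199                                   # interior nodes
h = 1.0 / (N + 1)
x = np.linspace(0.0, 1.0, N + 2)          # nodes, boundaries included
s = 1.0 + 0.5 * np.exp(-50.0 * (x[:-1] + h / 2 - 0.4) ** 2)  # sigma at N+1 interfaces
phi = 1.0                                 # Dirichlet datum at x = 1 (source, theta_1)
i_obs = 150                               # detector node (theta_2)

def solve_system(s, rhs_boundary, point_source=None):
    """Solve the tridiagonal system for div(sigma grad v) = source on (0, 1)."""
    A = np.zeros((N, N))
    for i in range(1, N + 1):
        A[i - 1, i - 1] = -(s[i - 1] + s[i]) / h**2
        if i > 1:
            A[i - 1, i - 2] = s[i - 1] / h**2
        if i < N:
            A[i - 1, i] = s[i] / h**2
    b = np.zeros(N)
    b[-1] = -s[N] * rhs_boundary / h**2   # contribution of v(1) = rhs_boundary
    if point_source is not None:
        b[point_source - 1] += 1.0        # discrete delta: adjoint source dM/du
    v = np.zeros(N + 2)
    v[-1] = rhs_boundary
    v[1:-1] = np.linalg.solve(A, b)
    return v

u = solve_system(s, phi)                              # forward solution
lam = solve_system(s, 0.0, point_source=i_obs)        # adjoint solution

# Discrete counterpart of (22): derivative of M[u] w.r.t. each interface value.
grad_adjoint = np.diff(u) * np.diff(lam) / h**2

# Finite-difference check on one interface.
k, eps = N // 2, 1e-5
s_pert = s.copy(); s_pert[k] += eps
fd = (solve_system(s_pert, phi)[i_obs] - u[i_obs]) / eps
print(grad_adjoint[k], fd)                # the two values should agree closely
```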
Randomized numerical linear algebra (RNLA) incorporates randomization into the design of numerical linear algebra algorithms. In contrast to classical deterministic methods, randomized algorithms deliberately trade accuracy for efficiency. They are particularly effective for large-scale problems or for settings that require repeated linear operations, where computational cost rather than a need for high precision is the primary bottleneck. Early applications of RNLA arose in such data-science domains as imaging, signal processing, and internet-scale problems, where speed is of paramount importance [8], [9], [97]. More recently, applications in the physical sciences have emerged, where accuracy demands must be balanced against computational feasibility, making RNLA both powerful and practical.
PDE-based experimental design aligns naturally with this perspective. Inverse problems governed by PDEs ultimately prioritize reconstruction accuracy. Yet before reconstruction can take place, one must often examine a large pool of potential data to identify informative subsets. This selection step is typically repeated many times and, crucially, it does not require exact precision. RNLA algorithms are well-suited to this step, as they accelerate the repeated linear algebra computations underlying data selection while maintaining enough accuracy to guide the process.
A classical entry point for RNLA in experimental design is through leverage scores. Leverage scores are statistical quantities that measure the importance of each constraint in a linear regression problem, and they have long been used to guide data selection. However, they are expensive to compute exactly, which is why randomization becomes valuable. To illustrate, consider the linear regression problem: \[\min_{\pmb{\sigma}}\|\mathbf{A}\pmb{\sigma}-\mathbf{b}\|_2\,.\] The optimal solution and its projection are explicitly given by \[\pmb{\sigma}^*=(\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{b}\,,\quad\text{and}\quad\mathbf{b}^*=\mathbf{A}(\mathbf{A}^\top \mathbf{A})^{-1}\mathbf{A}^\top \mathbf{b}\,.\] Here \(\mathbf{A}\in\mathbb{R}^{C\times n}\) is the design matrix, with each row encoding one constraint, and \(\mathbf{b}\) collects the data. The influence of the data \(\mathbf{b}\) on the fitted output \(\mathbf{b}^*\) can be quantified through the Jacobian: \[\nabla_{\mathbf{b}}\mathbf{b}^*=\mathbf{A}(\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top\doteq \mathbf{H}\,,\] known as the hat matrix (as it is often, though not here, denoted by \(\hat{\mathsf{H}}\)). Its diagonal entries, \(l_i=\mathbf{H}_{ii}\) define the leverage scores, which measure the relative importance of the \(i\)-th observation. High-leverage points correspond to experiments with strong influence on the regression fit. Leverage scores are widely used, for example, in outlier detection in economics and medical imaging [98], [99], kernel methods [100], [101] and graph learning [102].
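As a concrete (non-randomized) baseline, the sketch below computes exact leverage scores through a thin QR factorization, using the identity \(\mathbf{H}=\mathbf{Q}\mathbf{Q}^\top\) for \(\mathbf{A}=\mathbf{Q}\mathbf{R}\); the matrix and the sampling step are synthetic illustrations of leverage-score-based row selection, not a particular method from the cited works.

```python
import numpy as np

# Exact leverage scores from a thin QR factorization (avoids forming H).

rng = np.random.default_rng(2)
C, n = 500, 10
A = rng.standard_normal((C, n)) * np.linspace(1.0, 5.0, C)[:, None]  # heterogeneous rows

# With A = QR (thin QR), H = A (A^T A)^{-1} A^T = Q Q^T, so the i-th leverage
# score is the squared norm of the i-th row of Q.
Q, _ = np.linalg.qr(A)
leverage = np.sum(Q**2, axis=1)

# Sampling rows with probability proportional to leverage favors the
# most influential experiments.
probs = leverage / leverage.sum()
sampled_rows = rng.choice(C, size=50, replace=True, p=probs)
print(sampled_rows[:10])
```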
Computing \(\mathbf{H}\) presents challenges in computation and data access. Forming this matrix directly requires \(O(Cn^2)\) flops, which is costly for large problems. Even more problematic is the sample complexity: Evaluating leverage scores requires explicit knowledge of the entire matrix \(\mathbf{A}\), meaning all experiments must be conducted and formulated before any selection is possible. Randomized methods offer a way to address both challenges: They reduce the computational cost to nearly linear in \(C\) and \(n\) and allow approximate leverage scores to be estimated from only partial or sampled information about \(\mathbf{A}\) [8], [10], [103].
While these techniques were not originally developed for PDE-based inverse problems, they represent one of the earliest demonstrations of how RNLA can be useful for experimental design. Over the past decade, this perspective has broadened: Methods such as matrix sketching, randomized matrix–matrix multiplication, column selection, randomized singular value decomposition (SVD), compressed sensing, and matrix completion have all been adapted to PDE settings. Collectively, they provide ways to accelerate and guide data selection. In what follows, we first review these methods in their general linear algebraic form, and then describe how they can be specialized for PDE-based experimental design.
Matrix sketching is an important problem class in RNLA. Conceptually, it can be understood in analogy to regression: in classical least-squares regression, one may “sketch” by reducing the row dimension of the system in a principled way, hoping to obtain a solution that is nearly as good as the full system.
Formally, consider the overdetermined linear system \[\mathbf{A}\, \pmb{\sigma}\approx \mathbf{b}\,, \quad \mathbf{A}\in \mathbb{R}^{C\times n}\,,C\gg n\,,\] whose least-squares solution is \[\label{eq:linear32ls} \pmb{\sigma}^* = \arg\min_{\pmb{\sigma}\in \mathbb{R}^n} \| \mathbf{A}\pmb{\sigma}- \mathbf{b}\|_2^2 \,.\tag{24}\] Since the system is highly overdetermined, a reasonable approximation can often be obtained by using only a reduced set of rows from \(\mathbf{A}\). To achieve this, one introduces a sketching matrix \(\mathbf{S}\in \mathbb{R}^{c \times C}\) with \(c \ll C\) and solves the reduced problem: \[\label{eq:sketched32linear} \pmb{\sigma}^*_{\mathbf{S}} \;= \;\arg\min_{\pmb{\sigma}\in \mathbb{R}^n} \| \mathbf{S}\mathbf{A}\pmb{\sigma}- \mathbf{S}\mathbf{b}\|_2^2\,,\tag{25}\] with the goal that \(\pmb{\sigma}^*_{\mathbf{S}} \approx \pmb{\sigma}^*\). A typical guarantee is: \[\label{eqn:sketch95goal} \| \mathbf{A}{\pmb{\sigma}}^*_\mathbf{S}-\mathbf{b}\|_2^2 \leq (1+\epsilon)\| \mathbf{A}\pmb{\sigma}^* - \mathbf{b}\|_2^2, \quad \text{with~probability~}1-\delta.\tag{26}\] The mechanism is illustrated in Figure 2.
Figure 2: Illustration of the sketching mechanism for the overdetermined least-squares problem.
The advantages of sketching are twofold. From a computational perspective, solving 24 requires \(O(C n^2)\) flops, while 25 requires only \(O(\min(cn^2, c^2 n))\) with \(c \ll C\), yielding significant savings. From an experimental design perspective, the savings are even more pronounced in sample complexity: the full problem requires \(C\) measurements (the rows of \(\mathbf{A}\) and entries of \(\mathbf{b}\)), whereas the sketched problem reduces this to \(c\) measurements, substantially cutting the effort in data acquisition.
The main challenge is to construct a sketching matrix \(\mathbf{S}\) that ensures the guarantee 26 while maintaining the property \(c \ll C\). Without entering technical details, we highlight several widely used strategies (see [97], [104]):
SubGaussian matrices [105]. The sketching matrix is generated by i.i.d. zero-mean, isotropic subGaussian entries. Common subGaussian types include the Gaussian, Bernoulli, and uniform distributions.
Fast Johnson-Lindenstrauss transform (FJLT) [106], [107]. The sketching matrix is constructed as \(\pmb{\Pi}\pmb{\mathcal{F}},\) where \(\pmb{\mathcal{F}} \in \mathbb{R}^{C \times C}\) represents a discrete Fourier or Hadamard transform with randomly sign-flipped columns and \(\pmb{\Pi} \in \mathbb{R}^{c \times C}\) denotes a random row sampling operation.
Sparse sketching matrices [108], [109]. Particularly useful in streaming settings, these matrices have entries in \(\{0, +1, -1\}\), chosen according to a suitable distribution. Examples include CountSketch [109] and sparse random projection [108].
The three design principles are visualized in Figure 3.
While the precise dependence of \(c\) on \(\epsilon\) and \(\delta\) varies across these constructions, their theoretical justification is unified by the celebrated Johnson–Lindenstrauss lemma [110], which guarantees the existence of low-dimensional embeddings that preserve geometry with high probability.
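The sketching guarantee 26 can be checked empirically. The sketch below uses a Gaussian (subGaussian) sketching matrix on a synthetic regression problem; the sizes, noise level, and sketch dimension are arbitrary illustrative choices.

```python
import numpy as np

# Empirical check of the (1 + epsilon) guarantee (26) with a Gaussian sketch.

rng = np.random.default_rng(3)
C, n, c = 10000, 50, 500                     # full rows, parameters, sketch size
A = rng.standard_normal((C, n))
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(C)

sigma_full, *_ = np.linalg.lstsq(A, b, rcond=None)            # solves (24)

S = rng.standard_normal((c, C)) / np.sqrt(c)                  # subGaussian sketch
sigma_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # solves (25)

res_full = np.linalg.norm(A @ sigma_full - b)
res_sketch = np.linalg.norm(A @ sigma_sketch - b)
print(res_sketch / res_full)     # close to 1, i.e. (1 + epsilon) as in (26)
```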
Sketching aims to provide an almost equally good reconstruction to a regression problem using only a subsampled system. This perspective aligns naturally with experimental design. However, PDE-based inverse problems exhibit a special structure that prevents a direct use of classical sketching. Most notably, as shown in 17 , the linearized PDE-based inverse problem typically takes a tensorized form.
We rewrite the linearized system in algebraic form: \[\label{eq:eit32system} {\boldsymbol{A}} \cdot\pmb{\sigma}= {\boldsymbol{b}} \,,\tag{27}\] where \(\boldsymbol{b}\) encodes the right-hand side of 16 (with entries indexed by \(\theta\)), and \(\boldsymbol{A}\) is essentially the Jacobian matrix. Unlike in classical regression, \(\boldsymbol{A}\) is infinite in size: its rows are indexed by \(\theta=(\theta_1,\theta_2)\in\Theta\), the entire design space, while its columns are indexed by the discretization of \(\pmb{\sigma}\). Restricting to \(\pmb{\sigma}\in\Sigma_n\) with dimension \(n\), each row of \(\mathbf{A}_{\theta,:}\) lies in \(\mathbb{R}^n\).
The key distinction from standard regression is the tensorized structure of \(\mathbf{A}\). In the numerical setting of sketching, we restrict attention to a finite but possibly large data space \(\Theta_C\). By 16 , \(\mathbf{A}\) can be expressed columnwise as a Khatri–Rao product. Let \[\mathbf{F}\in\mathbb{R}^{C_1\times n}\,, \quad \mathbf{G}\in\mathbb{R}^{C_2\times n}\quad\text{with}\quad\mathbf{F}_{\theta_1,:} = f_{\theta_1}\,,\quad\mathbf{G}_{\theta_2,:} = g_{\theta_2}\,,\] where \(\theta_i\in\Theta_i\), \(|\Theta_i|=C_i\), and \(C=C_1C_2\). Then \[\mathbf{A}= \mathbf{F}\ast\mathbf{G}\in\mathbb{R}^{C_1C_2\times n}\quad\text{with}\quad \mathbf{A}_{\theta,:} = \mathbf{F}_{\theta_1,:}\mathbf{G}_{\theta_2,:}\,.\] This structure requires a tensor-aware sketching procedure, with the sketching matrix also having a row-wise tensorized form, so that \(\mathbf{F}\) and \(\mathbf{G}\) are sketched independently. Physically, this corresponds to the fact that the parameter specifying the PDE input (\(\theta_1\)) and the parameter specifying the detector (\(\theta_2\)) are independently configurable in a fully factorized design. Two representative constructions along these lines are the following; a small numerical sketch of the first appears after Figure 4.
SubGaussian tensorizing. Each row of the sketching matrix \(\mathbf{S}\) is written as \(\mathbf{S}_{i,:} = \mathbf{S}^{\mathbf{F}}_{i,:}\otimes \mathbf{S}^{\mathbf{G}}_{i,:}\), where \(\mathbf{S}^{\mathbf{F}}\) and \(\mathbf{S}^{\mathbf{G}}\) are filled with i.i.d. zero-mean isotropic subGaussian entries.
Kronecker FJLT. The transform \(\pmb{\Pi}\pmb{\mathcal{F}}\) is modified to the tensorized form \(\pmb{\Pi}\big(\pmb{\mathcal{F}}^{\mathbf{F}}\otimes\pmb{\mathcal{F}}^{\mathbf{G}}\big)\), so that \(\pmb{\mathcal{F}}^{\mathbf{F}}\) and \(\pmb{\mathcal{F}}^{\mathbf{G}}\) act separately on \(\mathbf{F}\) and \(\mathbf{G}\).
These mechanisms are illustrated in Figure 4.
Figure 4: Illustration of the tensorized sketching constructions (subGaussian tensorizing and Kronecker FJLT).
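To make the tensorized construction concrete, the following is a minimal numpy sketch of the subGaussian tensorizing mechanism; the sizes and the Gaussian factors \(\mathbf{F}\), \(\mathbf{G}\) are illustrative assumptions standing in for the PDE-derived quantities. It verifies the identity \(\mathbf{S}\,(\mathbf{F}\ast\mathbf{G}) = (\mathbf{S}^{\mathbf{F}}\mathbf{F})\odot(\mathbf{S}^{\mathbf{G}}\mathbf{G})\), which is what allows \(\mathbf{F}\) and \(\mathbf{G}\) to be sketched independently, without ever assembling \(\mathbf{A}\) or \(\mathbf{S}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): C1 sources, C2 detectors, n parameters, c sketch rows.
C1, C2, n, c = 40, 30, 8, 60

# F and G stack the source and detector factors row-wise, as in the text.
F = rng.standard_normal((C1, n))
G = rng.standard_normal((C2, n))

# Row-wise Khatri-Rao structure: A[(t1, t2), j] = F[t1, j] * G[t2, j].
A = np.vstack([F[t1] * G[t2] for t1 in range(C1) for t2 in range(C2)])   # (C1*C2, n)

# SubGaussian tensorizing: each sketch row is a Kronecker product of two Gaussian rows
# (the normalization needed for an isometric embedding is omitted for clarity).
SF = rng.standard_normal((c, C1))
SG = rng.standard_normal((c, C2))

# Naive application: form the full sketch S of shape (c, C1*C2) and multiply.
S = np.vstack([np.kron(SF[i], SG[i]) for i in range(c)])
SA_naive = S @ A

# Structure-exploiting application: S (F * G) = (SF F) entrywise-times (SG G),
# so F and G are sketched independently and A is never formed.
SA_fast = (SF @ F) * (SG @ G)

print(np.allclose(SA_naive, SA_fast))   # True: the two routes agree
```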
Tensor-structured sketching has been extensively developed in recent years [111]–[115], often relying on the mixed-product property \((\mathbf{B}\otimes \mathbf{D})(\mathbf{C}\otimes \mathbf{E}) = (\mathbf{B}\mathbf{C})\otimes (\mathbf{D}\mathbf{E})\). In PDE-based inverse problems, the Khatri–Rao structure induces the tensorized requirement, with notable works including [112], [114]. Likewise, [113] developed Kronecker Fourier-based random projections (Kronecker FJLT), extending the standard FJLT.
Finally, we highlight the issue of sampling complexity: How many randomly sampled measurements are sufficient to ensure that the sketched solution \(\pmb{\sigma}^*_\mathbf{S}\) approximates the full solution? A trade-off emerges: The tensor structure accelerates computations but introduces higher-order Gaussian chaos, complicating probabilistic analysis. For instance, Theorem 1.1 in [114] establishes a nearly optimal bound: \[\label{eqn:32tensor32sample} c \geq \max \left(\mathcal{O} \left(\frac{\text{rank}^2({\boldsymbol{A}}) + \log^2 (1/\delta)}{\epsilon} \right), \mathcal{O} \left(\frac{\text{rank}({\boldsymbol{A}}) + \log (1/\delta)}{\epsilon^2} \right)\right),\tag{28}\] which ensures that \(\pmb{\sigma}^*_\mathbf{S}\) satisfies the guarantee 26 with high probability. The tensor sample complexity estimate 28 is slightly weaker than the optimal result for standard unstructured sketching [116], given by \(c \geq \mathcal{O} (\frac{\text{rank}(\mathbf{A}) + \log (1/\delta)}{\epsilon^2})\), which corresponds to the second term in 28 .
Because data from PDE-based inverse problems inherit this source–detector structure, tensor sketching is specifically designed to process such structured data efficiently. Although great progress has been made in studying tensor sketching methods, several key open questions remain. A central question is to establish the optimal sample complexity bound, improving upon the sub-optimal result in 28 . Another crucial question concerns the practical scenario in which measurements are sparse or missing. In such cases, it may be beneficial to design the sketching operation in a correspondingly sparse format, allowing the use of even fewer measurements while still preserving the essential information of the inference problem.
Matrix multiplication is one of the most fundamental tasks in numerical linear algebra. Given two matrices \(\mathbf{A}\in \mathbb{R}^{m \times C}\) and \(\mathbf{B} \in \mathbb{R}^{C \times n}\), we want to compute their product \[\mathbf{P}=\mathbf{A}\mathbf{B}\in\mathbb{R}^{m\times n}\,,\] whose entries are \(\mathbf{P}_{ij} = \mathbf{A}_{i,:}\mathbf{B}_{:,j} = \sum_k\mathbf{A}_{ik}\mathbf{B}_{kj}\). Randomization enters through an alternative but equivalent representation of the product: \[\label{eqn:matrix95matrix95product} \mathbf{P}=\sum_{k=1}^C\mathbf{A}_{:,k}\mathbf{B}_{k,:} =\mathbb{E}_k\left(\alpha_k\mathbf{P}_{k}\right)\,,\tag{29}\] where \(\mathbf{P}_k = \mathbf{A}_{:,k}\mathbf{B}_{k,:}\) is a rank-1 matrix formed from the outer product of the \(k\)-th column of \(\mathbf{A}\) and the \(k\)-th row of \(\mathbf{B}\). The expectation is taken over a random index \(k\), sampled according to a distribution \(\{1/\alpha_k\}_k\) chosen by the user.
This rewriting expresses \(\mathbf{P}\) as the expectation of random rank-1 matrices. Consequently, it suggests a Monte Carlo (MC) approximation: \[\label{eqn:matrix95matrix95MC} \mathbf{P}= \mathbb{E}_k(\alpha_k \mathbf{P}_{k}) \;\approx \;\frac{1}{c}\sum_{i=1}^c \alpha_{k_i}\mathbf{P}_{k_i},\tag{30}\] where \(\{k_i\}_{i=1}^c\) are i.i.d. samples drawn from the distribution \(\{1/\alpha_j\}_j\). Equivalently, the approximation can be cast in linear algebra form: \[\label{eqn:matrix95matrix95MC95S} \mathbf{P}= \mathbf{A}\mathbf{B}\;\approx \;\mathbf{A}\mathbf{S}^\top \mathbf{S}\mathbf{B},\tag{31}\] where \(\mathbf{S}\in \mathbb{R}^{c \times C}\) is a sampling matrix with rows \[\label{eqn:matrix95matrix95MC95samplingMatrix} \mathbf{S}_{i,:} = \sqrt{\alpha_{k_i}/c}\; {\boldsymbol{e}}_{k_i}^\top, \qquad k_i \sim \{1/\alpha_j\}_j\,.\tag{32}\] See the sampling illustration in Figure 5 below.
Figure 5: Illustration of randomized sampling for the matrix–matrix product.
The quality and efficiency of this randomized solver depend on the choice of probabilities \(\{1/\alpha_k\}_k\) and the number of samples \(c\). Drineas et al. [117] showed that the optimal configuration is achieved when \[\label{eqn:best95conf95mm95product} 1/\alpha_k\propto \|\mathbf{A}_{:,k}\|\,\|\mathbf{B}_{k,:}\| \,,\qquad\text{and}\qquad c=O\left( \frac{1+\log{\delta^{-1}}}{\epsilon^2}\right)\tag{33}\] which guarantees that the Monte Carlo approximation 30 satisfies \[\label{eqn:matrix95matrix95MC95accuracy} \|\mathbf{P}- \mathbf{A}\mathbf{S}^\top \mathbf{S}\mathbf{B}\|_F < \epsilon \qquad \text{with probability at least } 1-\delta.\tag{34}\] This randomized viewpoint is mathematically elegant: the matrix product is reconstructed as the average of randomly sampled rank-1 contributions. In practice, the method has computational benefits only when the required number of samples \(c\) is much smaller than \(C\).
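As a concrete illustration of 29 – 34 , the short numpy sketch below approximates a matrix product by sampling rank-1 terms with probability proportional to \(\|\mathbf{A}_{:,k}\|\,\|\mathbf{B}_{k,:}\|\) and reweighting them; the matrix sizes and Gaussian entries are illustrative assumptions, not tied to any particular PDE model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumptions): an (m x C)(C x n) product estimated from c << C samples.
m, C, n, c = 50, 5000, 40, 400
A = rng.standard_normal((m, C))
B = rng.standard_normal((C, n))

# Sampling probabilities proportional to ||A[:, k]|| * ||B[k, :]||.
p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p /= p.sum()

# Draw c indices i.i.d. from p and form the reweighted Monte Carlo estimate (30).
idx = rng.choice(C, size=c, p=p)
weights = 1.0 / (c * p[idx])                 # alpha_{k_i} / c, with sampling probability 1/alpha_k
P_mc = (A[:, idx] * weights) @ B[idx, :]     # sum_i alpha_{k_i} P_{k_i} / c

# Relative Frobenius error of the randomized product.
P = A @ B
print(np.linalg.norm(P - P_mc) / np.linalg.norm(P))
```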
PDE-based experimental design provides a natural example of an effective application of the randomized approach to computing matrix–matrix products, through the computation of the Fisher Information Matrix (FIM) 13 . Recall from 13 that, in regions where \(\pmb{\sigma}\) is close to the ground truth, the Hessian matrix of the loss functional is well approximated by the FIM. A distinctive feature of this matrix is that it decomposes as a sum of many rank-1 contributions: \[\label{eq:Hessian} \mathrm{Hess}_{\pmb{\sigma}} L\approx {\boldsymbol{H}}= \mathbf{A}^\top \mathbf{A}= \sum_{\theta} \mathbf{A}_{\theta,:}^\top\mathbf{A}_{\theta,:}\,.\tag{35}\] Here, each row \(\mathbf{A}_{\theta,:}\) corresponds to the Fréchet derivative \(\frac{d}{d\pmb{\sigma}}\mathrm{pde}(\pmb{\sigma},\theta)\) associated with a particular design \(\theta\). In the semi-infinite setting with \(\pmb{\sigma}\in \Sigma_n\) and \(\theta\in\Theta\), this makes \(\mathbf{A}\) a matrix with \(n\) columns and infinitely many rows, a scenario that fits seamlessly into the randomized matrix–matrix product framework of 30 .
Following the sampling formulation 31 – 32 , one selects a subset of designs \(\Theta_c \subset \Theta\) to construct a down-sampled Hessian: \[\mathbf{H}_S = \mathbf{A}^\top \mathbf{S}^\top \mathbf{S}\mathbf{A}\,,\] where \(\mathbf{S}\) encodes the randomized sampling. According to 33 , the optimal strategy is to sample each design with probability proportional to \(\|\mathbf{A}_{\theta,:}\|^2\), i.e., its contribution to the FIM.
A corollary of the analysis in [118], as applied in [119], guarantees that \[\lambda(\mathbf{H}_S)>\lambda(\mathbf{H}) -\epsilon \quad \text{with probability at least } 1-\delta\,,\] where \(\lambda\) denotes the smallest eigenvalue. Provided \(\epsilon<\lambda(\mathbf{H})\), this bound ensures the positivity of the sketched Hessian and thereby the well-posedness of the subsampled optimization problem.
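The following sketch illustrates this subsampling on a synthetic Jacobian whose rows stand in for the Fréchet derivatives (the sizes and the random stand-in are assumptions made purely for illustration): rows are drawn with probability proportional to \(\|\mathbf{A}_{\theta,:}\|^2\), reweighted, and the smallest eigenvalues of \(\mathbf{H}\) and \(\mathbf{H}_S\) are compared.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic Jacobian: rows play the role of d/dsigma pde(sigma, theta) over a large design set.
C, n, c = 20000, 12, 300                      # illustrative sizes (assumptions)
A = rng.standard_normal((C, n)) * rng.uniform(0.1, 2.0, size=(C, 1))

H = A.T @ A                                   # full Fisher information matrix, as in (35)

# Sample designs with probability proportional to their contribution ||A[theta, :]||^2.
p = np.sum(A**2, axis=1)
p /= p.sum()
idx = rng.choice(C, size=c, p=p)
rows = A[idx] / np.sqrt(c * p[idx, None])     # rows of S A, with the 1/sqrt(c p) reweighting
H_S = rows.T @ rows                           # subsampled Hessian H_S = A^T S^T S A

# Positivity of the smallest eigenvalue of H_S indicates a well-posed subsampled problem.
print(np.linalg.eigvalsh(H)[0], np.linalg.eigvalsh(H_S)[0])
```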
This randomized FIM approximation has been applied, for instance, to the linearized stationary Schrödinger potential reconstruction problem ([itm: E3]). The results demonstrate that the optimal sensor locations are highly sensitive to the underlying ground truth \(\pmb{\sigma}_\ast\), as illustrated in Figure 6.
This sensitivity is not a desirable feature for reconstruction. It can potentially be overcome by a natural extension of this perspective to a sequential selection of experimental designs that incorporates feedback from previous experiments, as described in Section 5.1; this is left for future work. Moreover, a combination with the matrix completion approach of Section 4.6 promises enhanced reconstruction efficiency and may help bridge the gap between the semi-infinite and finite data regimes.
Figure 6: Loss landscapes \(L(\pmb{\sigma})\) for two different ground truth parameters \(\pmb{\sigma}_\ast(x)\) (left column) under full data (middle left), and sampled data (right column) according to the optimal sampling distribution (middle right).
Identifying a well-conditioned subset of columns from a large, wide matrix is another fundamental question in numerical linear algebra: Given a “dictionary” matrix \(\mathbf{A}\in \mathbb{R}^{c \times N}\) with \(c \leq N\), how can one select columns from \(\mathbf{A}\) to form a submatrix \(\mathbf{A}_\mathbf{S}\) so that \(\mathbf{A}_\mathbf{S}\) is well-conditioned? This problem is central to many applications in data science, including sparse signal and image reconstruction.
Mathematically, the selection process can be described using a sampling operator \(\mathbf{S}\), and the conditioning is evaluated via the Gram matrix \[\label{eqn:gram95column95selection} {\boldsymbol{H}}_\mathbf{S}\;=\; \mathbf{S}^\top \mathbf{A}^\top \mathbf{A}\mathbf{S}\;=\; \mathbf{A}_\mathbf{S}^\top \mathbf{A}_\mathbf{S}\;\in \mathbb{R}^{n \times n}\,, \quad \text{where } \mathbf{A}_\mathbf{S}:= \mathbf{A}\mathbf{S}.\tag{36}\] The goal is to design \(\mathbf{S}\) so that \(\text{cond}({\boldsymbol{H}}_\mathbf{S}) = \frac{\lambda_{\max}({\boldsymbol{H}}_\mathbf{S})}{\lambda_{\min}({\boldsymbol{H}}_\mathbf{S})}\) is small, where \(\lambda_{\max,\min}\) denote the extreme eigenvalues. See the illustration in Figure 7.
Figure 7: Illustration of column subset selection and the conditioning of the sampled Gram matrix.
Randomization provides powerful tools for this problem. A key result of [120] gives probabilistic guarantees on conditioning when columns are sampled uniformly. Assuming the columns of \(\mathbf{A}\) are normalized, the number of columns \(n\) that can be selected while retaining good conditioning satisfies \[\label{eqn:32column32sample1} n \;\leq\; \min\!\left( o\!\left(\frac{1}{\mu^2 \log c}\right), \;\; o\!\left(\frac{N}{\| {\boldsymbol{H}}\|_2} \right) \right),\tag{37}\] with high probability, where \[\label{eqn:32condition32concentration1} \mathbb{P}\!\left( \text{cond} ({\boldsymbol{H}}_\mathbf{S}) \;\leq\; \frac{1 + \tau(n)}{1 - \tau(n)} \right) \;\geq\; 1 - \frac{1}{c}.\tag{38}\] Here, \(\mu = \max_{i \neq j} \vert \langle \mathbf{A}_{:, i}, \mathbf{A}_{:, j} \rangle \vert\) is the coherence between columns, and \(\tau(n)\) is a distortion parameter scaling as \(\tau(n) = \mathcal{O}\!\left( \sqrt{\mu^2 n \log c} \;+\; n\frac{\|{\boldsymbol{H}} \|_2}{N} \right)\).
The coherence \(\mu\) plays a decisive role: if \(\mu\) is large, the columns overlap strongly, and only a relatively small number of them can be selected while preserving conditioning. Conversely, a smaller \(\mu\) enlarges the allowable range of \(n\), consistent with the intuition that near-independence between columns enables more extensive sampling without loss of stability. The second term \(o\!\left(\tfrac{N}{\|{\boldsymbol{H}}\|_2}\right)\) can be misleading. Since \(N = \mathrm{Tr}({\boldsymbol{H}}) = \sum_i \lambda_i({\boldsymbol{H}}) \;\leq\; c \,\|{\boldsymbol{H}}\|_2\), the ratio \(\tfrac{N}{\|{\boldsymbol{H}}\|_2}\) is controlled by the matrix rank \(c\) and may not provide an informative bound in practice. In summary, the sampling dimension \(n\) in 37 scales linearly with the matrix rank \(c\) and is influenced by the factor \(1/\mu^2\).
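A minimal numerical illustration of this trade-off, under the assumption of a random dictionary with normalized columns (all sizes below are illustrative), computes the coherence \(\mu\) and the condition number of uniformly sampled column subsets of increasing size \(n\).

```python
import numpy as np

rng = np.random.default_rng(3)

# Random dictionary with unit-norm columns (illustrative sizes are assumptions).
c_rows, N = 64, 2000
A = rng.standard_normal((c_rows, N))
A /= np.linalg.norm(A, axis=0)

# Coherence: largest off-diagonal inner product between columns.
Gram = A.T @ A
mu = np.max(np.abs(Gram - np.eye(N)))
print("coherence mu =", mu)

# Condition number of the Gram matrix of a uniformly sampled column subset.
def sampled_condition(n):
    cols = rng.choice(N, size=n, replace=False)
    ev = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])
    return ev[-1] / ev[0]

for n in (8, 32, 64):                         # conditioning degrades as n grows
    print(n, sampled_condition(n))
```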
As discussed in 13 and Section 4.2, the Hessian of the objective near the ground truth is dominated by the FIM. Since the FIM has an outer-product structure closely resembling 36 , it is natural to expect that column subset selection strategies can provide useful guidance for data selection in PDE-based inverse problems.
From 35 , the Hessian is expressed as an outer product of \(\mathbf{A}\), with rows \(\require{physics} \mathbf{A}_{\theta,:}=\frac{\dd}{\dd \pmb{\sigma}}\mathrm{pde}(\theta,\pmb{\sigma}) \in \mathbb{R}^N\). Applying the column selection framework in 36 corresponds to fixing a set of \(\theta\) samples and restricting \(\pmb{\sigma}\). This is the opposite setting from Section 4.2.1: Here, the experiments are already collected, and the task is to identify subsets of parameters \(\pmb{\sigma}\) that can be reconstructed in a well-conditioned manner. This strategy corresponds to random sampling in parameter space.
Direct application of prior results 37 –38 is obstructed by the fact that the columns of \(\mathbf{A}\) are not normalized in our application. Nevertheless, by adapting proof techniques from [105], [120], the work [121] generalizes the guarantees to PDE settings and establishes that \[\label{eqn:32column32sample} n \leq \min \left( o\left(\frac{\text{Tr} ({\boldsymbol{H}})}{\| {\boldsymbol{H}}\|_2} \right), o\left(\frac{1}{\mu^2\, \log c} \frac{\text{Tr}({\boldsymbol{H}})^2}{\| {\boldsymbol{H}}\|_F^2} \right) \right),\tag{39}\] with the probabilistic conditioning bound \[\label{eqn:32condition32concentration} \mathbb{P} \left(\text{cond} ({\boldsymbol{H}_S}) \leq \frac{L + \tau(n)}{\ell - \tau(n) }\right) \geq 1 - \frac{1}{c}\,.\tag{40}\] Here the distortion quantity is given by \(\tau(n) = e^{\frac{1}{4}} \left(2n \frac{\|{\boldsymbol{H}} \|_2}{\text{Tr} ({\boldsymbol{H}})} + 12 \mu \sqrt{n \log c}\, \frac{\| {\boldsymbol{H}}\|_F}{\text{Tr}({\boldsymbol{H}})}\right)\), and the normalization constants account for extreme column norms: \[\ell: = \frac{N}{\text{Tr} ({\boldsymbol{H}})} \min_{i=1\dots N} \| \mathbf{A}_{:,i}\|_2^2, \quad L: = \frac{N}{\text{Tr} ({\boldsymbol{H}})} \max_{i=1\dots N} \| \mathbf{A}_{:,i}\|_2^2\,.\] The coherence parameter must also be adjusted to account for non-uniform column magnitudes: \(\mu: = \frac{N}{\| {\boldsymbol{H}}\|_F} \max_{i \neq j = 1\dots N} \vert \langle \mathbf{A}_{:,i}, \mathbf{A}_{:,j} \rangle \vert\).
The second term in 39 highlights the role of coherence: if two parameter directions \(\pmb{\sigma}_1,\pmb{\sigma}_2\) yield nearly parallel sensitivity vectors \(\require{physics} \tfrac{\dd}{\dd \pmb{\sigma}}\mathrm{pde}(:,\pmb{\sigma}_1)\) and \(\require{physics} \tfrac{\dd}{\dd \pmb{\sigma}}\mathrm{pde}(:,\pmb{\sigma}_2)\), the effective number of reliably reconstructible parameters decreases.
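To illustrate the quantities entering 39 – 40 , the sketch below builds a synthetic sensitivity matrix with non-normalized columns (the sizes and scalings are assumptions), computes the normalization constants \(\ell\) and \(L\), and tracks how the condition number of the restricted Gram matrix grows as more parameters are sampled uniformly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic sensitivity matrix with non-normalized columns (sizes are assumptions).
c_rows, N = 128, 1500
A = rng.standard_normal((c_rows, N)) * rng.uniform(0.5, 2.0, size=(1, N))

col_norms2 = np.sum(A**2, axis=0)
trH = col_norms2.sum()                        # Tr(H) = sum of squared column norms
ell = N * col_norms2.min() / trH              # normalization constants from the bound
L = N * col_norms2.max() / trH
print("baseline L/ell =", L / ell)

# Condition number of the restricted Gram matrix for uniformly sampled parameter subsets.
for n in (8, 32, 96):
    cols = rng.choice(N, size=n, replace=False)
    ev = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])
    print(n, "cond(H_S) =", ev[-1] / ev[0])
```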
Numerical experiments validate these results. In Fig. 8, \(\mathbf{A}\) is generated from the linearized EIT model [itm: E2] with \(c=384\). As \(n\) increases, the condition number of \({\boldsymbol{H}_\mathbf{S}}\) grows, consistent with the concentration inequality 40 . For reference, the baseline threshold \(\tfrac{L}{\ell}\) is also shown (dashed line).
Figure 8: Column subset selection in PDE-based inverse problems.
To conclude, understanding unique reconstruction and well-posedness from limited data is a key challenge in PDE inverse problems. Random sampling in the parameter space provides a probabilistic framework for discerning how many parameters are identifiable in a well-conditioned manner. Open research avenues for future study include: 1. incorporating greedy algorithms and importance sampling strategies to maximize the number of parameters recovered; 2. extending the analysis to the infinite-dimensional PDE setting with the full parameter space \(\Sigma = \lim_{N \to \infty}\Sigma_N\), motivated by the fact that the concentration results 39 – 40 are independent of the ambient parameter dimension \(N\).
In singular value decomposition (SVD), another classical task in numerical linear algebra, we start with a rectangular matrix \(\mathbf{C}\in\mathbb{R}^{m\times n}\) of rank \(d \le \min(m,n)\), and find matrices \(\mathbf{U}\), \(\mathbf{S}\), \(\mathbf{V}\) such that \[\mathbf{C}=\mathbf{U}\mathbf{S}\mathbf{V}^\top = \sum_is_i\mathbf{U}_{:,i}\left(\mathbf{V}_{:,i}\right)^\top\,,\] where \(\mathbf{S}=\text{diag}\{s_i\}\) collects the singular values in descending order, while \(\mathbf{U}\in\mathbb{R}^{m\times d}\) and \(\mathbf{V}\in\mathbb{R}^{n\times d}\) have orthonormal columns, the left and right singular vectors of \(\mathbf{C}\), respectively. Classically, computation of the SVD starts with a bidiagonalization via Householder transformations, which reduces \(\mathbf{C}\) to a matrix with only two nonzero diagonals. The total cost is \(O(\max(m,n)\cdot\min(m,n)^2)\).
Randomized SVD (RSVD) returns estimates of \(\mathbf{U}\), \(\mathbf{V}\), and \(\{s_i\}\) with high accuracy and high probability. As thoroughly analyzed in the foundational works [103], [122], [123], it is a much more cost-efficient algorithm than classical SVD for large, dense matrices, and is highly parallelizable and memory-efficient. The total cost is about \(O(mnd)\), and in comparison with classical solvers, the saving is most pronounced when the large matrix \(\mathbf{C}\) is known to be of low rank, that is, \(d\ll\min\{m,n\}\). As a consequence, RSVD finds applications in domains that traditionally rely on Principal Component Analysis (PCA) [124], such as bioinformatics [125] and image processing [126].
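A minimal RSVD sketch in the spirit of [103], [122], combining a Gaussian range finder with an exact SVD of the small projected matrix, is given below; the matrix sizes, rank, oversampling, and power-iteration choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def rsvd(C, d, oversample=10, power_iters=1):
    """Minimal randomized SVD sketch: Gaussian range finder + small exact SVD."""
    m, n = C.shape
    Omega = rng.standard_normal((n, d + oversample))   # Gaussian test matrix
    Y = C @ Omega                                      # sample the column space of C
    for _ in range(power_iters):                       # optional power iterations sharpen the range
        Y = C @ (C.T @ Y)
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis for the sampled range
    B = Q.T @ C                                        # small projected matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_hat)[:, :d], s[:d], Vt[:d]

# Exactly low-rank test matrix (sizes and rank are illustrative assumptions).
m, n, d = 2000, 800, 15
C = rng.standard_normal((m, d)) @ rng.standard_normal((d, n))
U, s, Vt = rsvd(C, d)
print(np.linalg.norm(C - (U * s) @ Vt) / np.linalg.norm(C))   # small: the rank-d factors are recovered
```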
RSVD has been successfully deployed for experimental design in PDE-based inverse problems as well, especially from the optimal design angle. Recalling 15 , the task at hand is to find the weights \(W=\text{diag}\{w_\theta\}\) so that the induced variance matrix \(\hat{\Gamma}_W\) is small, either in terms of its largest eigenvalue (\(\Phi_{E}\)), the sum of its eigenvalues (\(\Phi_A\)), or their product (\(\Phi_D\)), all of which require computing eigenvalues. Given the size of \(\hat{\Gamma}_W\), an RSVD that returns all eigenvalues efficiently becomes very useful. This perspective was adopted in [6], [127]–[131].
Compressed sensing (CS) is a signal processing technique that leverages randomized numerical linear algebra ideas, and it has promising applications in PDE-based inverse problems. As its name suggests, CS acquires information about a signal in a compressed manner. When the object to be reconstructed lies in a high-dimensional ambient space \(\mathbb{R}^N\) but is known to be sparse (that is, supported on only a small fraction of the coordinates), the sparsity can be exploited to drastically reduce the number of measurements required. In particular, if the sensing mechanism satisfies certain structural properties, the number of “sensors” scales with the intrinsic information content \(n\) rather than with the ambient dimension \(N\).
A standard reconstruction method takes the form of an optimization problem: \[\min\|\mathbf{x}\|_1\,,\quad\text{such that}\quad \mathbf{P}_{\Omega}\mathbf{U}\cdot\mathbf{x} = \mathbf{P}_\Omega\mathbf{y}\,.\] Here \(\mathbf{U}\) is an orthonormal sensing matrix collecting all possible measurements, and \(\mathbf{P}_\Omega\) is a projection operator selecting entries according to the mask \(\Omega\) of size \(|\Omega|=c\). Because \(\mathbf{x}\) is assumed sparse, the \(\ell_1\) norm promotes sparsity in the recovered solution. Although the system is highly underdetermined, imposing only \(c \ll N\) constraints on an \(N\)-entry reconstruction, exact recovery is still possible when \(\mathbf{U}\) is incoherent with the coordinate basis. Classical results show that if \(c \gtrsim n\log N\), then exact recovery of \(\mathbf{x}\) is achievable with high probability [132]–[134].
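As a toy illustration of this program, the sketch below recovers a sparse vector by recasting the \(\ell_1\) minimization as a linear program through the standard split \(\mathbf{x}=\mathbf{u}-\mathbf{v}\) with \(\mathbf{u},\mathbf{v}\ge 0\); the subsampled DCT matrix playing the role of the incoherent sensing matrix \(\mathbf{U}\), along with all sizes, is an assumption made for illustration.

```python
import numpy as np
from scipy.fft import dct
from scipy.optimize import linprog

rng = np.random.default_rng(6)

# Sparse signal in R^N observed through c randomly chosen rows of an orthonormal DCT matrix.
N, n_sparse, c = 256, 5, 64                      # illustrative sizes (assumptions)
x_true = np.zeros(N)
x_true[rng.choice(N, n_sparse, replace=False)] = rng.standard_normal(n_sparse)

U = dct(np.eye(N), norm='ortho', axis=0)         # orthonormal sensing matrix
Omega = rng.choice(N, size=c, replace=False)     # random measurement mask
M = U[Omega, :]                                  # P_Omega U
b = M @ x_true                                   # P_Omega y

# Basis pursuit: min ||x||_1 subject to M x = b, via x = u - v with u, v >= 0.
A_eq = np.hstack([M, -M])
cost = np.ones(2 * N)
res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None), method='highs')
x_rec = res.x[:N] - res.x[N:]

print(np.linalg.norm(x_rec - x_true))            # small: exact recovery with high probability
```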
These foundational results, originally developed in finite-dimensional linear algebra, extend naturally to infinite-dimensional settings that arise in PDE-based inverse problems. In this context, the to-be-reconstructed object is a function \(f(x)\) rather than a finite-dimensional vector, and the sensing matrix becomes a continuous operator \(\mathcal{U}\). Nevertheless, if \(f(x)\) is sparse in a basis system \(\mathcal{D}=\{\phi_i\}\) and if \(\mathcal{D}\) and \(\mathcal{U}\) are mutually incoherent, compressed sensing theory ensures stable recovery even from finitely many measurements [135], [136].
Matrix completion seeks to fill in a dense matrix \(\mathbf{A}\in\mathbb{R}^{C\times C}\), given only the entries \(\mathbf{A}_{ij}\) for \((i,j)\in\Omega\), a sparse mask. The task arises in recommendation systems [137] and genomics [138]. The problem itself does not necessarily belong to RNLA, but some solvers use random matrices and the associated techniques. In its generic form, there is no hope of reconstructing a matrix from a few entries, but under suitable assumptions the reconstruction problem can be solved. Notably, when the matrix is known a priori to be of low rank and its column and row spaces are known to be incoherent, the following optimization problem returns an optimizer that recovers \(\mathbf{A}\) with high probability [139]–[141]: \[\label{eqn:32nuclear32norm32min} \begin{array}{lll} \text{minimize} &\| \mathbf{X}\|_{\ast} & \\ \text{subject~to~}& \mathbf{X}_{ij} = \mathbf{A}_{ij}, & \forall~(i,j) \in \Omega\,, \end{array}\tag{41}\] where \(\Omega\) is a randomly sampled mask with each entry included according to a Bernoulli distribution at a low sampling rate, and \(\|\cdot\|_*\) is the nuclear norm (the sum of all singular values). Intuitively, the nuclear norm is the \(L_1\) norm of the singular values, in analogy to compressed sensing; in both contexts it serves as a convex proxy for the \(L_0\) norm. Minimizing the \(L_0\) norm means looking for a reconstruction that has the smallest rank while agreeing with the given data. A crude conclusion suggests a sampling complexity of \(c\approx Cr\) (up to logarithmic factors), where \(r\) is the approximate rank.
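As a simple numerical proxy, the sketch below completes a synthetic low-rank matrix with an iterative rank-projection (SVP-style) heuristic rather than by solving the nuclear-norm program 41 itself; the sizes, the Bernoulli sampling rate, and the solver choice are assumptions made for illustration and are not the method used in [142].

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic low-rank matrix and a Bernoulli-sampled mask (sizes and rate are assumptions).
C, r, rate = 100, 4, 0.35
A = rng.standard_normal((C, r)) @ rng.standard_normal((r, C))
mask = rng.random((C, C)) < rate

def svp_complete(A_obs, mask, r, rate, n_iters=200):
    """Iterative rank-r projection: a crude stand-in for the nuclear-norm program (41)."""
    X = np.zeros_like(A_obs)
    for _ in range(n_iters):
        G = mask * (A_obs - X)                    # residual on the observed entries only
        U, s, Vt = np.linalg.svd(X + G / rate, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]           # project back onto rank-r matrices
    return X

X = svp_complete(A * mask, mask, r, rate)
print(np.linalg.norm(X - A) / np.linalg.norm(A))  # relative completion error
```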
It is easy to imagine these completion solvers being useful for experimental design in inverse problems. An inverse problem uses input-to-output (ItO) data pairs to reconstruct unknown parameters. Restricted by sampling and experimental budgets, one often does not have the full set of input-to-output data pairs, but rather a subset of entries. Can one use this subset to reconstruct the full data set, which is then fed into the actual reconstruction?
This approach is taken in [142] for the EIT problem ([itm: E2] from Section 2.5), for which the ItO map is the Dirichlet-to-Neumann (DtN) map. It is important to note that a well-posed inverse problem typically corresponds to a full-rank data matrix storing the full ItO map, which prevents an immediate application of the matrix completion method 41 . Indeed, [142] recognizes the off-diagonal low-rank feature of the DtN map: a dyadic decomposition of the diagonal blocks can be used to peel off the off-diagonal blocks as the partition level increases [143], yielding the block structure displayed in Figure 9.
Figure 9: Hierarchical matrix block structure at varying partition levels and rank structure of the \(\mathcal{H}\)-matrix: diagonal full-rank blocks (orange) vs. off-diagonal low-rank blocks (blue).
Then matrix completion 41 is applied to each off-diagonal block. The recovered full dataset is then deployed in PDE-constrained optimization for the execution of the inverse problem.
This review has concentrated on the role of randomized numerical linear algebra (RNLA) in data selection for PDE-based inverse problems. While these tools are powerful and increasingly influential, our focus is necessarily narrow. At its core, RNLA is a linear framework, whereas data selection is intrinsically nonlinear. As such, relying exclusively on RNLA overlooks essential aspects of the problem. Even within the linear regime, alternative toolboxes from modern machine learning and applied mathematics offer complementary strategies that may also prove valuable. In what follows, we highlight two such directions and discuss how they may broaden the scope of data selection methodologies.
The primary limitation of RNLA-based approaches is that they operate strictly within linear spaces, while experimental design is fundamentally nonlinear. As discussed in Section 3.3, the information value of an experiment depends strongly on the underlying medium \(\pmb{\sigma}^*\). This dependence is evident in Figure 6, where the weights assigned to measurements vary dramatically with different \(\pmb{\sigma}^*\). A natural measure of data value is the sensitivity of the reconstruction to the data, often encoded in the Jacobian. In the RNLA framework, the Jacobian is treated as a fixed matrix, making most strategies effectively passive: The algorithm is prescribed once, and the user simply waits for the output. In the nonlinear regime, however, the Jacobian itself evolves as reconstruction improves, shifting the sensitivity landscape and altering the relative value of new data.
This observation motivates interactive or adaptive data selection: Begin with an initial dataset, perform a reconstruction, and then collect additional data informed by the current state of knowledge, repeating the process as needed.
Several concrete strategies capture this philosophy. One is to pose the problem as a bilevel optimization: The inner layer solves the reconstruction problem for a given dataset, while the outer layer selects the dataset itself [144], [145]. Another approach is greedy data acquisition [52], [130], [146], [147], in which data are acquired sequentially, with each step exploiting information from previous ones, as illustrated in Figure 10. More sophisticated approaches additionally allow for coordination of subsequent experiments and pose the problem in the reinforcement learning regime [148], [149].
Crucially, this challenge is not unique to PDE-based inverse problems. Similar difficulties arise across machine learning tasks constrained by costly data acquisition, where active learning has emerged as a central paradigm. We expect that cross-fertilization between active learning and nonlinear experimental design for PDEs will play an increasingly important role in the coming years [150].
Even within the linear regime, RNLA methods ultimately reduce large but finite linear systems via sampling or sketching. The design space is therefore still finite, no matter how large it is taken. This discreteness is incompatible with PDE-based inverse problems, where the design space is continuously indexed and thus infinite-dimensional.
The importance of continuous indexing is illustrated in Fig. 11, which simulates radiative transfer [itm: E4]. For each experiment (each column), only a handful of detector locations record nontrivial light intensity. The relevant information thus arises in a measure-zero subset of the design space. Any fixed discretization, no matter how fine, risks missing these measure-zero structures and therefore producing a poor design space for data selection.
Analogous phenomena arise in the Darcy flow example [itm: E2] and the Lorenz-63 problem [itm: E1].
Figure 12: Left panel: Optimal distribution over the continuous design space for the Darcy flow problem, showing extreme concentration of the data importance. Right panel: Important observation times in colored slots for parameter inference in the Lorenz-63 model. Different colors indicate different measured components \((x,y,z)\).
In such cases, methods that directly operate in infinite-dimensional spaces become essential. Traditionally, such an approach was regarded as impossible, since most optimization solvers operate in \(\mathbb{R}^d\) and exhibit complexity growing at least algebraically with \(d\), rendering them infeasible in the infinite-dimensional limit. Recent advances have changed this landscape. Tools from optimal transport and gradient flows [151]–[153] have clarified the structure of optimization over probability measure spaces, which are intrinsically infinite-dimensional. Techniques such as the mean-field limit and propagation of chaos [154], [155] further enable practical computation by translating infinite-dimensional PDEs into coupled ODE systems. These ideas, widely used in machine learning [156]–[160], have only recently been adapted to experimental design problems [161].
One such approach replaces discrete weights \(w(\theta)\) in the classical optimal design problem 15 with a probability measure \(\rho(\theta)\): \[\begin{align} \text{A-optimal}: &~ \Phi_A[\rho] = \mathrm{tr}\!\left( \left( \mathbf{A}^\top\mathbf{A}[\rho] \right)^{-1}\right), \\ \text{D-optimal}: &~ \Phi_D[\rho] = \det\!\left( \mathbf{A}^\top\mathbf{A}[\rho]\right), \end{align} \label{eqn:oed95criteria2}\tag{42}\] where \[\require{physics} \mathbf{A}^\top\mathbf{A}[\rho] = \int_{\Omega} \mathbf{A}(\theta,:)^\top \mathbf{A}(\theta,:) \,\dd\rho(\theta).\]
Optimization is then conducted directly over the probability measure space, bypassing the limitations of finite discretization.
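As a crude finite-dimensional stand-in for this formulation (not the method of [161]), the sketch below discretizes \(\rho\) on a grid of designs and decreases \(\Phi_A[\rho]\) over the probability simplex with multiplicative, mirror-descent style updates, a simple surrogate for a gradient flow over measures; the synthetic sensitivities, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

# Discretize the design space; rows of A play the role of the sensitivities A(theta, :).
C, n = 500, 6                                    # illustrative sizes (assumptions)
A = rng.standard_normal((C, n))
rho = np.full(C, 1.0 / C)                        # start from the uniform measure

def info_matrix(rho):
    return A.T @ (rho[:, None] * A)              # A^T A [rho] = sum_theta rho_theta A_theta^T A_theta

def phi_A(rho):
    return np.trace(np.linalg.inv(info_matrix(rho)))

# Multiplicative (mirror-descent) updates on the simplex.
eta = 0.5
for _ in range(200):
    Minv = np.linalg.inv(info_matrix(rho))
    grad = -np.sum((A @ Minv) ** 2, axis=1)      # d phi_A / d rho_theta = -||M^{-1} A_theta^T||^2
    rho *= np.exp(-eta * grad / np.abs(grad).max())
    rho /= rho.sum()

print("uniform:", phi_A(np.full(C, 1.0 / C)), "optimized:", phi_A(rho))
```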
Taken together, these directions highlight a broader landscape of experimental design beyond RNLA. In addition to embracing nonlinearity through adaptive and interactive strategies that tightly couple experimental design with reconstruction, and advancing toward infinite-dimensional formulations via tools from optimal transport, gradient flows, and mean-field theory, other approaches are also expected to play an important role. Conversely, experimental design methods have been successfully customized to several machine learning tasks [162]–[164], promoting more efficient use of data and training capacity. We anticipate that progress along these lines, at the interface of applied mathematics, machine learning, and computational science, will significantly broaden the scope and amplify the impact of experimental design methodologies in the coming decade.
Department of Computing + Mathematical Sciences, California Institute of Technology, Pasadena, CA↩︎
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD↩︎
Department of Mathematics and Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI↩︎
Department of Computer Science, University of Wisconsin-Madison, Madison, WI↩︎