Multidimensional deconvolution with shared bases

Daria Sushnikova, Matteo Ravasi and David Keyes
Extreme Computing Research Center
King Abdullah University of Science and Technology


Abstract

We address the estimation of seismic wavefields by means of Multidimensional Deconvolution (MDD) for various redatuming applications. While offering more accuracy than conventional correlation-based redatuming methods, MDD faces challenges due to the ill-posed nature of the underlying inverse problem and the requirement to handle large, dense, complex-valued matrices. These obstacles have long limited the adoption of MDD in the geophysical community. Recent interest in this technology has spurred the development of new strategies to enhance the robustness of the inversion process and reduce its computational overhead. We present a novel approach that extends the concept of block low-rank approximations, usually applied to linear operators, to simultaneously compress the operator, right-hand side, and unknowns. This technique greatly alleviates the data-heavy nature of MDD. Moreover, since in 3d applications the matrices do not lend themselves to global low-rank approximations, we introduce a novel \(\mathcal{H}^{2}\)-like approximation. We aim to streamline MDD implementations, fostering efficiency and controlling accuracy in wavefield reconstruction. This innovation holds potential for broader applications in the geophysical domain, possibly revolutionizing the analysis of multi-dimensional seismic datasets.

April 1, 2024

1 Introduction

Seismic recordings provide crucial insights into the interaction of waves with an otherwise unknown physical medium. A so-called wave state (defined by the wave nature, medium parameters, source type, and boundary conditions) can be associated with any such recording [1]. While, in theory, one has the flexibility to choose any combination of such conditions, real-world seismic exploration faces constraints stemming from physical, economic, and environmental factors.

Several data-driven approaches have been developed under the framework of seismic interferometry (e.g., [2]) to create ideal settings, also known as virtual surveys, from physically acquired data. By defining interaction quantities and employing reciprocity theorems [1], these approaches can leverage wavefield measurements from distinct receivers to emulate responses as if a source were placed at one of the receivers’ locations. Ideally, cross-correlating the wavefield recorded by two receivers surrounded by a closed boundary of sources in a lossless medium yields the Green’s function between such receivers. This principle has been extensively used in seismic applications for a variety of tasks [3]–[8].

However, closed boundaries are rarely available in real-life applications. This, combined with the inevitable presence of losses in the propagating medium, can hinder accurate reconstruction, yielding a blurred Green’s function from cross-correlation interferometry [9]. An alternative approach that can accommodate both dissipative media and open boundaries relies on a convolution-type reciprocity theorem. [9] showed that this leads to an implicit Green’s function representation; deconvolution can then be employed to retrieve such a Green’s function, mitigating the aforementioned correlation-induced, spatio-temporal blurring effects. This method, known as seismic interferometry by multidimensional deconvolution (MDD), was initially proposed in the acoustic setting and later extended to more general vector fields, including elastic [10] and electromagnetic waves [11].

While the adoption of MDD constitutes a significant leap towards achieving more accurate redatuming compared to traditional correlation methods, it comes with numerical and computational challenges. The former stems from the ill-posed nature of the inverse problem that one has to solve to estimate the sought-after Green’s function, whilst the latter is related to the arithmetic and storage complexity of the problem. MDD algorithms must in fact manipulate dense matrices for the operator, unknowns, and right-hand side, which significantly escalates the computational demands as the problem size grows, especially in 3d applications. This has resulted in a somewhat constrained uptake of MDD within the geophysical community.

However, the superior capabilities of MDD have led the geophysical community to identify strategies aimed at bolstering inversion stability and alleviating computational overhead. For instance, to enhance stability, researchers in [12] apply the singular-value decomposition (SVD) to compute the pseudoinverse matrix and truncate small singular values as a way to naturally regularize the inverse process. More recently, [13] proposed to introduce a regularization term to enforce solution similarity among neighboring sources and/or receivers.

Alternatively, a time-domain formulation of MDD was proposed in [14]: by operating in the time domain, one can more naturally regularize the inverse problem, alongside parametrizing the solution in a suitable transformed domain (e.g., curvelets) where sparsity-promoting inversion can be employed. Along similar lines, [15] and [16] proposed to introduce a preconditioner to enforce reciprocity in the sought-after Green’s function. Continuing, [17] suggested to combine three physics-based preconditioners, namely causality, reciprocity, and frequency–wavenumber locality, within the same time-domain formulation. This resulted in improved stability also in scenarios with highly complex propagating overburdens (e.g., salt bodies). Finally, low-rank regularization has recently been proposed by [18]–[20] as an alternative way to parametrize the sought-after solution. Whilst primarily focused on tackling the first challenge (i.e., stability), this approach can also reduce the size of the solution when a factorization-based form of low-rankness is considered.

From a computational standpoint, two main approaches have been investigated to date to reduce the cost of applying MDD. The former leverages the fact that the MDD equations can be formulated as a finite-sum functional [21]; as such, the associated inverse problem can be solved by means of stochastic gradient descent algorithms, where the gradient at each iteration is computed using a small subset of randomly selected sources. This reduces the overhead of having to access the entire dataset at each iteration and has been shown to lead to faster and more robust convergence. The latter involves the application of block low-rank approximation techniques, such as the Tile Low-Rank (TLR) algorithm [22]–[26], to the operator matrix. By doing so, one can first reduce the size of the operator, and later accelerate matrix-vector multiplications by performing this operation in two steps, where the factors, instead of the full matrix, are utilized.

Our paper extends the concept of block low-rank approximation to encompass compression not only of the operator itself, but also of the right-hand-side and unknown matrices. This is particularly advantageous given the data-heavy nature of the right-hand side and the unknowns in MDD. Furthermore, by embracing a more sophisticated \(\mathcal{H}^{2}\) approximation [27], [28] of the dense matrices, we show that further compressibility of the wavefield can be achieved compared to the TLR approach. Through this multifaceted approach, we endeavor to pave the way for more efficient MDD implementations, ultimately expanding the potential applications of MDD within the geophysical domain.

2 Multidimensional deconvolution

Let us consider a medium composed of two parts: the first one, usually referred to as the overburden, lies between the surface of the Earth and the receiver datum; the second one, below the receiver datum, represents the target area we are ultimately interested in imaging. One or more sources, denoted here as \(x_s\), are placed anywhere in the overburden, whilst receivers \(x_0'\) and virtual sources \(x_0\) are located at the datum that separates the overburden from the target area. We assume that the wavefields \(w^{-}\) from \(x_s\) to \(x_0\) and \(w^{+}\) from \(x_s\) to \(x_0'\) are known (i.e., physically recorded or obtained by a pre-processing step). In this problem, we aim to recover the wavefield \(r\) that travels only below the receiver datum. Figure 1 shows a schematic representation of the acquisition geometry associated with the MDD problem.

Figure 1: Visual representation of the MDD algorithm.

Following [29], the aforementioned wavefields are related to each other via a convolution-type representation theorem,

\[\label{eq:mdccont} w^{-} (x_{s}, x_{0}, t)= \mathcal{F}_t^{-1} \int w^{+}(x_s, x'_0, f)\mathcal{F}_t \left(r(x'_0, x_0, t)\right) dx'_0,\tag{1}\] where \(\mathcal{F}_t\) and \(\mathcal{F}_t^{-1}\) represent the forward and inverse Fourier transforms, whilst \(t\) and \(f\) refer to the time and frequency axes. In order to find a numerical solution of equation (1) for the wavefield of interest \(r\), both space and time are usually discretized, leading to the following expression

\[\label{eq:mdc} w^{-} (x_{s,i}, x_{0,j}, t_k)= F_t^{-1} \left( \sum_{\alpha=1}^{N_0} w^{+}(x_{s,i}, x'_{0,\alpha}, f_p) F_t (r(x'_{0,\alpha}, x_{0,j}, t_k))\right),\tag{2}\] where \(F_t\) and \(F_t^{-1}\) are the discrete forward and inverse Fourier transforms, \(i=1,\dots, N_s\), \(j=1,\dots, N_0\), \(\alpha=1,\dots, N_0\), \(k=1,\dots, N_t\), and \(p=1,\dots, N_f\), with \(N_s\), \(N_0\), \(N_t\), \(N_f\) representing the number of sources, receivers, time samples, and frequency samples, respectively.

The linear system in equation (2) can be reformulated in tensor form by defining \[B_t[i, j, k] = w^{-}(x_{s, i}, x_{0, j}, t_k),\] \[W[i,\alpha,p] = w^{+}(x_{s, i}, x'_{0,\alpha}, f_p),\] \[X_t[\alpha,j,k] = r(x'_{0,\alpha}, x_{0, j}, t_k),\] yielding \[F^H W F X_t = B_t,\] where \(F\) is the discrete Fourier transform operator applied over the time index, \(W\in\mathbb{C}^{N_s\times N_0\times N_f}\), \(X_t\in\mathbb{R}^{N_0\times N_0\times N_t}\), and \(B_t\in\mathbb{R}^{N_s\times N_0\times N_t}\).

By applying the Fourier transform over the time index to both sides of the system, we obtain an equivalent system of equations in the frequency domain:

\[W X_f = B_f,\] where \(X_f\in\mathbb{C}^{N_0\times N_0\times N_f}\) and \(B_f\in\mathbb{C}^{N_s\times N_0\times N_f}\). This tensor system is equivalent to a series of linear systems for different frequencies: \[W_p X_p = B_p, \quad p=1\dots N_f, \label{eq:fr_sys}\tag{3}\] where \(W_p = W[:,:, p]\), \(X_p = X_f[:,:, p]\), and \(B_p = B_f[:,:, p]\).
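To make the notation concrete, the following sketch (with hypothetical variable names, and synthetic random tensors standing in for recorded wavefields) forms and solves the per-frequency systems of equation (3) from time-domain data via an FFT over the time axis:

```python
import numpy as np

# Hypothetical example: random tensors stand in for the recorded
# down-going (w_plus) and up-going (w_minus) wavefields, N_s x N_0 x N_t.
rng = np.random.default_rng(0)
N_s, N_0, N_t = 16, 12, 64
w_plus = rng.standard_normal((N_s, N_0, N_t))
w_minus = rng.standard_normal((N_s, N_0, N_t))

# FFT over the time axis turns each tensor into a stack of complex matrices.
W = np.fft.rfft(w_plus, axis=-1)   # operator, N_s x N_0 x N_f
B = np.fft.rfft(w_minus, axis=-1)  # right-hand side, N_s x N_0 x N_f
N_f = W.shape[-1]

# Equation (3): one dense linear system W_p X_p = B_p per frequency.
X = np.empty((N_0, N_0, N_f), dtype=complex)
for p in range(N_f):
    # Least-squares solve of the (possibly ill-posed) per-frequency system.
    X[:, :, p] = np.linalg.lstsq(W[:, :, p], B[:, :, p], rcond=None)[0]

# Inverse FFT over the frequency axis returns the time-domain solution.
x_t = np.fft.irfft(X, n=N_t, axis=-1)
```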

Figure 2: Schematic of the per-frequency systems \(W_p X_p = B_p\) of equation (3).

The three matrices involved in equation (3) are usually dense and have comparable sizes. In order to reduce the cost of solving these \(N_f\) systems of equations, one must find a suitable compressed representation of such matrices.

3 Compression of the operator, right-hand side, and unknowns

In this section, we introduce several approaches that benefit from the global or block-wise low-rank nature of the matrices in the system of equations (3). These approaches range from the simplest low-rank approximation to more advanced techniques based on the \(\mathcal{H}^{2}\) matrix approximation.

3.1 Global low-rank parametrization

To begin, let us examine the scenario where all the matrices in equation (3) are globally low rank.

3.1.1 Approximation based on the non-symmetric low-rank parametrization of \(X_p\)

Since we expect the solution \(X_p\) to be globally low rank, let us write \(X_p\) in factorized form: \[X_p = L_pR_p,\] where \(X_p\in\mathbb{C}^{N_0 \times N_0}\), \(L_p\in\mathbb{C}^{N_0\times r}\), \(R_p\in\mathbb{C}^{r\times N_0}\), and \(r\ll N_0\). We obtain the system: \[W_pL_pR_p = B_p. \label{eq:xlr1}\tag{4}\]

One obvious option for solving this system of equations is the alternating least-squares method (ALS) [30], [31], where the factors \(L_p\) and \(R_p\) are updated in an alternating fashion. This method has been shown to have poor convergence unless additional regularization terms are imposed on the factors (see, for example, [19]). Instead, we decide here to also compute a low-rank approximation for the matrix \(B_p\):

\[B_p = \widehat{L}_p\widehat{R}_p,\] where \(B_p\in\mathbb{C}^{N_s \times N_0}\), \(\widehat{L}_p\in\mathbb{C}^{N_s\times r}\), \(\widehat{R}_p\in\mathbb{C}^{r\times N_0}\), and \(r\ll N_0, N_s\). We obtain the system: \[W_pL_pR_p = \widehat{L}_p\widehat{R}_p. \label{eq:xlr1b}\tag{5}\]

Figure 3: Illustration of the factorized system of equation (5).

We then assume that \[R_p = \widehat{R}_p. \label{eq:xlr2}\tag{6}\]

Such an assumption is based on the intuition that the solution and right-hand side are wavefields with similar features and can therefore be represented by a shared basis. The experiments in Sections 4.1 and 4.2 confirm the validity of this assumption, as satisfactory solutions are produced in both scenarios. The limitations and benefits of this assumption are a subject of future research.
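One simple empirical probe of this assumption, sketched below under the assumption that \(X_p\) and \(B_p\) are available for a representative frequency (e.g., from an uncompressed reference solve), is to measure the principal angles between the dominant row spaces of the two matrices; small angles support the use of a shared right basis:

```python
import numpy as np
from scipy.linalg import subspace_angles

def shared_basis_angles(X_p, B_p, r):
    """Principal angles (in radians) between the rank-r dominant row
    spaces of X_p and B_p; small angles support a shared right basis."""
    _, _, Vx = np.linalg.svd(X_p)
    _, _, Vb = np.linalg.svd(B_p)
    # subspace_angles expects bases stored column-wise, hence the transposes.
    return subspace_angles(Vx[:r].conj().T, Vb[:r].conj().T)
```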

Using the assumption in equation (6), equation (4) becomes:

\[W_pL_p\widehat{R}_p = \widehat{L}_p\widehat{R}_p. \label{eq:xlr3}\tag{7}\] Multiplying both sides of equation (7) from the right by the pseudoinverse of \(\widehat{R}_p\) (i.e., \(\widehat{R}_p^{\dagger}\)), we obtain:

\[W_pL_p = \widehat{L}_p. \label{eq:xlr4}\tag{8}\]

Figure 4: Illustration of the reduced system of equation (8).

We can now solve this system for \(L_p\) and ultimately reconstruct our solution of interest as \(X_p = L_p\widehat{R}_p\). By choosing this parametrization, we have reduced the size of the problem (and therefore the number of computations required to obtain the solution) by a factor of \(\frac{N_0}{r}\). We refer to this technique as the USV method, and as USV-rsp when a reciprocity preconditioner is added in the solution of equation (8).
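A minimal sketch of the USV method for a single frequency is given below (hypothetical function and variable names; any regularization or preconditioning is omitted):

```python
import numpy as np

def usv_solve(W_p, B_p, r):
    """One frequency of the USV method: the right factor of a rank-r
    approximation of B_p is shared with the unknown X_p (equations 5-8)."""
    # Truncated SVD of the right-hand side: B_p ~ Lhat @ Rhat.
    U, s, Vh = np.linalg.svd(B_p, full_matrices=False)
    Lhat = U[:, :r] * s[:r]   # N_s x r
    Rhat = Vh[:r]             # r x N_0, the shared right basis
    # Equation (8): solve the reduced system W_p L_p = Lhat for L_p.
    L_p = np.linalg.lstsq(W_p, Lhat, rcond=None)[0]   # N_0 x r
    # Reconstruct the solution of interest, X_p = L_p @ Rhat.
    return L_p @ Rhat
```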

However, the proposed approach presents several downsides. Notably, it does not exploit the fact that \(W_p\) may also be a low-rank matrix. Moreover, the chosen parametrization of the solution (i.e., \(X_p = L_p \widehat{R}_p\)) ignores the fact that this matrix should be symmetric, owing to the reciprocity of acoustic wavefields.

3.1.2 Approximation based on the symmetric low-rank parametrization of \(X_p\)

Let us now define the truncated SVD of the matrix \(B_p\) as:

\[B_p = U_p\Sigma_pV_p,\] where \(B_p\in\mathbb{C}^{N_s \times N_0}\), \(U_p\in\mathbb{C}^{N_s\times r}\), \(\Sigma_p\in\mathbb{R}^{r\times r}\), \(V_p\in\mathbb{C}^{r\times N_0}\), and \(r\ll N_0, N_s.\)

Given that we want i) \(X_p\) to be symmetric (but not Hermitian), and ii) the matrices \(B_p\) and \(X_p\) to share their right factor, we write \(X_p\) in the following format: \[X_p = V_p^{\top}S_p V_p,\] where \(S_p\in\mathbb{C}^{r\times r}\) is an unknown matrix. We obtain the system:

\[W_p V^{\top}_p S_p V_p = U_p\Sigma_pV_p. \label{eq:vsv1}\tag{9}\]

Figure 5: Illustration of the system of equation (9).

Since \(V_p\) and \(U_p\) have orthonormal rows and columns, respectively (i.e., \(V_p V_p^{^{\mathsf{H}}}=I_r\) and \(U_p^{^{\mathsf{H}}} U_p=I_r\)), multiplying the system (9) by the matrix \(V_p^{^{\mathsf{H}}}\) from the right and by the matrix \(U_p^{^{\mathsf{H}}}\) from the left leads to

\[\left (U_p^{^{\mathsf{H}}} W_p V_p^{\top} \right) S_p = \Sigma_p, \label{eq:vsv2}\tag{10}\] where all three matrices \(\left (U_p^{^{\mathsf{H}}} W_p V_p^{\top} \right), S_p, \Sigma_p \in \mathbb{C}^{r\times r}\).

Figure 6: Illustration of the reduced \(r\times r\) system of equation (10).

We can now solve this system for \(S_p\) and ultimately reconstruct our solution of interest as \(X_p = V_p^{\top}S_pV_p\); we refer to this technique as the LR method. Similar to the previous section, we have here made the assumption that the operator, right-hand side, and unknowns share the bases \(V_p\) and \(U_p\). For the numerical problems in Sections 4.1 and 4.2, this assumption is validated empirically. A theoretical analysis of the limitations of such an assumption is the subject of future research.

To summarize, we observe that the parametrization \(X_p = V_p^{\top}S_p V_p\) can dramatically reduce the size of the solution (i.e., from inverting a matrix of size \(N_0\times N_0\) to inverting a matrix of size \(r\times r\)) and, thus, the computational complexity of the associated inverse problem. Impressively, this reduction in complexity does not incur a significant compromise in the solution’s quality. Furthermore, this approach empowers us to benefit from the symmetry of the solution \(X_p\), without requiring the introduction of any reciprocity preconditioner in the solution of the inverse problem [17].
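A minimal sketch of the LR method for a single frequency, following equations (9)–(10) and the convention above that \(V_p\) denotes the \(r\times N_0\) right factor of the truncated SVD (hypothetical names):

```python
import numpy as np

def lr_solve(W_p, B_p, r):
    """One frequency of the LR method: symmetric parametrization
    X_p = V_p^T S_p V_p with bases shared with B_p (equation 10)."""
    U, s, Vh = np.linalg.svd(B_p, full_matrices=False)
    U_p, Sigma_p, V_p = U[:, :r], np.diag(s[:r]), Vh[:r]
    # Reduced r x r operator of equation (10): U_p^H W_p V_p^T.
    A = U_p.conj().T @ W_p @ V_p.T
    # Solve the small r x r system A S_p = Sigma_p.
    S_p = np.linalg.solve(A, Sigma_p)
    # Reconstruct the solution in the symmetric parametrization.
    return V_p.T @ S_p @ V_p
```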

3.2 \(\mathcal{H}^{2}\) approximation

Unfortunately, the matrices \(W_p\), \(X_p\), and \(B_p\) are usually not globally low rank in 3-dimensional problems. On the bright side, however, these matrices can be well approximated using block low-rank structures. In particular, they can be represented in a TLR structure [22] as well as in an \(\mathcal{H}^{2}\) structure [27], [28], as we verify in our numerical experiments in Section 4.3.

In this section, we introduce the fundamental concepts of the \(\mathcal{H}^{2}\) matrix approximation. For more comprehensive and formal definitions, please refer to [27], [28], [32]. It is important to note that our version of the \(\mathcal{H}^{2}\) decomposition differs from the standard one, yet it remains valid. A thorough explanation of our version of \(\mathcal{H}^{2}\) is available in [33].

Let us, for example, consider the dense right-hand side matrix \(B_p\) in the system (3) and build an \(\mathcal{H}^{2}\) factorization of it. Noting that the rows and columns of \(B_p\) are associated with sources and receivers in the physical acquisition geometry, we can start by hierarchically splitting the grids of sources and receivers and storing these divisions within tree structures. Utilizing this grid division, we rearrange the rows and columns of the matrix \(B_p\): such a rearrangement creates a block structure in the new matrix, which now possesses block low-rank properties. Let the matrix \(B_p\) have \(N_{r}\) block rows and \(N_{c}\) block columns. Thus, the matrix \(B_p\) is divided into blocks \[B_{ij}\in \mathbb{C}^{N_{i} \times N_{j}}, \quad i \in 1\dots N_{r}, \; j \in 1\dots N_{c},\] where \(N_{i}\) is the size of the \(i\)-th block row and \(N_{j}\) is the size of the \(j\)-th block column. We also define block rows and block columns \[B_{i}\in \mathbb{C}^{N_{i} \times N_{0}},\] \[B_{j}\in \mathbb{C}^{N_{s} \times N_{j}}.\]

All blocks of the matrix \(B_p\) can be separated into two classes: blocks that correspond to sources and receivers located closer than some predefined distance (called ‘close blocks’) and all others (called ‘far blocks’). Let us also define the block row \(\widehat{B}_{i}\), which is \(B_i\) with its close blocks zeroed, and, accordingly, the block column \(\widehat{B}_{j}\), which is \(B_j\) with its close blocks zeroed:

\[B_{ij} \in \widehat{B}_i = \begin{cases} B_{ij} \in B_i & \quad \mathbf{if~} B_{ij} \mathbf{~far} \\ 0 & \quad \mathbf{if~} B_{ij} \mathbf{~close} \end{cases}\]

\[B_{ij} \in \widehat{B}_j = \begin{cases} B_{ij} \in B_j & \quad \mathbf{if~} B_{ij} \mathbf{~far} \\ 0 & \quad \mathbf{if~} B_{ij} \mathbf{~close} \end{cases}\] See the illustration of \(B_i\) and \(\widehat{B}_i\) in Figure 7.

Figure 7: Example of \(B_i\) and \(\widehat{B}_i\). Dark green blocks represent close blocks and light green blocks represent far blocks. White blocks are zeros.

The key property of the \(\mathcal{H}^{2}\) matrix formulation used in this work is that the block row \(\widehat{B}_{i}\) and block column \(\widehat{B}_{j}\) matrices are globally low rank. In other words, if we consider the left factor \(U_i\in\mathbb{C}^{N_i\times N_i}\) of the SVD of the matrix \(\widehat{B}_{i}\) and multiply the block row \(\widehat{B}_{i}\) from the left by \(U_i^{^{\mathsf{H}}}\), the resulting matrix will contain non-zero values only in the first \(r\) rows (whilst all other rows will have values close to zero), where \(r\) is the rank of \(\widehat{B}_{i}\). Similarly, if we multiply the block row \(B_i\) from the left by \(U_i^{^{\mathsf{H}}}\), the resulting matrix will have the same compression in the far blocks and some changes in the close blocks. See the illustration of the block row \(U_i^{^{\mathsf{H}}} B_i\) in Figure 8.

Figure 8: Example of block row \(U_i^{^{\mathsf{H}}} B_i\).

Analogously, we can compress the block columns by computing the right SVD factor \(V_j\in\mathbb{C}^{N_j\times N_j}\) of the block column \(\widehat{B}_j\) and multiplying the block column \(B_j\) from the right by \(V_j^{^{\mathsf{H}}}\).

Combining all the obtained \(U_i\) and \(V_j\) blocks into two big block-diagonal unitary matrices, we obtain: \[U = \begin{bmatrix} U_{1}&&\\ &\ddots&\\ &&U_{N_b}\\ \end{bmatrix}, \quad V = \begin{bmatrix} V_{1}&&\\ &\ddots&\\ &&V_{N_b}\\ \end{bmatrix},\] where \(U\in \mathbb{C}^{N_s \times N_s}\) and \(V\in \mathbb{C}^{N_0 \times N_0}.\) Note that, in practice, it is more efficient to obtain the \(V_j\) factors by computing the SVD not of the original block columns \(\widehat{B}_{j}\), but of the block columns \(U^{^{\mathsf{H}}}B_j\) with the zeros excluded.

Multiplying the original matrix \(B_p\) from the left and right by the matrices \(U^{^{\mathsf{H}}}\) and \(V^{^{\mathsf{H}}}\), we obtain a sparse matrix \(S_B\in\mathbb{C}^{N_s\times N_0}\): \[\label{eq:subv} S_B = U^{^{\mathsf{H}}} B_p V^{^{\mathsf{H}}}.\tag{11}\] More specifically, \(S_B\) is a block matrix with the following structure: it has a dense block \(S_{ij} = U_{i}^{^{\mathsf{H}}}B_{ij}V^{^{\mathsf{H}}}_{j}\) if the corresponding block of \(B_p\) is of close type, and a zero block with a non-zero \(r\times r\) upper-left corner if the corresponding block of \(B_p\) is of far type. Note that in the actual implementation of our algorithm, we store the matrix \(S_B\) in a block-sparse format to avoid storing the zero elements. Multiplying equation (11) by \(U\) from the left and \(V\) from the right, we obtain: \[B_p = US_BV.\] See Figure 9 for an illustration of the structure of this parametrization:

Figure 9: Illustration of the \(\mathcal{H}^{2}\) structure of the matrix \(B_p\).

Note that this is the one-level version of the \(\mathcal{H}^{2}\) matrix. The multi-level version is more advanced, and its application is the subject of future research.
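The sketch below illustrates our reading of this one-level construction for a single matrix (hypothetical names; uniform contiguous index blocks and a simple centroid-distance criterion for the close/far classification are simplifying assumptions, not details taken from the paper):

```python
import numpy as np
from scipy.linalg import block_diag

def h2_one_level(B, src_xyz, rec_xyz, n_blocks, r, dist_thr):
    """One-level H2-like factorization B ~ U @ S @ V (equation (11)).
    src_xyz / rec_xyz hold the physical coordinates of rows / columns."""
    N_s, N_0 = B.shape
    rows = np.array_split(np.arange(N_s), n_blocks)
    cols = np.array_split(np.arange(N_0), n_blocks)

    # Classify block (i, j) as 'far' when the block centroids are further
    # apart than the predefined distance threshold.
    src_c = np.stack([src_xyz[ri].mean(axis=0) for ri in rows])
    rec_c = np.stack([rec_xyz[cj].mean(axis=0) for cj in cols])
    far = np.linalg.norm(src_c[:, None] - rec_c[None], axis=-1) > dist_thr

    # Left bases U_i: SVD of each block row with close blocks zeroed.
    U_blocks = []
    for i, ri in enumerate(rows):
        Bi_hat = B[ri].copy()
        for j, cj in enumerate(cols):
            if not far[i, j]:
                Bi_hat[:, cj] = 0.0
        Ui, _, _ = np.linalg.svd(Bi_hat)
        U_blocks.append(Ui)                      # N_i x N_i, unitary
    U = block_diag(*U_blocks)

    # Right bases V_j from the block columns of U^H B (close blocks zeroed).
    UB = U.conj().T @ B
    V_blocks = []
    for j, cj in enumerate(cols):
        Bj_hat = UB[:, cj].copy()
        for i, ri in enumerate(rows):
            if not far[i, j]:
                Bj_hat[ri] = 0.0
        _, _, Vjh = np.linalg.svd(Bj_hat)
        V_blocks.append(Vjh)                     # N_j x N_j, unitary
    V = block_diag(*V_blocks)

    # Sparse core S = U^H B V^H: dense close blocks; far blocks keep only
    # the r x r upper-left corner, the rest is dropped.
    S = U.conj().T @ B @ V.conj().T
    for i, ri in enumerate(rows):
        for j, cj in enumerate(cols):
            if far[i, j]:
                kept = S[np.ix_(ri, cj)][:r, :r]
                S[np.ix_(ri, cj)] = 0.0
                S[np.ix_(ri[:r], cj[:r])] = kept
    return U, S, V                               # B ~ U @ S @ V
```

In the actual implementation, the core would be kept in a block-sparse format rather than as the dense array used here for readability.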

3.3 \(\mathcal{H}^2\)-like parametrization of the MDD

In this section, to simplify our notation, we drop the index \(p\) from every matrix. From now on, we therefore assume that the equations hold true for all frequencies \(p=1,\dots,N_f\).

As in Section 3.1.2, we know a priori that \(X\) is symmetric (but not Hermitian). We also want the matrices \(B\) and \(X\) to share the right factor. Thus, we seek \(X\) in the following format: \[X = V^{\top}S_X V,\] where \(S_X\) has the same block-sparse format as \(S_B\). We obtain the system: \[WV^{\top}S_X V = US_B V.\] We then compress the operator by applying \(U^{\mathsf{H}}\) to \(W\) from the left and \(V^{\top}\) from the right: \[S_W = U^{\mathsf{H}}WV^{\top}.\] Thanks to the physical properties of the problem, this operator has the same structure as \(S_B\). Since \(U\) and \(V\) are unitary, \(W = US_W\overline{V}\), and the system becomes \[\label{eq:sys1} US_W\overline{V} V^{\top} S_X V = U S_B V.\tag{12}\] Let us then multiply the system (12) by \(U^{\mathsf{H}}\) from the left and \(V^{\mathsf{H}}\) from the right: \[U^{\mathsf{H}}US_W\overline{V} V^{\top} S_X V V^{\mathsf{H}}= U^{\mathsf{H}}U S_B VV^{\mathsf{H}}.\] Using the unitarity of \(U\) and \(V\), i.e., \(U^{\mathsf{H}}U = I\), \(VV^{\mathsf{H}}= I\), and \(\overline{V} V^{\top} = (VV^{\mathsf{H}})^{\top} = I^{\top}=I\), we arrive at \[\label{eq:sss} S_W S_X = S_B.\tag{13}\] We can now solve this system for \(S_X\) with the iterative method LSQR [34]. After computing \(S_X\), we reconstruct the solution using the formula \[X = V^{\top} S_X V,\] and finally, using the inverse Fourier transform, we obtain the solution in the time domain.

Remark 1. While solving the system (13), the LSQR method employs matrix-by-matrix multiplication. If this process does not preserve sparsity, the resulting dense matrix negates the benefits of compression. The solution is to set to zero all the elements of the resulting matrix that were zero in the initial matrices. This approach is illustrated in Figure 10, which visually explains how we preserve the sparse structure during multiplication.

Figure 10: The fill-in during the matrix multiplication.

Remark 2. In the numerical implementation, matrix-by-matrix multiplication is carried out in a specific way: the right matrix is converted into a vector that contains only its nonzero elements. The left matrix is implemented as a linear operator, which is then applied to the vectorized right matrix. After applying the linear operator, the resulting vector is converted back into a sparse matrix format. This approach efficiently handles the multiplication by using only the nonzero elements. Another significant benefit of this method is its compatibility with the standard LSQR algorithm, which naturally operates with vectors and linear operators. This compatibility makes integrating the LSQR method with this approach easier.
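A sketch of how Remarks 1 and 2 can be combined with SciPy is given below (hypothetical names): the admissible nonzeros of \(S_X\) are flattened into a vector, multiplication by \(S_W\) is wrapped as a LinearOperator, and fill-in is suppressed by masking after each product. We assume a SciPy version whose lsqr accepts complex operators; otherwise the system can be split into real and imaginary parts.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def solve_core_system(S_W, S_B, mask_x, mask_b):
    """Solve S_W @ S_X = S_B (equation 13) for S_X restricted to the
    sparsity pattern mask_x (True where entries may be nonzero); fill-in
    outside mask_b is dropped after every product, as in Remark 1."""
    ix = np.flatnonzero(mask_x)   # admissible entries of the unknown S_X
    ib = np.flatnonzero(mask_b)   # admissible entries of the right-hand side

    def matvec(x_nnz):
        S_X = np.zeros(mask_x.shape, dtype=complex)
        S_X.flat[ix] = x_nnz
        P = S_W @ S_X
        P[~mask_b] = 0.0          # suppress fill-in (Remark 1)
        return P.flat[ib]

    def rmatvec(y_nnz):
        Y = np.zeros(mask_b.shape, dtype=complex)
        Y.flat[ib] = y_nnz
        P = S_W.conj().T @ Y
        P[~mask_x] = 0.0
        return P.flat[ix]

    # The operator acts on vectors of nonzero elements only (Remark 2).
    A = LinearOperator((ib.size, ix.size), matvec=matvec,
                       rmatvec=rmatvec, dtype=complex)
    x_nnz = lsqr(A, S_B.flat[ib])[0]
    S_X = np.zeros(mask_x.shape, dtype=complex)
    S_X.flat[ix] = x_nnz
    return S_X
```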

This approach also empowers us to benefit from the symmetry of the solution \(X\), enabling, for example, the utilization of the reciprocity preconditioner. We refer to this technique as the \(\mathcal{H}^{2}\) method.

4 Numerical Examples

In this section, we present three numerical experiments to validate the proposed methods; these involve applying global low-rank (for the 2-dimensional case) and block low-rank (for the 3-dimensional case) compression techniques to all elements of the linear system we want to solve. The different experiments are chosen to mirror real-world seismic exploration scenarios. Among the datasets, two are two-dimensional, offering standard complexity scenarios, while the third is a significantly larger three-dimensional dataset, providing a more challenging and comprehensive test of the capabilities of our approaches in handling large-scale seismic data. All algorithms are implemented in Python; our focus is to characterize the time and memory savings of our methods in return for a controllable compromise in the quality of the results.

4.1 Ocean-bottom cable (2d)

In this subsection, we consider a synthetic ocean-bottom cable (OBC) dataset. In our specific case study, we use a two-dimensional segment of the SEG/EAGE Overthrust model [35], which includes an additional water layer on top. 401 receivers are arranged on the sea floor, spaced 10 m apart and located 292 m below the ocean’s surface. 401 sources are placed at a depth of 20 m with a lateral spacing of 10 m.

Figure 11 illustrates the inversion results achieved by MDD with the LR, USV, USV-rsp (USV with reciprocity preconditioner), and \(\mathcal{H}^{2}\) compression strategies described in this paper. These results are further compared to the reference response, directly modeled via finite differences without the water column and free surface, and to the "full" solution, where we solve MDD in the time or frequency domain with uncompressed matrices.

Figure 11: MDD inversion results for different solution methods: (a) modeled solution; (b) full, time domain; (c) LR method; (d) USV method; (e) USV-rsp method; (f) \(\mathcal{H}^{2}\) method.

The LR method outperforms the USV and USV-rsp methods, albeit using a significantly larger amount of data in the solution process. In this context, applying reciprocity preconditioning has minimal impact on the outcome. In the 2d scenario, the \(\mathcal{H}^{2}\) method appears to be the least effective: its complex data structure necessitates a large computational time, and the additional approximations result in a visible accuracy loss. However, in 3d, the large size of the matrices involved makes this format advantageous, offsetting the initial drawbacks with its benefits, as we will show later.

Table 1 compares the time and memory requirements of the MDD methods described above. We consider the uncompressed time- and frequency-domain MDD, LR, USV, and \(\mathcal{H}^{2}\) methods. For timing, we present total solution timings, including the compression to the LR, USV, or \(\mathcal{H}^{2}\) format. For memory, we sum the sizes of all matrices involved in the MDD process (operator, right-hand side, and unknown matrices) in the frequency domain.

Below, we present the formulas for the memory requirements. For this 2d ocean-bottom cable dataset, the sizes of the arrays are: number of sources \(N_s = 201\), number of receivers and virtual receivers \(N_0 = 201\), number of time samples \(N_t = 1150\), and number of frequency samples \(N_f = 200\); the rank is set to \(r=50\).

The full-matrix right-hand side data in the time domain has a size of \[\label{eq:mem_rhs_t} N_s\times N_0 \times N_t \times 4~\text{Bytes} = 185.8~\text{MB},\tag{14}\] whilst the size of the full-matrix right-hand side data in the frequency domain is \[\label{eq:mem_rhs_f} N_s\times N_0\times N_f \times 8~\text{Bytes} = 64.6~\text{MB},\tag{15}\] the size of the LR right-hand side data is \[\label{eq:mem_rhs_lr} N_s\times r \times N_f \times 8~\text{Bytes} = 16.1~\text{MB},\tag{16}\] the size of the USV right-hand side data is \[\label{eq:mem_rhs_usv} r \times r \times N_f \times 8~\text{Bytes} = 4.0~\text{MB},\tag{17}\] and the size of the \(\mathcal{H}^{2}\) right-hand side data is computed empirically as \(13.5~\text{MB}\).

The full-matrix unknowns in the time domain have a size of \[\label{eq:mem_x_t} N_0\times N_0\times N_t \times 4~\text{Bytes} = 185.8~\text{MB},\tag{18}\] whilst the size of the full-matrix unknowns in the frequency domain is \[\label{eq:mem_x_f} N_0\times N_0\times N_f \times 8~\text{Bytes} = 64.6~\text{MB},\tag{19}\] the size of the LR unknowns is \[\label{eq:mem_x_lr} N_0\times r\times N_f \times 8~\text{Bytes} = 16.1~\text{MB},\tag{20}\] the size of the USV unknowns is \[\label{eq:mem_x_usv} r\times r\times N_f \times 8~\text{Bytes} = 4.0~\text{MB},\tag{21}\] and the size of the \(\mathcal{H}^{2}\) unknowns is again computed empirically as \(13.5~\text{MB}\).

The operator is always stored in the frequency domain. For all methods except LR, the operator has the same size as the frequency-domain right-hand side of the same method; for LR, the operator has the same size as the full-matrix frequency-domain right-hand side.
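For reference, the tabulated right-hand-side sizes follow from a few lines of arithmetic (with MB taken as \(10^6\) Bytes, matching the values above):

```python
N_s, N_0, N_t, N_f, r = 201, 201, 1150, 200, 50
MB = 1e6  # MB taken as 10**6 Bytes

print(N_s * N_0 * N_t * 4 / MB)  # full RHS, time domain:      ~185.8 MB
print(N_s * N_0 * N_f * 8 / MB)  # full RHS, frequency domain:  ~64.6 MB
print(N_s * r * N_f * 8 / MB)    # LR RHS:                      ~16.1 MB
print(r * r * N_f * 8 / MB)      # USV RHS:                      ~4.0 MB
```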

Table 1: Time and memory comparison of the candidate methods.
Method | full, time dom. | full, freq. dom. | LR | USV | \(\mathcal{H}^{2}\)
Time (s) | 21.3 | 12.8 | 6.0 | 0.71 | 33.8
Mem (MB) | 436.3 | 193.9 | 96.8 | 12.0 | 40.4

In this dataset, all three system matrices are globally low rank at every frequency. This characteristic underscores the effectiveness of the USV method, which emerges as the optimal choice in terms of both memory and time efficiency.

4.2 Target level below a complex salt body (2d)

For our second experiment, we use a dataset created by redatuming surface reflection data at a target level below a complex salt body [17]. These wavefields are obtained by applying the scattering Rayleigh-Marchenko method as detailed in [36]. This method allows one to estimate wavefields traveling upwards and downwards as if they were emitted from surface sources and then captured by virtual receivers located under a complex layer of earth, like beneath a salt body; see Figure 12.

The experimental area extends over 16.26 km horizontally and reaches a depth of 8 km. It is designed to imitate a challenging environment with an uneven salt layer and other complex features like sharp changes in rock layers, uneven sediment layers, and inconsistencies between different rock layers. Redatuming in such a complex area is difficult because of the stark differences in rock properties, which can cause multiple internal reflections. These reflections can lead to misleading signals in the data, often seen as artifacts or distortions. Additionally, challenges such as a limited frequency range and insufficient source illumination further complicate the redatuming process.

Figure 12: Sub-salt velocity model supporting decomposed wavefields at the receiver array. A magenta line runs horizontally at a depth of 4.4 kilometers below the layer above the salt. The sources of the wavefields, shown as a solid red line, are positioned at the surface. A dashed black line also marks the specific target area where we aim to capture and analyze the reflected wave signals.

Figure 13 presents the reference solutions, and Figure 14 presents the results obtained by applying MDD inversion to the considered dataset with the various methods detailed in this paper. This particular dataset benefits from having a globally low-rank structure across all three matrices at all frequencies. However, despite this advantage, the problem is inherently unstable: the results are only reliable when the reciprocity preconditioner is applied, as clearly demonstrated in Figure 14 (compare the same methods with and without preconditioning).

Figure 13: MDD inversion reference solutions: (a) modeled solution; (b) full, time domain; (c) full-rcp, time domain.

Figure 14: MDD inversion results for different solution methods: (a) LR method; (b) USV method; (c) USV-rsp method; (d) \(\mathcal{H}^{2}\) method; (e) \(\mathcal{H}^{2}\)-rsp method.

Table 2 compares the time and memory requirements of the MDD methods described above. We consider the uncompressed time- and frequency-domain MDD, LR, USV, and \(\mathcal{H}^{2}\) methods. For timing, we present total solution timings, including the compression to the LR, USV, or \(\mathcal{H}^{2}\) format. For memory, we sum the sizes of all matrices involved in the MDD process (operator, right-hand side, and unknown matrices) in the frequency domain.

We compute the memory requirements using the formulas (14)–(21) presented in the previous subsection. The sizes of the \(\mathcal{H}^{2}\) data are computed empirically. In this 2d target-below-salt dataset, the number of sources is \(N_s = 201\), the number of receivers and virtual receivers is \(N_0 = 151\), the number of time samples is \(N_t = 2001\), the number of frequency samples is \(N_f = 400\), and the rank is set to \(r=50\).

Table 2: Time and memory comparison of the candidate methods.
Method | full, time dom. | full, freq. dom. | LR | USV | \(\mathcal{H}^{2}\)
Time (s) | 26.5 | 19.3 | 10.8 | 2.1 | 19.5
Mem (MB) | 583.0 | 267.2 | 153.4 | 24.0 | 80.7

For this dataset, reciprocity preconditioning is crucial to obtaining reliable results. Thus, the USV-rsp method outperforms the other methods not only in terms of memory and time but also in terms of solution quality. The LR method’s inability to use reciprocity becomes critical in this example. As in the previous example, the \(\mathcal{H}^{2}\) method is the least effective, for the reasons discussed above.

4.3 Ocean-bottom cable (3d)

In this subsection, we consider a synthetic ocean-bottom cable dataset. In this example, we use the full 3d SEG/EAGE Overthrust model [35], which includes an additional water layer on top.

A grid of \(177\times 90\) receivers is arranged on the sea floor, with receivers spaced 20 m apart and located 292 m below the ocean’s surface. Additionally, we create a \(120\times 217\) grid of sources (shot gathers) with a spacing of 20 m at a depth of 10 m.

Figure 15 presents the reference MDD solutions, and Figure 16 illustrates the inversion results achieved using frequency-domain MDD with the \(\mathcal{H}^{2}\) method for various rank parameters, compared to the reference results: the modeled solution and the frequency-domain full-matrix solution. This substantial 3d dataset does not exhibit a globally low rank across its three matrices at all frequencies. Consequently, the only feasible method for this dataset is the \(\mathcal{H}^{2}\) method. Our analysis of the previous two datasets shows that the \(\mathcal{H}^{2}\) method incurs significant overhead compared to the LR and USV methods. However, it remains preferable to using full matrices.

Figure 15: Reference solutions: (a) modeled local reflectivity; (b) solution \(\mathbf{R_{inv}}\), computed in the frequency domain with full matrices.

Figure 16: MDD results obtained with the \(\mathcal{H}^{2}\) method for various ranks: (a) \(\mathbf{R_{inv}}\), \(\mathcal{H}^{2}\), \(r=200\); (b) \(\mathbf{R_{inv}}\), \(\mathcal{H}^{2}\), \(r=100\); (c) \(\mathbf{R_{inv}}\), \(\mathcal{H}^{2}\), \(r=1\).

In Figure 17, we compare the solutions obtained with the full-matrix method and the \(\mathcal{H}^{2}\) method in the frequency domain, for frequency slice number \(p=90\).

Figure 17: MDD solution \(X_{p}\) in the frequency domain for frequency slice number \(p=90\): (a) full, frequency domain; (b) \(\mathcal{H}^{2}\), \(r=200\); (c) \(\mathcal{H}^{2}\), \(r=100\); (d) full, frequency domain, fragment; (e) \(\mathcal{H}^{2}\), \(r=200\), fragment; (f) \(\mathcal{H}^{2}\), \(r=100\), fragment.

This comparison reveals that although the \(\mathcal{H}^{2}\) method may represent a compromise in terms of outright solution fidelity, it notably succeeds in preserving all essential features of the solution.

Table 3 compares the time and memory requirements of the MDD methods described above. We consider the uncompressed time- and frequency-domain MDD and the \(\mathcal{H}^{2}\) method for various ranks. Given the computational limitations of fully resolving the time-domain MDD on even the most advanced supercomputers, we are confined to presenting only theoretically derived memory requirements. Conversely, in the frequency domain, leveraging a state-of-the-art supercomputer equipped with parallel GPU cores has enabled us to compute and visually present the results. However, it is important to note that comparisons of computation time may not be equitable, as the \(\mathcal{H}^{2}\) method calculations were performed on a CPU-based supercomputer.

For timing, we present total solution timings, including the compression to the \(\mathcal{H}^{2}\) format. For memory, we sum the sizes of all matrices involved in the MDD process (operator, right-hand side, and unknown matrices) in the frequency domain. We compute the memory requirements using the formulas (14)–(21); the sizes of the \(\mathcal{H}^{2}\) data are computed empirically. In this 3d ocean-bottom cable dataset, the number of sources is \(N_s = 26040\), the number of receivers and virtual receivers is \(N_0 = 15930\), the number of time samples is \(N_t = 1126\), and the number of frequency samples is \(N_f = 200\).

Table 3: Time and memory comparison of the presented methods.
Method | full, time dom. | full, freq. dom. | \(\mathcal{H}^{2}\), \(r=200\) | \(\mathcal{H}^{2}\), \(r=100\) | \(\mathcal{H}^{2}\), \(r=1\)
Time (s) | - | - | 4189 | 1530 | 350
Mem (TB) | 3.85 | 2.2 | 0.46 | 0.11 | 0.017

The trade-off becomes apparent: reducing the rank compromises solution quality (as illustrated in Figures 15 and 16), but it significantly saves time and memory resources.

5 Conclusion

To advance the solution of linear systems that involve massive right-hand sides and unknowns, this paper addresses the challenge of solving systems with a data-heavy right-hand side and solution, using the example of Multidimensional Deconvolution (MDD) inversion. We introduce and compare several approaches that compress all matrices within the linear system, including algorithms based on global low-rank (USV, LR) and block low-rank (\(\mathcal{H}^2\)) parametrizations, significantly improving solution efficiency.

6 Acknowledgements

The authors are thankful to the Supercomputing Laboratory and the Extreme Computing Research Center at King Abdullah University of Science and Technology for their support and computing resources.

References

[1]
J. T. Fokkema and P. M. van den Berg. Seismic applications of acoustic reciprocity. Elsevier, 2013.
[2]
K. Wapenaar and J. Fokkema. Green’s function representations for seismic interferometry. Geophysics, 71(4):SI33–SI46, 2006.
[3]
X. Li, T. Becker, M. Ravasi, J. Robertsson, and D.-J. van Manen. Closed-aperture unbounded acoustics experimentation using multidimensional deconvolution. The Journal of the Acoustical Society of America, 149(3):1813–1828, 2021.
[4]
E. Ruigrok, X. Campman, D. Draganov, and K. Wapenaar. High-resolution lithospheric imaging with seismic interferometry. Geophysical Journal International, 183(1):339–357, 2010.
[5]
D. Draganov, X. Campman, J. Thorbecke, A. Verdel, and K. Wapenaar. Reflection images from ambient seismic noise. Geophysics, 74(5):A63–A67, 2009.
[6]
G. T. Schuster and M. Zhou. A theoretical overview of model-based and correlation-based redatuming methods. Geophysics, 71(4):SI103–SI110, 2006.
[7]
A. Bakulin and R. Calvert. The virtual source method: Theory and case study. Geophysics, 71(4):SI139–SI150, 2006.
[8]
G. Schuster, J. Yu, J. Sheng, and J. Rickett. Interferometric/daylight seismic imaging. Geophysical Journal International, 157(2):838–852, 2004.
[9]
K. Wapenaar, J. Van Der Neut, E. Ruigrok, D. Draganov, J. Hunziker, E. Slob, J. Thorbecke, and R. Snieder. Seismic interferometry by crosscorrelation and by multidimensional deconvolution: A systematic comparison. Geophysical Journal International, 185(3):1335–1364, 2011.
[10]
J. van der Neut, J. Thorbecke, K. Mehta, E. Slob, and K. Wapenaar. Controlled-source interferometric redatuming by crosscorrelation and multidimensional deconvolution in elastic media. Geophysics, 76(4):SA63–SA76, 2011.
[11]
K. Wapenaar, E. Slob, and R. Snieder. Seismic and electromagnetic controlled-source interferometry in dissipative media. Geophysical prospecting, 56(3):419–434, 2008.
[12]
S. Minato, T. Matsuoka, and T. Tsuji. Singular-value decomposition analysis for seismic interferometry by multidimensional deconvolution. In SEG Technical Program Expanded Abstracts 2011, pages 2694–2699. Society of Exploration Geophysicists, 2011.
[13]
D. Boiero and C. Bagaini. Up and down deconvolution in complex geological scenarios. Geophysics, 86(5):WC55–WC65, 2021.
[14]
J. van der Neut and F. J. Herrmann. Interferometric redatuming by sparse inversion. Geophysical Journal International, 192(2):666–670, 2013.
[15]
J. van der Neut, M. Ravasi, Y. Liu, and I. Vasconcelos. Target-enclosed seismic imaging. Geophysics, 82(6):Q53–Q66, 2017.
[16]
N. Luiken and T. Van Leeuwen. Seismic wavefield redatuming with regularized multi-dimensional deconvolution. Inverse Problems, 36(9):095010, 2020.
[17]
D. Vargas, I. Vasconcelos, M. Ravasi, and N. Luiken. Time-domain multidimensional deconvolution: A physically reliable and stable preconditioned implementation. Remote Sensing, 13(18):3683, 2021.
[18]
R. Kumar, D. Boiero, C. Bagaini, and M. Vassallo. Transform-domain multidimensional deconvolution–sparsity v/s low-rank. In 83rd EAGE Annual Conference & Exhibition, volume 2022, pages 1–5. European Association of Geoscientists & Engineers, 2022.
[19]
F. Chen, M. Ravasi, and D. Keyes. A physics-aware, low-rank regularization for multidimensional deconvolution. arXiv preprint arXiv:2312.11004, 2023.
[20]
F. Chen, M. Ravasi, and D. Keyes. Solving multi-dimensional deconvolution via a nuclear-norm regularized least-squares approach. In 84th EAGE Annual Conference & Exhibition, volume 2023, pages 1–5. European Association of Geoscientists & Engineers, 2023.
[21]
M. Ravasi, T. Selvan, and N. Luiken. Stochastic multi-dimensional deconvolution. IEEE Transactions on Geoscience and Remote Sensing, 60:1–14, 2022.
[22]
Y. Hong, H. Ltaief, M. Ravasi, L. Gatineau, and D. E. Keyes. Accelerating seismic redatuming using tile low-rank approximations on NEC SX-Aurora TSUBASA. 2021.
[23]
H. Ltaief, Y. Hong, L. Wilson, M. Jacquelin, M. Ravasi, and D. E. Keyes. Scaling the “memory wall” for multi-dimensional seismic processing with algebraic compression on Cerebras CS-2 systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, 2023.
[24]
Y. Hong, H. Ltaief, M. Ravasi, and D. E. Keyes. Seismic redatuming by inversion with algebraic compression and multiple precisions. 2022.
[25]
M. Ravasi, Y. Hong, H. Ltaief, D. Keyes, and D. Vargas. Large-scale Marchenko imaging with distance-aware matrix reordering, tile low-rank compression, and mixed-precision computations. In Second International Meeting for Applied Geoscience & Energy, pages 2606–2610. Society of Exploration Geophysicists and American Association of Petroleum Geologists, 2022.
[26]
M. Ravasi, Y. Hong, H. Ltaief, and D. Keyes. Tile-low rank compressed multi-dimensional convolution and its application to seismic redatuming problems. In 83rd EAGE Annual Conference & Exhibition, volume 2022, pages 1–5. European Association of Geoscientists & Engineers, 2022.
[27]
W. Hackbusch, B. Khoromskij, and S. Sauter. On \(\mathcal{H}^2\)-matrices. In H.-J. Bungartz, et al. (eds.), Lectures on Applied Mathematics, pages 9–30. Springer-Verlag, Berlin Heidelberg, 2000.
[28]
L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325–348, Dec. 1987.
[29]
K. Wapenaar, J. Van Der Neut, E. Ruigrok, D. Draganov, J. Hunziker, E. Slob, J. Thorbecke, and R. Snieder. Seismic interferometry by crosscorrelation and by multidimensional deconvolution: A systematic comparison. Geophysical Journal International, 185(3):1335–1364, 2011.
[30]
R. Tauler and D. Barceló. Multivariate curve resolution applied to liquid chromatography—diode array detection. TrAC Trends in Analytical Chemistry, 12(8):319–327, 1993.
[31]
R. Tauler, A. Smilde, and B. Kowalski. Selectivity, local rank, three-way data analysis and ambiguity in multivariate curve resolution. Journal of Chemometrics, 9(1):31–58, 1995.
[32]
S. Börm. Efficient numerical methods for non-local operators: H2-matrix compression, algorithms and analysis, volume 14. European Mathematical Society, 2010.
[33]
D. A. Sushnikova and I. V. Oseledets. Simple non-extensive sparsification of the hierarchical matrices. Advances in Computational Mathematics, 46(4):52, 2020.
[34]
C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software (TOMS), 8(1):43–71, 1982.
[35]
F. Aminzadeh, J. Brac, and T. Kunz. 3d salt and overthrust models. In SEG/EAGE Modeling Series, Tulsa, Oklahoma, 1997. Society of Exploration Geophysicists.
[36]
D. Vargas, I. Vasconcelos, Y. Sripanich, and M. Ravasi. Scattering-based focusing for imaging in highly complex media from band-limited, multicomponent data. Geophysics, 86(5):WC141–WC157, 2021.