September 07, 2025
Interfacing quantum and classical processors is an important subroutine in full-stack quantum algorithms. The so-called “classical shadow” method efficiently extracts essential classical information from quantum states, enabling the prediction of many properties of a quantum system from only a few measurements. However, for a small number of highly non-local observables, or when classical post-processing power is limited, the classical shadow method is not always the most efficient choice. Here, we address this issue quantitatively by performing a full-stack resource analysis that compares classical shadows with “quantum footage,” which refers to direct quantum measurement. Under certain assumptions, our analysis illustrates a boundary of download efficiency between classical shadows and quantum footage. For observables expressed as linear combinations of Pauli matrices, the classical shadow method outperforms direct measurement when the number of observables is large and the Pauli weight is small. For observables in the form of large Hermitian sparse matrices, the classical shadow method shows an advantage when the number of observables, the sparsity of the matrix, and the number of qubits fall within a certain range. The key parameters influencing this behavior include the number of qubits \(n\), observables \(M\), sparsity \(k\), Pauli weight \(w\), accuracy requirement \(\epsilon\), and failure tolerance \(\delta\). We also compare the resource consumption of the two methods on different types of quantum computers and identify break-even points where the classical shadow method becomes more efficient, which vary depending on the hardware. This paper opens a new avenue for quantitatively designing optimal strategies for hybrid quantum-classical tomography and provides practical insights for selecting the most suitable quantum measurement approach in real-world applications.
Quantum computing is one of the most important areas driving the development of next-generation computing technologies [1]. Quantum computing leverages a much larger set of quantum states compared to classical computing [2], offering the potential to significantly enhance computational speed and to solve certain problems that classical computing cannot address efficiently [3]–[6]. This is specifically manifested in search [7], factorization [8], and simulation [9]. However, in many quantum algorithms, such as the Harrow–Hassidim–Lloyd (HHL) algorithm [10], downloading the information stored in quantum states by measuring them is an essential part of the computation. Downloading has therefore become a major challenge in quantum computation: if a great deal of time is spent on it, the quantum advantage is reduced. Accordingly, efficient tomography of quantum states has emerged as a significant research area. The recently proposed classical shadow method [11] harnesses both classical and quantum capabilities, allows the prediction of various properties of a quantum system from only a logarithmic number of measurements, and is now considered one of the mainstream subroutines of quantum computing. Moreover, the classical shadow method has found widespread use in both experimental and theoretical computer science research [12], [13].
Despite its theoretical advantages, such as logarithmic scaling of the number of required copies of the quantum state, the practical benefits of using classical shadows compared to quantum footage (direct quantum measurement) strategies in specific experimental setups remain unclear. The efficiency and applicability of the classical shadow approach depend significantly on the type of observables being predicted. Even though the classical shadow method is highly efficient at estimating local observables (e.g., single-qubit expectation values), its practical benefit is limited when dealing with global or complex observables (e.g., non-local correlation functions or complicated Hamiltonians) and when the number of observables is small. Additionally, shadow-based estimation sometimes requires additional post-processing time, making it challenging to assess its general effectiveness compared to quantum footage (direct quantum measurement) strategies, especially for problems that involve higher-order correlations.
In this work, we address these challenges by evaluating the resource requirements of the classical shadow method using a quantitative approach. To determine the parameter regimes in which the classical shadow method provides practical benefits, we conduct an extensive cost-based analysis. Since the types of observables vary significantly depending on the scenario, we define two representative types of observables: one is a linear combination of Pauli matrices with Pauli weight \(w\) and precision \(\varepsilon\), and the other is a Hermitian matrix of dimension \(2^n \times 2^n\) (where \(n\) is the number of qubits) with precision \(\varepsilon\) and sparsity \(k\), defined as the maximum number of non-zero elements per row. As our main result, we derive analytical expressions to estimate the resource consumption of both approaches.
The classical shadow method is inherently a hybrid algorithm that involves both quantum and classical computation. The quantum part includes random unitary rotations of the quantum state and quantum measurements, followed by transmitting the results to a classical computer. The classical part then applies an inverse transformation and performs averaging using the median-of-means algorithm to ensure statistical robustness [11]. We estimate the resource consumption of the quantum and classical components separately. For the classical part, the cost is measured in terms of floating-point operations, a widely accepted metric in high-performance computing. Given that the classical shadow method is particularly advantageous when applied to large-scale quantum systems, ordinary desktop or laptop computers may be insufficient for the classical post-processing. Therefore, we utilize data from supercomputing clusters, referring to performance data from the TOP500 list [14].
The quantum component primarily involves quantum gates and measurements, whose basic operation times have been estimated in [15]. Our optimization goal is to minimize the total resource cost under a constraint on the error rate, which we set to approximately \(0.01\), a value we treat as comparable to the target precision \(\varepsilon\). Finally, we compare the classical shadow method with quantum footage (direct quantum measurement) across both the input level and the full computational stack. Quantum footage can be seen as a fully quantum approach and similarly employs probabilistic bounds, such as Hoeffding’s inequality [16] and Chebyshev’s inequality [17], to estimate statistical error. We conduct a comprehensive resource analysis of quantum footage as well, including accurate estimations of the number of measurements required.
We organize our work as follows: In Section 2, we compare the resource usage of the classical shadow technique and the quantum footage method under various parameter settings. Section 3 explains how these results are obtained, including numerical simulations and theoretical analysis. Section 4 presents conclusions, suggests directions for future work, and includes additional technical details.
This section presents the main theoretical and numerical findings of this study. We compare two observable input schemes based on the classical shadow method and their corresponding quantum footage strategies. For clarity, we introduce abbreviations for the two input schemes: the observable input scheme using linear combinations of Pauli matrices is referred to as LCP, while the scheme using large Hermitian matrices is referred to as LHM.
Statement 1. Given \(M\) observables \(\{O_i\}\) (where \(1 \leq i \leq M\)) in the form of a linear combination of tensor products of Pauli matrices, where each observable consists of \(L\) terms in the summation, each term has a Pauli weight of \(w\), and the coefficients follow a standard normal distribution.
Then, consider a quantum system with a ground state density matrix \(\rho\) (unknown) and \(n\) qubits. The classical shadow method predicts the expectations \(\hat{o}_i\) of the \(M\) observables \(\{O_i\}\) (where \(1 \leq i \leq M\)), such that \[|\hat{o}_i
- \text{tr}(O_i \rho)| \leq \varepsilon \text{ for all } 1 \leq i \leq M\] holds with a probability of at least \(1 - \delta\).
(a) The quantum resources required by the classical shadow method are: \[\begin{align}
G &\lesssim n \cdot T \\
T &\lesssim \frac{17L \cdot 3^{w}}{\varepsilon^{2}} \cdot \log\left(\frac{2M}{\delta}\right)
\end{align}\]
Here, \(G\) represents the number of quantum gates needed, and \(T\) represents the number of measurements required.
(b) The classical resource required by the classical shadow method, i.e., the number of floating-point operations, is \(C\) FLOPs, where \[\begin{align}
C &\lesssim M \cdot L \cdot \left( T \cdot \left( \frac{1}{3} \right)^w \cdot (w + 1) + 2 \cdot \log\left( \frac{2M}{\delta} \right) + 2 \right) \end{align}\] Here, \(T\) satisfies the same bound as in part (a).
This statement can be regarded as a rigorous and reasonably tight estimate of resource usage, supported by both mathematical proofs and numerical experiments. The detailed proofs and numerical experiments are provided in the appendix. Compared with the original formulation of the classical shadow technique, which states only that the measurement count scales as \(O(\log(M))\), Statement 1 makes it more convenient to estimate resource usage explicitly.
The same applies to the estimation of expectation values of observables in the form of linear combinations of Pauli matrices using the quantum footage method. In this case, a similar statement holds, but the resource estimation focuses solely on the number of quantum measurement operations. The details regarding the proof of Statement 2 and the corresponding numerical experiments are included in the appendix.
Statement 2. For the estimation of the expectation values of observables, \(T'\) represents the number of measurements required by the quantum footage method, where, \[\begin{align}
T' &\lesssim \frac{0.5ML^3 }{\epsilon^2} \log\left( \frac{2ML}{\delta} \right)
\end{align}\] The meanings of these parameters are the same as those defined in Statement 1.
For the input method of observables involving Large Hermitian Matrices (LHM), we perform resource estimation and provide statements analogous to those presented for the LCP case. Analyses identical to those for the LCP case are not repeated here. We again present two statements, using the same notation for consistency.
Statement 3. Given \(M\) observables \(\{O_i\}\) (where \(1 \leq i \leq M\)) in the form of \(2^n \times 2^n\) Hermitian matrices, where \(n\) is the number of qubits in the given quantum system and \(k\) is the sparsity of the matrices (each row has \(k\) non-zero elements, sampled from the standard normal distribution), the classical shadow method predicts the expectations \(\hat{o}_i\) of the \(M\) observables \(\{O_i\}\) (where \(1 \leq i \leq M\)), such that \[|\hat{o}_i - \text{tr}(O_i \rho)| \leq \varepsilon \text{ for all
} 1 \leq i \leq M\] with probability at least \(1 - \delta\).
(a) The quantum resources required by the classical shadow method are (when \(k \ll 2^n\)): \[\begin{align}
G& \lesssim n \cdot T \label{eq:g95relation}
\end{align}\tag{1}\] \[\begin{align}
T& \lesssim \frac{34 \cdot \left[ k \sqrt{\frac{2}{\pi}} + \sqrt{k \left(1 - \frac{2}{\pi}\right) \cdot 2n \log 2} \right]^2 \cdot \left[ 2 \log(2M/\delta) \right]}{\varepsilon^2} \label{eq:t95relation}
\end{align}\tag{2}\] Here, \(G\) represents the number of quantum gates needed, and \(T\) represents the number of measurements required.
(b) The classical resource required by the classical shadow method, i.e., the number of floating-point operations, is \(C\) FLOPs, where \[\begin{align}
C &\lesssim T \left( 8(4^n - 1) + 32n + 7M \cdot 2^n k \right)
\end{align}\] Here, \(T\) again satisfies Equation (2).
For the quantum footage method we again have a corresponding statement, in which the resource consumption is estimated solely from the number of quantum measurements. The details of the proof of Statement 4 and the numerical experiments are placed in the appendix.
Statement 4. For the estimation of the expectation values of observables, \(T'\) represents the number of measurements required by the quantum footage method, where, \[\begin{align}
T'&\lesssim M \cdot \frac{16k \log\left( \frac{2M}{\delta} \right)}{\epsilon^2}
\end{align}\] The meanings of these parameters are the same as those defined in Statement 3.
Lastly, we combine the bounds above with hardware parameters to highlight the relationship between concrete runtime overheads and the various input values. The dominant computation time per quantum gate in a superconducting quantum computer is approximately 10 nanoseconds, while a single quantum measurement takes on the order of 10 microseconds [15]. In comparison, the floating-point operation rate of a supercomputer is on the order of \(10^{15}\) FLOP/s [14]. We provide a comprehensive explanation of the estimation techniques in Section 3. Figure 2 presents heat maps illustrating the discrepancies between the classical shadow and quantum footage approaches for two types of observable inputs: LCP (\(M, L, n, w\)) and LHM (\(M, k, n\)).
In these heat maps, for both the LCP and LHM cases, the blue regions represent areas where the ratio of the overhead incurred by the classical shadow method to that of the quantum footage method is less than or equal to one, i.e., \(t_{\text{shadow}} / t_{\text{footage}} \leq 1\). This indicates that the classical shadow method incurs less overhead compared to the quantum footage approach. Consequently, these regions may suggest the potential advantages of
employing a hybrid classical–quantum strategy. Moreover, under certain assumptions, Figure 2 also presents curves showing how the runtime of the classical shadow and quantum footage methods changes as various variables increase for the two observable
forms. Similarly, Figure 3 presents comparative curves of the runtime of the two prediction methods for the two observable forms, under certain assumptions, across multiple types of quantum computers. It should be emphasized that in our framework all
parameters are configurable, and the assumptions can be adjusted with respect to future quantum computing architectures.
As a summary, we have the following primary conclusions:
Figure 2: Comparison of the Classical Shadow Method and the Quantum Footage Method. In the plots, \(M\) is the number of observables, \(n\) is the number of qubits, \(w\) is the Pauli weight, \(L\) is the number of terms per observable, \(k\) is the sparsity, and \(\epsilon\) is the precision tolerance of the prediction. (a) & (b): Heat maps comparing the classical shadow method and the quantum footage method for LCP and LHM observables under different problem parameters. The blue region indicates that the classical shadow method is faster than the quantum footage method, while the red region indicates the opposite. (c) & (d): Runtime comparison between the classical shadow method and the quantum footage method for LCP and LHM observables. LCP: the number of terms per observable and the number of qubits are taken as \(\log (M)\), and the Pauli weight as \(\log \log(M)\). LHM: the number of qubits and the sparsity are taken as \(\log (M)\). a — Runtime Ratio (LCP): \(\frac{\text{Classical Shadow Overhead}}{\text{Quantum Footage Overhead}}\); b — Runtime Ratio (LHM): \(\frac{\text{Classical Shadow Overhead}}{\text{Quantum Footage Overhead}}\); c — Runtime Comparison (LCP); d — Runtime Comparison (LHM).
Table 1: Approximate measurement and gate times for four types of quantum computers.
Quantum Computer Type | Measurement Time | Typical Range | Gate Time | Typical Range |
---|---|---|---|---|
Superconducting (e.g., IBM) | \(10^{-5}\;\text{s}\) | 0.5–14 \(\mu\text{s}\) | \(10^{-8}\;\text{s}\) | 50–600 \(\text{ns}\) |
Ion Trap (e.g., IonQ) | \(10^{-4}\;\text{s}\) | 100–500 \(\mu\text{s}\) | \(10^{-5}\;\text{s}\) | 10–100 \(\mu\text{s}\) |
Photonic (e.g., Xanadu) | \(10^{-9}\;\text{s}\) | 1–10 \(\text{ns}\) | \(10^{-9}\;\text{s}\) | 0.1–10 \(\text{ns}\) |
Neutral Atom (e.g., Pasqal) | \(10^{-5}\;\text{s}\) | 10–50 \(\mu\text{s}\) | \(10^{-6}\;\text{s}\) | 0.5–5 \(\mu\text{s}\) |
Figure 3: Comparison of the Classical Shadow Method and the Quantum Footage Method on Four Types of Quantum Computers. In the plots, \(M\) is the number of observables, \(n\) is the number of qubits, \(w\) is the Pauli weight, \(L\) is the number of terms per observable, \(k\) is the sparsity, and \(\epsilon\) is the precision tolerance of the prediction. (a) & (b): Runtime comparison between the classical shadow method and the quantum footage method on four different types of quantum computers for LCP and LHM observables. LCP: the number of terms per observable and the number of qubits are taken as \(\log (M)\), and the Pauli weight as \(\log \log(M)\). LHM: the number of qubits and the sparsity are taken as \(\log (M)\). a — Runtime Comparison (LCP); b — Runtime Comparison (LHM).
According to the intersection point in Figure 2(c), when the observable takes the form of a linear combination of Pauli matrices, the classical shadow method begins to outperform the quantum footage method at approximately \(\log(M) = 4\), corresponding to \(M \approx 16\), i.e., on the order of \(O(10^1)\). Because quantum measurements act on all qubits in parallel, the efficiency of the quantum footage method is largely unaffected by either the number of qubits \(n\) or the Pauli weight \(w\). Although the number of measurements required by the classical shadow method scales only logarithmically in \(M\), compared to that of the quantum footage method, for small values of \(M\) the increased resource consumption caused by \(n\) and \(w\) outweighs the benefit gained from the reduced dependence on \(M\). This limits the advantage of the classical shadow method in that regime.
According to Figure 2(d), which considers more general observables including all Hermitian matrices, we observe that when \(M\) is very small, as expected, the quantum footage method outperforms the classical shadow
method. Specifically, in the figure, when \(\log(M) \leq 6.5\), corresponding to \(M\) on the order of \(O(10^2)\), the direct quantum method exhibits a
clear advantage. While one might anticipate that the classical shadow method would become dominant as \(M\) increases, an interesting reversal occurs around \(\log(M) = 26\), corresponding
to \(M\) on the order of \(O(10^8)\), where the quantum measurement method again becomes more favorable. This is because, for observables represented as large matrices, the number of
floating-point operations required by the classical approach grows exponentially with the number of qubits \(n\).
It is also worth noting that, for observables in the form of linear combinations of Pauli matrices (LCP), the number of qubits \(n\) has only a minor impact on the resource consumption of the classical shadow method,
indicating a weak dependence. In contrast, for observables represented as large Hermitian matrices, the resource consumption exhibits a stronger dependence on \(n\).
Figure 2 assumes a superconducting quantum computer as the default quantum device, with a quantum measurement time of approximately \(10^{-5}\, \text{s}\) and a quantum gate time of around \(10^{-8}\, \text{s}\). Additional data on quantum measurement and gate times for other types of quantum computers can be found in [15], including ion trap, photonic, and neutral atom quantum computers. The approximate measurement and gate times for these platforms, along with representative quantum computers for each type, are summarized in Table 1. We
conducted analyses similar to those in Figure 2(c) and (d) for the different types of quantum computers, plotting eight lines for each type of observable. These results are presented in Figure 3(a) and (b).
According to Figure 3(a), we observe that for LCP-type observables, the resource-saving advantage of the classical shadow method begins to emerge when \(M \approx 2^4 \sim 2^5\) (i.e., \(M =
16 \sim 32\)) for superconducting, ion trap, and neutral atom quantum computers. In contrast, for photonic quantum computers, this advantage becomes apparent only at \(M \approx 2^{6.5} \approx 100\), i.e., when
\(M\) reaches \(\mathcal{O}(10^2)\). Overall, photonic quantum computers demonstrate significantly faster performance compared to other types of quantum computing platforms.
According to Figure 3(b), we observe that for LHM-type observables, the resource-saving advantage of the classical shadow method begins to manifest on superconducting, ion trap, and neutral atom quantum computers when \(M \approx 2^{6.5} \sim 2^{7.5}\), i.e., \(M \approx \mathcal{O}(10^2)\). As \(M\) increases further to approximately \(2^{26} \sim 2^{30}\) (i.e., \(M \approx \mathcal{O}(10^8 \sim 10^{10})\)), this advantage gradually diminishes, and the benefit of the quantum footage method begins to emerge. In contrast, for photonic quantum computers, the performance curves of the classical shadow method and the quantum measurement method do not intersect, indicating that the implementation of the quantum footage method on photonic platforms can be highly efficient.
In this section, we present a method to estimate and compare the resources required for various types of observables, using both classical shadow techniques and quantum footage approaches. It is important to note that the primary objective of our research is to develop a cross-job resource estimation framework. Accordingly, in our numerical experiments, we employ ground state density matrices of randomly generated quantum states. This allows us to provide concrete numerical examples that validate the reliability and accuracy of our resource estimation method. In the following, we offer a concise overview of our approach to estimating the resource cost associated with the two types of observables.
In this scenario, resources are divided into classical and quantum components. We first randomly generate \(M\) observables for an \(n\)-qubit quantum system, where each observable is a linear combination of tensor products of Pauli matrices, consisting of \(L\) terms, each with a Pauli weight of \(w\). For example, for the observable \(O = \sum_{i=0}^{n-1} \left( X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1} \right)\), we have \(L = 3n\) and \(w = 2\).
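As a concrete illustration of this input scheme, the following Python sketch generates such random LCP observables. The (coefficient, Pauli-string) representation, the function name random_lcp_observables, and the choice of an (untruncated) standard normal for the coefficients are assumptions of this example, not the exact generator used in our experiments.

```python
import numpy as np

def random_lcp_observables(M, L, w, n, rng=None):
    """Generate M random observables, each a sum of L Pauli strings of weight w
    on n qubits.  Each term is stored as (coefficient, {qubit: 'X'/'Y'/'Z'})."""
    rng = np.random.default_rng() if rng is None else rng
    observables = []
    for _ in range(M):
        terms = []
        for _ in range(L):
            qubits = rng.choice(n, size=w, replace=False)   # support of the term
            paulis = rng.choice(['X', 'Y', 'Z'], size=w)     # Pauli on each support qubit
            coeff = rng.standard_normal()                    # term coefficient (assumed N(0, 1))
            terms.append((coeff, dict(zip(qubits.tolist(), paulis.tolist()))))
        observables.append(terms)
    return observables

# Example: a Heisenberg-type observable as above has L = 3n terms of weight w = 2.
obs = random_lcp_observables(M=10, L=30, w=2, n=10)
```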
The quantum resource estimation process primarily involves rotating the initial quantum state using quantum gate operations and measuring the rotated state in the computational basis. During this rotation process, it is essential to count the single-qubit gates applied, whose number is directly related to the number of measurements required to achieve the desired precision. In our experiments, we assume a precision of \(\varepsilon = 0.01\) and a failure
probability of \(\delta = 0.01\). The number of measurements required to estimate a single observable with Pauli weight \(w\) is given in [11]. However, we address a more general case where each observable must be decomposed into simpler Pauli observables of weight \(w\). For each of these individual observables, the required number of measurements can be determined by following the procedure outlined in [11]. By summing the measurement counts for all such decomposed observables, we obtain an estimate of the total number of measurements \(T\) required for the
general observable. Consequently, the total number of single-qubit gates is clearly \(n \times T\), since each qubit must undergo a single-qubit gate operation during each measurement. According to [15], for superconducting quantum computers, the operation time for a single-qubit gate is approximately \(10^{-8}\) seconds, while a
single quantum measurement takes around \(10^{-5}\) seconds. Although Table 1 summarizes the corresponding data for various types of quantum computers, this study first takes superconducting quantum computers as an example
to compare the two methods.
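Putting Statement 1 and these hardware figures together, a minimal sketch of the quantum-side cost estimate might look as follows; the function name and the default gate and measurement times (10 ns and 10 µs, the superconducting values quoted above) are illustrative assumptions.

```python
import math

def lcp_shadow_quantum_time(M, L, w, n, eps=0.01, delta=0.01,
                            t_gate=1e-8, t_meas=1e-5):
    """Quantum-side cost of the classical shadow method for LCP observables
    (Statement 1): T measurements and roughly n*T single-qubit gates."""
    T = math.ceil(17 * L * 3**w / eps**2 * math.log(2 * M / delta))
    G = n * T
    return T, G, G * t_gate + T * t_meas   # total quantum time in seconds

T, G, t_quantum = lcp_shadow_quantum_time(M=100, L=20, w=2, n=20)
```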
In classical computing, resource usage is typically quantified in terms of the number of floating-point operations (FLOPs). Classical computations involving matrix multiplications are central to many problems in the shadow method. However, when the observables are expressed in Pauli form, these calculations can be significantly simplified. Specifically, it suffices to verify whether the measurement basis matches the basis of the observable. If they match, the expectation value of the observable can be predicted by averaging the product of the corresponding eigenvalues, as detailed in the Appendix. This approach eliminates the need for explicit matrix multiplications, thereby substantially reducing the classical computational resources required. The rationale for distinguishing between different types of observables in this analysis is to optimize resource consumption based on the observable’s structure. Accordingly, we estimate the number of FLOPs required for classical computations by analyzing the code used in this matching-based prediction method. Since our goal is to estimate resources rather than constrain the device to perform specific predictive operations, we can assume the maximum computational capability of the device. For this estimation, we assume the use of a supercomputer whose model and floating-point operation performance are listed on the TOP500 website [14]. These systems typically operate at speeds on the order of \(O(10^{14})\) to \(O(10^{16})\) FLOPs per second. For our analysis, we assume a computational speed of approximately \(10^{15}\) FLOPs per second.
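Combining the classical FLOP count of Statement 1(b), the footage bound of Statement 2, and the assumed \(10^{15}\) FLOP/s machine, a rough break-even check in the spirit of Figure 2(a) could be sketched as follows. All default hardware numbers and the function name are assumptions of this sketch.

```python
import math

def lcp_runtime_ratio(M, L, w, n, eps=0.01, delta=0.01,
                      t_gate=1e-8, t_meas=1e-5, flops_per_sec=1e15):
    """Rough t_shadow / t_footage ratio for LCP observables, combining
    Statements 1 and 2 with the hardware figures assumed above."""
    T = 17 * L * 3**w / eps**2 * math.log(2 * M / delta)            # shadow measurements
    flops = M * L * (T * (1/3)**w * (w + 1) + 2 * math.log(2 * M / delta) + 2)
    t_shadow = n * T * t_gate + T * t_meas + flops / flops_per_sec
    T_prime = 0.5 * M * L**3 / eps**2 * math.log(2 * M * L / delta)  # footage measurements
    t_footage = T_prime * t_meas
    return t_shadow / t_footage   # <= 1 means classical shadows are cheaper
```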
The number of measurements required in this example differs due to the use of a different type of observable. To estimate the sufficient number of measurements, we must employ the shadow norm of the observable, as introduced in [11]. However, this reference only provides a computational method for evaluating the shadow norm of Pauli-form observables. In our case,
the shadow norm is replaced by the infinity norm, which can be shown to correspond to the worst-case scenario. In [11], an upper
bound on the shadow norm is provided for observables expressed as linear combinations of Pauli operators, measured under Pauli measurement settings. However, this upper bound is not readily applicable to observables represented by large matrices, unless
these matrices are decomposed into Pauli form—an operation that incurs significant additional computational cost. To address this, we demonstrate that when the sparsity \(k \ll 2^n\), the upper bound can be directly
approximated by the infinity norm. We then estimate this infinity norm using statistical methods, leading to an approximate upper bound expression that depends only on the sparsity \(k\) and the number of qubits \(n\). With this approximation, the computation of the shadow norm becomes straightforward. By substituting the estimated upper bound into the expression for the required number of measurements \(N\), we can effectively estimate the quantum resources needed for the task.
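Under the \(k \ll 2^n\) assumption, the resulting measurement-count estimate of Statement 3 can be evaluated directly, as in the following sketch; the function name and default parameters are illustrative assumptions.

```python
import math

def lhm_shadow_measurements(M, k, n, eps=0.01, delta=0.01):
    """Measurement count T for LHM observables (Statement 3), using the
    infinity-norm approximation of the shadow norm derived in the appendix."""
    norm_bound = k * math.sqrt(2 / math.pi) \
                 + math.sqrt(k * (1 - 2 / math.pi) * 2 * n * math.log(2))
    T = math.ceil(34 * norm_bound**2 * 2 * math.log(2 * M / delta) / eps**2)
    return T, n * T   # measurements and single-qubit gates
```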
The classical computational resources required are comparable to those for Pauli-form observables. However, when dealing with large Hermitian matrix observables, matrix multiplications become unavoidable. To reduce resource consumption and accelerate computations, we employ sparse matrix techniques. As in previous cases, the number of floating-point operations (FLOPs) is determined by implementing and analyzing the corresponding computational procedures in code.
Thus, we conclude the estimation of computational resources required for both types of observables under different methods. The detailed calculations and derivations are provided in the appendix.
Finally, we observe that Pauli-form observables may require significantly fewer classical computational resources compared to large Hermitian matrix observables. A corresponding decomposition method connecting those two forms has been developed, as explained in the appendix.
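For very small systems, the connection between the two observable forms can be illustrated with the standard decomposition \(O = \sum_P c_P P\), where \(c_P = \text{tr}(P O)/2^n\). The brute-force sketch below enumerates all \(4^n\) Pauli strings and is only an illustration of this relation, not the decomposition procedure analyzed in the appendix.

```python
import itertools
import numpy as np

PAULIS = {'I': np.eye(2), 'X': np.array([[0, 1], [1, 0]]),
          'Y': np.array([[0, -1j], [1j, 0]]), 'Z': np.diag([1, -1])}

def pauli_decompose(O, n):
    """Return {pauli_string: coefficient} with O = sum_P c_P * P and
    c_P = tr(P O) / 2^n.  Cost grows as ~8^n, so this is only feasible
    for very small n."""
    coeffs = {}
    for labels in itertools.product('IXYZ', repeat=n):
        P = np.array([[1.0]])
        for s in labels:
            P = np.kron(P, PAULIS[s])          # build the n-qubit Pauli string
        c = np.trace(P @ O) / 2**n
        if abs(c) > 1e-12:                      # keep only non-negligible terms
            coeffs[''.join(labels)] = c.real    # real for Hermitian O
    return coeffs
```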
In this paper, we have conducted a comprehensive resource estimation of the classical shadow technique, encompassing both classical and quantum components. Unlike existing approaches, which often rely on empirical or simulation-based analysis, our study provides an analytical and quantitative framework for resource estimation, which can more accurately predict the consumed resources.
Our research results reconfirm that the classical shadow method achieves logarithmic resource savings in terms of the number of measurements. However, when dealing with both complex and irregular observables as well as extremely simple ones, it does not show significant advantages compared to the quantum footage method. Our findings refine the applicable scope of this method, thereby deepening our understanding of its practical performance.
Specifically, our analysis shows that, under certain assumptions, the classical shadow method offers resource advantages in the following scenarios: (i) when the number of observables expressed as linear combinations of Pauli matrices is large; and (ii) when the number of observables represented as large Hermitian matrices is in a certain range. Since our estimates for floating-point operations and shadow norms are based on worst-case assumptions, the actual resource costs in practice are expected to be lower.
This study offers valuable guidance for the practical deployment of quantum measurement techniques. One notable challenge that remains is the development of an adaptive switching mechanism between the classical shadow method and quantum footage, particularly in dynamic observation scenarios—such as real-time updates in many-body systems. This capability is essential for time-dependent applications like real-time monitoring of quantum systems and dynamic simulations in quantum chemistry.
Moreover, resource consumption curves for both measurement methods can be accurately emulated via classical simulations. This enables pre-planned optimization of measurement strategies tailored to specific application scenarios, significantly reducing resource redundancy and experimental cost.
Additional contributions of this work include a step-by-step comparative analysis of the classical shadow method versus quantum footage in the extraction of physical properties, a theoretical investigation of upper bounds under real-world constraints, and a detailed discussion on the formulation of various types of observables. These efforts further enhance the generality and applicability of the proposed approach, with detailed derivations and results provided in the appendix.
In future work, first, since this article only considered the case of Pauli measurement, we will incorporate the Clifford measurement method in our future research to make the resource comparison between the classical shadow and quantum footage methods more comprehensive. Second, we may have overlooked the impact of some quantum noise in the process of estimating resource consumption, and we will take the influence of certain types of noise into account in future work. Finally, this article only provided detailed resource estimation and analysis regarding time consumption; we can certainly conduct a comparison of the energy consumption of the two methods within the overall resource estimation, thereby making our entire comparison framework more practically meaningful for energy efficiency [18].
We thank Hsin-Yuan Huang for helpful discussions.
Appendix
The classical shadow described in [11] is a classical description of a quantum state, created by performing specially chosen measurements and further processing their outcomes. Data acquisition and data processing are the two major effort-intensive steps of the construction process.
In a multi-qubit system, for an unknown quantum state \(\rho\), randomly select a unitary transformation \(U\) from a fixed unitary set \(\mathcal{U}\), and
rotate the quantum state as \(\rho\mapsto U\rho U^{\dagger}\). Subsequently, perform a measurement on the rotated quantum state in the computational basis \(\{|b\rangle: b\in\{0,1\}^n\}\).
According to Born’s rule, the probability of obtaining measurement result \(|\hat{b}\rangle\in\{0,1\}^n\) is \(\text{Pr}[\hat{b} = b]=\langle b|U\rho U^{\dagger}|b\rangle\). After
measurement, process the result and store \(U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U\). This random measurement outcome \(U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U\) contains valuable
information about \(\rho\) in expectation, with \(\mathbb{E}[U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U]=\mathcal{M}(\rho)\), where \(\mathcal{M}\) is a
quantum channel determined by the unitary set \(\mathcal{U}\).
Now we can build the classical shadow. To extract quantum state information from the measurements, we further process \(U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U\). Since \(\mathcal{M}\), viewed as a linear map, has a unique inverse \(\mathcal{M}^{-1}\) (provided the measurement ensemble defined by \(\mathcal{U}\) is tomographically complete), applying \(\mathcal{M}^{-1}\) to \(U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U\) yields \(\hat{\rho}=\mathcal{M}^{-1}(U^{\dagger}|\hat{b}\rangle\langle\hat{b}|U)\), which is
a classical shadow of \(\rho\). Repeating this process \(N\) times generates \(N\) independent classical shadows, forming an array: \[\begin{align}
S(\rho; N)=\{\hat{\rho}_1=\mathcal{M}^{-1}(U_1^{\dagger}|\hat{b}_1\rangle\langle\hat{b}_1|U_1),\ldots,\hat{\rho}_N=\mathcal{M}^{-1}(U_N^{\dagger}|\hat{b}_N\rangle\langle\hat{b}_N|U_N)\}
\end{align}\] This completes the classical shadow construction.
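For the random single-qubit Pauli ensemble used in this work, each snapshot factorizes over qubits as \(\hat{\rho} = \bigotimes_j \left(3\,U_j^{\dagger}|\hat{b}_j\rangle\langle\hat{b}_j|U_j - \mathbb{I}\right)\) [11]. The following is a minimal sketch of this construction; the data layout and function name are assumptions of the example.

```python
import numpy as np

# Single-qubit eigenstates of X, Y, Z used as measurement bases
# (index 0 -> eigenvalue +1, index 1 -> eigenvalue -1).
STATES = {
    'X': [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)],
    'Y': [np.array([1, 1j]) / np.sqrt(2), np.array([1, -1j]) / np.sqrt(2)],
    'Z': [np.array([1, 0]), np.array([0, 1])],
}

def snapshot(bases, outcomes):
    """Build one classical-shadow snapshot for the random-Pauli scheme:
    rho_hat = tensor_j ( 3 |s_j><s_j| - I ), where |s_j> is the eigenstate
    of the measured Pauli corresponding to outcome b_j in {0, 1}."""
    factors = []
    for basis, b in zip(bases, outcomes):
        s = STATES[basis][b]
        factors.append(3 * np.outer(s, s.conj()) - np.eye(2))
    return factors   # kept in per-qubit factored form to avoid 2^n x 2^n matrices
```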
Algorithm 1 from [11] is a median-of-means prediction algorithm based on classical shadows \(S(\rho,
N)\), designed to predict linear function values of quantum states. The specific process is:
The function takes three parameters: observables \(O_1,\ldots,O_M\), classical shadows \(S(\rho; N) = [\hat{\rho}_1,\ldots,\hat{\rho}_N]\), and a positive integer \(K\). Here, \(S(\rho; N)\) contains \(N\) classical approximations of \(\rho\), and \(K\)
determines how to partition the shadows for robust estimation.
The first step is dividing \(S(\rho; N)\) into \(K\) equal parts. For each part, compute sample means to build \(K\) estimators: \[\begin{align}
\hat{\rho}^{(k)} = \frac{1}{\lfloor N/K \rfloor} \sum_{i=(k - 1)\lfloor N/K \rfloor + 1}^{k\lfloor N/K \rfloor} \hat{\rho}_i, \quad k = 1,\ldots,K
\end{align}\] This partitioning mitigates outlier effects by creating multiple independent estimators.
The second step is using the median-of-means Algorithm. For each observable \(O_i\;(i = 1,\ldots,M)\), compute traces \(\text{tr}(O_i\hat{\rho}^{(k)})\) across all \(K\) estimators. The final prediction is the median of these values: \[\begin{align}
\hat{o}_i(N, K) = \text{median}\{\text{tr}(O_i\hat{\rho}^{(1)}),\ldots,\text{tr}(O_i\hat{\rho}^{(K)})\}
\end{align}\] Median estimation enhances robustness against outliers compared to simple averaging.
Through these steps, Algorithm 1 effectively predicts linear function values of quantum states using classical shadows and median-of-means techniques.
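A compact rendering of Algorithm 1 might look like the following sketch, in which each shadow is stored as a dense \(2^n \times 2^n\) matrix and is therefore only practical for small \(n\).

```python
import numpy as np

def median_of_means_predictions(observables, shadows, K):
    """Median-of-means prediction of tr(O_i rho) from N classical shadows,
    following Algorithm 1 of [11]: split the shadows into K groups, average
    within each group, and take the median of the K group estimates."""
    groups = np.array_split(np.array(shadows), K)      # K groups of ~N/K snapshots
    group_means = [g.mean(axis=0) for g in groups]     # rho_hat^(k)
    return [np.median([np.trace(O @ rho_k).real for rho_k in group_means])
            for O in observables]
```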
Restate the definitions of some symbolically represented parameters:
\(M = \texttt{num\_observables}\)
\(L = \texttt{terms\_per\_observable}\)
\(w = \texttt{pauli\_weight}\)
\(T = \texttt{max\_measurements}\)
\(K = \texttt{num\_groups}\)
\(\varepsilon = \texttt{estimation\_error}\)
\(\delta = \texttt{failure\_probability}\)
\(k = \texttt{sparsity\_of\_matrix}\)
In this case, the formula for the total number of required measurements along with its proof, as well as the calculation method for \(\| O \|_{\text{shadow}}^2\) under these conditions, can be found in the appendix of
[11] and are listed as follows: \[\begin{align} T = \left\lceil \frac{34}{\varepsilon^2} \cdot \max_i \|
O_i \|_{\text{shadow}}^2 \cdot 2 \cdot \log \left( \frac{2M}{\delta} \right) \right\rceil
\end{align}\] From [11] we can find \(\| O_i \|_{\text{shadow}}^2=3^w\) when \(L = 1\) and there is no coefficient preceding the observable. For the general case, \(\| O_i \|_{\text{shadow}}^2 = \sum_{j = 1}^L c_j^2 \cdot 3^w\). Moreover, in Statement 1 we assumed \(c_j \sim \mathcal{N}(0, 0.5)\) truncated to \([-1, 1]\), so the average of \(c_j^2\) is approximately \(0.25\). Therefore, \(\| O_i \|_{\text{shadow}}^2 \approx L \cdot 0.25 \cdot 3^w\).
Substituting the aforementioned approximation of \(\| O_i \|_{\text{shadow}}^2\), we obtain the following expression: \[\begin{align}
T &\lesssim \frac{17L \cdot 3^{w}}{\varepsilon^{2}} \cdot \log\left(\frac{2M}{\delta}\right)
\end{align}\] The quantum resource component in Statement 1 has been rigorously proven.
For resource estimation of classical computations, we use the number of floating-point operations as a proxy, and only need to analyze the floating-point operations in the code.
Core Function Calls
The program uses the classical shadow method to estimate the expectation values of complex observables. Some core functions related to the classical computation are as follows:
randomized_classical_shadow
: Generates random measurement bases
estimate_exp
: Processes a single Pauli term
estimate_exp_mom
: Performs grouped estimation
estimate_complex_observable
: Processes all Pauli terms for a single observable
estimate_multiple_observables
: Estimates all observables
In the following, we will divide the method of using the classical shadow approach to predict the expectation of observables in the form of linear combinations of Pauli operators into six steps. Step 1 first randomly generates the measurement bases;
Step 2 simulates quantum measurements; Step 3 estimates the expectation of a single term; Step 4 applies the Median-of-means algorithm on the basis of Step 3; Step 5 sums each individual term to obtain the expectation of each observable; Step 6 predicts
the expectations of all observables. The functions used in each step are presented in the form of Python code, along with a brief explanation and an analysis of floating-point operations.
Step 1: Generation of Measurement Basis
In Step 1, we construct a function for randomly generating measurement bases, which assigns X/Y/Z bases to different qubits with uniform probability. As we can see in Figure 4, regardless of the size of the quantum system, the measurement process is
performed independently for each individual qubit. Since the number of required measurements is \(T\), the number of measurement bases to be generated is \(nT\). However, this step does not
involve floating-point operations and thus is not counted. Next, we calculate the matching probability. Briefly speaking, it refers to the probability that the measurement basis at a given position is the same as the Pauli operator at the corresponding position of the observable. This matching probability plays an important role in Step 3.
First, we consider the calculation of the matching probability. For a Pauli-form observable with only one term and a Pauli weight of \(w\) (e.g., \(Z_0X_2Y_5\), with \(w=3\)), the single-position matching probability is \(P(\text{basis} = \text{operator}) = \frac{1}{3}\). Therefore, the probability that all \(w\) positions match simultaneously is: \[\begin{align} p = \left( \frac{1}{3} \right)^w
\end{align}\]
Step 2: Quantum Measurement
In this step, we use a classical computer to simulate quantum computer measurements and store the results of \(T\) measurements in \(\text{full\_measurement}\). This step does not involve
any classical computing and naturally does not involve any floating-point operations.
Step 3: Estimate of Expectation
In Step 3, we use a function from the code accompanying the original classical shadow paper [11]. This code is very concise and efficient, avoiding complex matrix multiplications and directly using a matching method to predict the expectation. The term "one_observable" in the function represents one of the terms in the summation of an observable expressed as a linear combination of Pauli operators; the summation over terms is performed in Step 5. The specific principle of this code will not be explained in detail; we only briefly describe the counting of floating-point operations:
Input: \(T'\) measurements (group size), \(w\) Pauli operators
Core operations: This function involves three major steps. The first is checking whether the measurement basis matches, which consists solely of Boolean operations; Boolean operations are not counted as FLOPs. The second and third steps are a floating-point multiplication performed only when the bases match and a floating-point addition performed only when all positions match; these are the steps involving floating-point arithmetic.
Total floating-point operations: The complete match probability is \(p = (1/3)^w\). In such a scenario, the floating-point operations involved consist of \(w\) multiplications (when a match occurs) plus 1 addition (specific to the complete match). Based on this composition, the expected number of floating-point operations can be expressed as \[\begin{align} \text{FLOPs}_{\text{exp}} = T' \cdot p \cdot (w + 1) \end{align}\] where the term \((w + 1)\) directly corresponds to the total count of these multiplicative and additive operations.
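A minimal sketch of such a matching-based single-term estimator is given below; the data layout and naming differ from the estimate_exp code released with [11], so this is only meant to make the FLOP accounting above concrete.

```python
def estimate_exp(measurements, one_observable):
    """Matching-based estimate of a single Pauli term (cf. Step 3).
    `measurements`: list of shots, each a list of (basis, outcome) pairs per qubit,
    with basis in {'X','Y','Z'} and outcome in {+1,-1}.
    `one_observable`: dict {qubit_index: pauli_label} describing a weight-w term.
    Returns the sum of matched outcome products and the number of matching shots;
    their ratio estimates tr(P rho)."""
    total, matches = 0.0, 0
    for shot in measurements:
        product, matched = 1.0, True
        for qubit, pauli in one_observable.items():
            basis, outcome = shot[qubit]
            if basis != pauli:            # mismatch: discard this shot for this term
                matched = False
                break
            product *= outcome            # one multiplication per matched position
        if matched:
            total += product              # one addition per fully matched shot
            matches += 1
    return total, matches
```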
Step 4: Using the Median of Means Method
In Step 4, we use the Median-of-means algorithm to enhance the accuracy and stability of the results. In practice, it adds operations on the basis of estimate_exp, specifically grouping first, then taking the average of each group, and finally taking the median. The following is the floating-point number analysis for this function:
Input: \(T\) measurements, \(K\) groups
Core operations: The process involves calling the estimate_exp function \(K\) times, each call utilizing \(T' = T/K\) measurements; this divides the total of \(T\) measurements into \(K\) groups, each of size \(T'\). Within each group, a key operation is a division (the normalization by the number of matches), which corresponds to 1 floating-point operation and is executed when a match condition is met. Following these group-level computations, the median calculation aggregates the results across groups; this step consists primarily of comparison operations, whose associated floating-point cost is negligible.
Total floating-point operations: \[\begin{align} \text{FLOPs}_{\text{mom}} = K \cdot \left[ \frac{T}{K} \cdot p \cdot (w + 1) + \mathbb{I}_{\text{match}} \right] = T \cdot p \cdot (w + 1) + K_{\text{eff}} \end{align}\] Here \(K_{\text{eff}}\) is the number of groups with actual matches (\(\leq K\)), conservatively estimated as \(K\).
Step 5: Summing the Expectation of Each Term
Since Step 3 and Step 4 only estimate the expectation of a single-term Pauli-form observable, we sum the expectations of each Pauli-form observable term in Step 5 to obtain the total expectation. The following is a brief analysis of floating-point operations:
Input: \(L\) Pauli terms
Core operations: The floating-point counting for this function is relatively simple; it merely involves calling the previous estimate_exp_mom function \(L\) times. During each call, two additional floating-point operations are required: the first is a multiplication needed to apply the weight (coefficient) preceding each Pauli term, and the second is the addition that accumulates the term into the sum.
Total floating-point operations: \[\begin{align} \text{FLOPs}_{\text{complex}} = L \cdot \left[ T \cdot p \cdot (w + 1) + K + 2 \right] \end{align}\]
Step 6: Predict the Expectations of All Observables
In this step, we will include all observables and need to introduce the number of observables \(M\). We apply the previous Steps 3, 4, and 5 to each observable to predict the expectation. The analysis of floating-point operations is simply a multiplication by \(M\), as follows:
Input: List of \(M\) observables
Core operations: Calls estimate_complex_observable
\(M\) times
Total floating-point operations: \[\begin{align} \text{FLOPs}_{\text{total}} = M \cdot L \cdot \left[ T \cdot \left( \frac{1}{3} \right)^w \cdot (w + 1) + K + 2 \right] \end{align}\]
Finally, substituting the value of \(K\) yields the final result: \[\begin{align} \text{FLOPs}_{\text{total}} &\lesssim M \cdot L \cdot \left( T \cdot \left( \frac{1}{3} \right)^w \cdot (w + 1) + 2 \cdot \log\left( \frac{2M}{\delta} \right) + 2 \right) \end{align}\] With this, the proof of Statement 1 is complete.
In quantum mechanics, directly measuring the expectation value \(\text{Tr}(O_m\rho)\) of an observable \(O_m\) (where \(\rho\) is the quantum state, \(m = 1, 2, \ldots, M\)) typically involves projecting the quantum state onto the eigenbasis of \(O_m\), performing multiple measurements, and averaging the results. For observables expressed as
linear combinations of Pauli operators, \(O = \sum_{j=1}^L c_j P_j\), this method measures each Pauli term \(P_j\) individually and combines the results to compute \(\langle O \rangle\). The resource estimation of this method is achieved through the following five steps:
Step 1: Approximate the Maximum Sum of Absolute Coefficients \[\begin{align}
\text{max\_sum\_abs\_coeff} = L \cdot \mu
\end{align}\] Each observable has \(L\) Pauli terms with coefficients \(c_j\). The sum \(\sum_{j=1}^L |c_j|\) scales the error in the expectation
value. Here, \(\mu\) (default 0.5) approximates the average \(|c_j|\), so \(L \cdot \mu\) estimates this sum.
Step 2: Allocate Error and Failure Probability \[\begin{align}
\epsilon' = \frac{\epsilon}{\text{max\_sum\_abs\_coeff}} = \frac{\epsilon}{L \cdot \mu}
\end{align}\] \[\begin{align}
\delta' = \frac{\delta}{M \cdot L}
\end{align}\]
Error (\(\epsilon'\)): The total error \(\epsilon\) is divided by \(L \cdot \mu\) to allocate an error per Pauli term.
Failure Probability (\(\delta'\)): The total \(\delta\) is split across \(M \cdot L\) terms using the union bound.
Step 3: Measurements per Pauli Term \[\begin{align}
N_j = \left\lceil \frac{2}{\epsilon'^2} \ln\left(\frac{2}{\delta'}\right) \right\rceil
\end{align}\] Using the Hoeffding inequality, for Pauli outcomes \(\pm 1\), the number of measurements \(N_j\) ensures the error is within \(\epsilon'\) with probability \(1 - \delta'\): \[\begin{align}
\delta'=P(|\bar{X} - \langle P_j \rangle| \geq \epsilon') \leq 2 \exp\left(-\frac{N_j \epsilon'^2}{2}\right)
\end{align}\] \[\begin{align}
N_j \leq \frac{2}{\epsilon'^2} \log\left(\frac{2}{\delta'}\right)
\end{align}\]
Step 4: Total Number of Measurements \[\begin{align}
T' = M \cdot L \cdot N_j
\end{align}\] With \(M\) observables and \(L\) terms each, the total measurements are the product.
Step 5: Final Expression \[\begin{align}
T' \leq M \cdot L \cdot \left\lceil \frac{2}{\left( \frac{\epsilon}{L \cdot \mu} \right)^2} \log\left( \frac{2}{\frac{\delta}{M \cdot L}} \right) \right\rceil
\end{align}\] Simplified: \[\begin{align}
T' \leq M \cdot L \cdot \left\lceil \frac{2 (L \cdot \mu)^2}{\epsilon^2} \log\left( \frac{2 M \cdot L}{\delta} \right) \right\rceil
\end{align}\] Substituting the default value \(\mu=0.5\) yields: \[\begin{align}
T' &\lesssim \frac{0.5ML^3 }{\epsilon^2} \log\left( \frac{2ML}{\delta} \right)
\end{align}\] With this, the proof of Statement 2 is complete.
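The five steps can be condensed into the following sketch; \(\mu = 0.5\) is the default average \(|c_j|\) from Step 1, and the function name is an assumption of this example.

```python
import math

def footage_measurements_lcp(M, L, eps=0.01, delta=0.01, mu=0.5):
    """Total measurement count T' for the quantum footage method on LCP
    observables (Statement 2), following Steps 1-5 above."""
    eps_per_term = eps / (L * mu)            # error budget per Pauli term (Step 2)
    delta_per_term = delta / (M * L)         # union bound over all M*L terms (Step 2)
    N_j = math.ceil(2 / eps_per_term**2 * math.log(2 / delta_per_term))  # Hoeffding (Step 3)
    return M * L * N_j                       # Step 4
```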
Before proving Statements 3 and 4, we first present some important remarks.
Since the shadow norm is a crucial quantity in the classical shadow method, its magnitude directly determines the number of measurements required. For observables in the form of large matrices, we can use their infinity norm instead of the shadow norm. The
following are explanations and justifications for this approach:
In [11], the shadow norm is defined as \[\begin{align} \| O \|_{\text{shadow}}^2 = \max_{\sigma \text{
(state)}} \mathbb{E}_{U \sim \mathcal{U}} \sum_{b \in \{0,1\}^n} \langle b | U \sigma U^\dagger | b \rangle \cdot [\text{Tr}(O \mathcal{M}^{-1}(U^\dagger | b \rangle \langle b | U))]^2
\end{align}\] In the numerical computations used to produce the figures, the shadow norm is taken to be: \[\begin{align} \left\| O - \frac{\text{Tr}(O)}{d} I \right\|_{\text{shadow}}^2 = \left\| O - \frac{\text{Tr}(O)}{d} I
\right\|_{\infty}^2
\end{align}\] where:
\(d = 2^n\): the dimension of the Hilbert space (\(n\) is the number of qubits),
\(\text{tr}(O)\): the trace of the observable,
\(\| \cdot \|_{\infty}\): the infinity norm, i.e., the largest singular value of the matrix.
A brief verbal explanation of this claim follows:
First, state an important conclusion: For the random Pauli measurement scheme, the theoretical derivation in [11] reveals a key
result: The upper bound of the shadow norm \(\| O \|_{\text{shadow}}^2\) is directly related to the infinity norm of the observable \(O\), specifically expressed as: \[\begin{align}
\| O - \frac{\text{tr}(O)}{d} I\|_{\text{shadow}}^2 \leq c \cdot \left\| O - \frac{\text{tr}(O)}{d} I \right\|_{\infty}^2
\end{align}\]
where \(c\) is a constant related to the measurement scheme and the form of the observable (for random Pauli measurements, when the observable is a linear combination of Pauli matrices, \(c=4^w\), where \(w\) is the Pauli weight).
When \(w \ll n\), i.e., \(2^w \ll 2^n\), we consider directly using the infinity norm to replace the shadow norm. (Since the sparsity \(k\) selected in our large matrix operations is very small compared to \(2^n\), the matrices we study are relatively local. Alternatively, in the Pauli linear combination form obtained by decomposing a large matrix, the Pauli weight \(w\) is much smaller than \(n\). Moreover, it is shown in [11] that \(4^w \| O \|_{\infty}^2\) is a fairly loose upper bound. Therefore, in numerical calculations, the computational complexity mainly arises from the observable \(O\) itself. Hence, it is reasonable to ignore the constant factor.)
We need to estimate the value of \(\left\|O - \frac{\text{tr}(O)}{2^n}I\right\|_{\infty}\), where \(O\) is a \(2^n\times2^n\) Hermitian matrix with each
row containing \(k\) non-zero elements, and these non-zero elements follow a standard normal distribution \(N(0, 1)\). Here, \(n\) is the number of qubits,
\(I\) is the identity matrix, and \(\text{tr}(O)\) is the trace of \(O\). We aim to obtain an expression in terms of \(k\)
and \(n\) to roughly describe the magnitude of this norm.
First, recall the definition of \(\|A\|_{\infty}\). For a matrix \(A\), the infinity norm is the maximum of the sums of absolute values of the rows: \[\begin{align}
\|A\|_{\infty} = \max_{1\leq i\leq 2^n} \sum_{j = 1}^{2^n} |a_{ij}|
\end{align}\] Therefore, our goal is to compute: \[\begin{align}
\left\|O - \frac{\text{tr}(O)}{2^n}I\right\|_{\infty} = \max_{1\leq i\leq 2^n} \sum_{j = 1}^{2^n} \left|o_{ij} - \frac{\text{tr}(O)}{2^n}\delta_{ij}\right|
\end{align}\] where \(\delta_{ij} = 1\) if \(i = j\), and 0 otherwise. This implies that for each row \(i\): \[\begin{align}
\sum_{j = 1}^{2^n} \left|o_{ij} - \frac{\text{tr}(O)}{2^n}\delta_{ij}\right| = \sum_{j\neq i} |o_{ij}| + \left|o_{ii} - \frac{\text{tr}(O)}{2^n}\right|
\end{align}\] We need to estimate the maximum of this sum.
Then, we list the properties of O below.
Hermiticity: \(O = O^{\dagger}\). Since \(O\) is a real matrix (elements from \(N(0, 1)\)), this means \(O\) is symmetric, i.e., \(o_{ij} = o_{ji}\).
Sparsity: Each row has \(k\) non-zero elements, and due to symmetry, each column also has \(k\) non-zero elements.
Element Distribution: Non-zero elements \(o_{ij}\) are drawn from the standard normal distribution \(N(0, 1)\).
One of our goals is to estimate \(tr(O)\). The trace is the sum of all diagonal elements: \[\begin{align} \text{tr}(O) = \sum_{i = 1}^{2^n} o_{ii} \end{align}\] However, \(o_{ii}\) are not necessarily non-zero. Each row has \(k\) non-zero elements randomly selected (without replacement) from \(2^n\) positions, so the probability that \(o_{ii} \neq 0\) is \(\frac{k}{2^n}\).
If \(o_{ii} \neq 0\), then \(o_{ii} \sim N(0, 1)\);
If \(o_{ii} = 0\), it does not contribute to the trace.
The number of non-zero diagonal elements approximately follows a binomial distribution \(\text{Bin}(2^n, \frac{k}{2^n})\), with expectation \(k\). Each non-zero \(o_{ii} \sim N(0, 1)\), so: \[\begin{align} \text{tr}(O) = \sum_{i: o_{ii}\neq 0} o_{ii} \end{align}\] Its variance is: \[\begin{align} \text{Var}(\text{tr}(O)) = \text{expected number of non-zero terms} \times \text{Var}(o_{ii}) = k \cdot 1 = k \end{align}\] Thus, \(\text{tr}(O) \sim N(0, k)\), and: \[\begin{align} \frac{\text{tr}(O)}{2^n} \sim N\left(0, \frac{k}{2^{2n}}\right) \end{align}\] For large \(n\), \(\frac{\text{tr}(O)}{2^n}\) is very small, with a standard deviation of \(\sqrt{\frac{k}{2^{2n}}}\). For example, if \(k\) is constant or grows slowly with \(n\) (e.g., \(k = n\)), this term can be neglected compared to others.
Calculation of Row Sums
For the \(i\)-th row, the row sum is: \[\begin{align} S_i = \sum_{j = 1}^{2^n} \left|o_{ij} - \frac{\text{tr}(O)}{2^n}\delta_{ij}\right| = \sum_{j\neq i} |o_{ij}| + \left|o_{ii} - \frac{\text{tr}(O)}{2^n}\right| \end{align}\]
\(\sum_{j\neq i} |o_{ij}|\)
Each row of the matrix contains \(k\) non-zero elements, with the distribution of these non-zeros depending on the value of the diagonal element \(o_{ii}\): specifically, if \(o_{ii} \neq 0\), there are \(k - 1\) non-zero elements among the off-diagonal positions \(j \neq i\), whereas if \(o_{ii} = 0\),
all \(k\) non-zero elements lie in the off-diagonal positions \(j \neq i\). The probability that the diagonal element \(o_{ii} \neq 0\) is given by \(\frac{k}{2^n}\), a value that becomes particularly small when \(k \ll 2^n\); in such cases, the overwhelming majority of diagonal elements \(o_{ii}\) will be
zero, meaning the non-zero elements in most rows are concentrated entirely in the off-diagonal positions.
Let \(m_i\) be the number of non-zero elements in row \(i\) for \(j \neq i\), then \(m_i = k\) or \(k - 1\). Each non-zero \(o_{ij} \sim N(0, 1)\), so \(|o_{ij}|\) follows a half-normal distribution with expectation: \[\begin{align}
\mathbb{E}[|o_{ij}|] = \sqrt{\frac{2}{\pi}} \approx 0.7979
\end{align}\] and variance: \[\begin{align} \text{Var}(|o_{ij}|) = 1 - \frac{2}{\pi} \approx 0.3634
\end{align}\] Thus: \[\begin{align} \sum_{j\neq i} |o_{ij}| \approx m_i \cdot \sqrt{\frac{2}{\pi}}
\end{align}\]
\(\left|o_{ii} - \frac{\text{tr}(O)}{2^n}\right|\)
When \(o_{ii} \neq 0\), the diagonal element \(o_{ii}\) follows a normal distribution \(N(0, 1)\); given that \(\frac{\text{tr}(O)}{2^n}\) is small, the absolute deviation \(\left| o_{ii} - \frac{\text{tr}(O)}{2^n} \right|\) can be approximated by \(|o_{ii}|\), which, for
a standard normal variable, has an expected value of \(\sqrt{\frac{2}{\pi}}\). In contrast, when \(o_{ii} = 0\), this deviation simplifies to \(\left|
\frac{\text{tr}(O)}{2^n} \right|\), a quantity that remains very small due to the small magnitude of \(\frac{\text{tr}(O)}{2^n}\). Since the probability \(\frac{k}{2^n}\) (governing
\(o_{ii} \neq 0\)) is typically small in practice, the case \(o_{ii} = 0\) dominates across most rows. As a result, for the majority of rows, the sum \(S_i\)
(which reflects the relevant magnitude of elements in row \(i\)) can be approximated by the sum of absolute values of all non-zero off-diagonal elements in that row, i.e., \(S_i \approx \sum_{j:
o_{ij} \neq 0} |o_{ij}|\), where this sum consists of exactly \(k\) terms corresponding to the non-zero entries in the row.
Distribution of \(S_i\)
Approximately, the sum \(S_i \approx \sum_{j: o_{ij} \neq 0} |o_{ij}|\) can be characterized as the sum of \(k\) independent half-normal random variables, each arising from the absolute
value of a standard normal variable (consistent with the distribution of non-zero off-diagonal elements \(o_{ij}\)). For such a sum, the mean is given by \(k\sqrt{\frac{2}{\pi}}\), while the
variance equals \(k\left(1 - \frac{2}{\pi}\right)\). As \(k\) becomes large, the central limit theorem applies, leading to a useful approximation where \(S_i\) follows a normal distribution: specifically, \(S_i \approx N\left(k\sqrt{\frac{2}{\pi}}, k\left(1 - \frac{2}{\pi}\right)\right)\).
Next, we estimate the maximum value. \[\begin{align} \left\|O - \frac{\text{tr}(O)}{2^n}I\right\|_{\infty} = \max_{1\leq i\leq 2^n} S_i \end{align}\] This is the maximum of \(2^n\) approximately normal random variables. According to extreme value theory, for \(m = 2^n\) random variables from \(N(\mu, \sigma^2)\), the maximum \(M_m\) behaves as: \[\begin{align} M_m \approx \mu + \sigma\sqrt{2\log m} \end{align}\] where:
\(\mu = k\sqrt{\frac{2}{\pi}}\)
\(\sigma = \sqrt{k\left(1 - \frac{2}{\pi}\right)}\)
\(m = 2^n\), so \(\log m = n\log 2\)
Substituting, we get: \[\begin{align} \left\|O - \frac{\text{tr}(O)}{2^n}I\right\|_{\infty} \approx k\sqrt{\frac{2}{\pi}} + \sqrt{k\left(1 - \frac{2}{\pi}\right)}\cdot\sqrt{2n\log 2} \end{align}\]
This norm is approximately \(O(k + \sqrt{kn})\). When \(k\) is fixed and \(n\) increases, the norm grows as \(\sqrt{n}\); when \(n\) is fixed and \(k\) increases, the dominant term is \(k\).
This estimation assumes that \(k\) is sufficiently large for the normal approximation and that for large \(n\), the effect of \(\frac{\text{tr}(O)}{2^n}\)
is negligible. For very small \(k\) or \(n\), a more precise analysis may be needed, but this expression provides a reasonable scale for typical cases.
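As a quick sanity check on this scaling, the following sketch (our illustration, not part of the derivation) samples the row sums \(S_i\) directly as sums of \(k\) half-normal variables, ignoring the small diagonal correction \(\frac{\text{tr}(O)}{2^n}\), and compares the empirical maximum over \(2^n\) rows with the extreme-value estimate; the values of \(n\) and \(k\) are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo sanity check of the extreme-value estimate above. Each row sum
# S_i is sampled as a sum of k half-normal variables; the diagonal correction
# tr(O)/2^n is ignored, as discussed in the text.
rng = np.random.default_rng(0)
n, k = 12, 8
m = 2 ** n                                        # number of rows

S = np.abs(rng.standard_normal((m, k))).sum(axis=1)

mu = k * np.sqrt(2 / np.pi)                       # mean of S_i
sigma = np.sqrt(k * (1 - 2 / np.pi))              # std of S_i
predicted_max = mu + sigma * np.sqrt(2 * n * np.log(2))

print(f"empirical max S_i: {S.max():.3f}")
print(f"predicted  max   : {predicted_max:.3f}")
```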
Finally, the total number of measurements \(T\) is \[\begin{align}
T = \left\lceil \frac{34 \cdot \text{max\_shadow\_norm} \cdot \left\lceil 2\log(2M/\delta) \right\rceil}{\varepsilon^2} \right\rceil
\end{align}\] where \(\text{max\_shadow\_norm}\) denotes the maximum squared shadow norm over the \(M\) observables. Substituting the norm estimate above: \[\begin{align}
T = \left\lceil \frac{34 \cdot \left[ k\sqrt{\frac{2}{\pi}} + \sqrt{k \left( 1 - \frac{2}{\pi} \right) \cdot 2n\log 2} \right]^2 \cdot \left\lceil 2\log(2M/\delta) \right\rceil}{\varepsilon^2} \right\rceil
\end{align}\] At this point, the part concerning the quantum resource estimation in Statement 3 is proven.
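For concreteness, a small helper evaluating this expression might look as follows; the function name and the example parameters are ours, and the norm factor is the approximation derived above rather than a rigorous worst-case bound.

```python
import numpy as np

def shadow_measurements(n, k, M, eps, delta):
    """Sample count T for the classical shadow protocol, using the approximate
    shadow-norm estimate derived above (not a rigorous worst-case bound)."""
    norm_est = k * np.sqrt(2 / np.pi) + np.sqrt(k * (1 - 2 / np.pi) * 2 * n * np.log(2))
    K = np.ceil(2 * np.log(2 * M / delta))        # number of median-of-means batches
    return int(np.ceil(34 * norm_est ** 2 * K / eps ** 2))

# Illustrative parameters (not taken from the paper's experiments):
print(shadow_measurements(n=10, k=4, M=100, eps=0.1, delta=0.01))
```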
To estimate the number of floating-point operations required to compute the expectation values from the classical shadows, we analyze the floating-point operations in the core computational steps. The key components include:
Constructing Single-Qubit Density Matrices: The ground state projection step involves no floating-point operations, as it relies solely on direct assignment to map the system to the ground state configuration, avoiding any numerical
computation. In contrast, the basis transformation, which corresponds to the operation \(U^{\dagger}\rho U\), entails more involved numerical processing: specifically, it requires two \(2\times2\) complex matrix multiplications. Each of these matrix multiplications can be decomposed into real arithmetic operations, consisting of 8 real multiplications and 4 real additions, totaling 12 floating-point operations
(FLOPs) per multiplication. Scaling this to the system size, the total computational cost for the basis transformation thus amounts to 2 multiplications multiplied by 12 FLOPs, resulting in 24 FLOPs per qubit, a cost that accumulates linearly with the
number of qubits in the system.
Computing \(3\rho - I\): Multiplying the \(2\times2\) matrix \(\rho\) by the scalar 3 takes one floating-point operation (FLOP) per element, i.e., 4 FLOPs; this step uniformly rescales the magnitude of the matrix elements. The subsequent subtraction of the identity matrix \(I\) requires 1 FLOP for each of the 4 elements, contributing another 4 FLOPs; this subtraction removes the corresponding identity entries and isolates the target components of the matrix. Combining the two steps gives a total of 8 FLOPs per qubit, a cost that is the same for every qubit and scales linearly with the number of qubits in larger systems, making it a predictable and manageable component of the overall computation.
Kronecker Product Accumulation: At step \(j\), the input matrix involved in the computation has size \(2^j \times 2^j\), a dimension that grows exponentially with \(j\) and directly determines the cost of the subsequent operation; correspondingly, this matrix contains \(4^j\) (generically non-zero) elements. For the Kronecker product with the next \(2\times2\) single-qubit factor, each element of the input matrix must be multiplied by each of the 4 elements of that factor, and each complex multiplication costs 6 floating-point operations (FLOPs), so the Kronecker product at step \(j\) costs \(4^j \times 4 \times 6 = 24 \times 4^j\) FLOPs. Summing over the steps \(j = 0, 1, \dots, n-1\) gives \(\sum_{j=0}^{n-1} 24 \cdot 4^j = 8(4^n - 1)\) FLOPs in total, which is the Kronecker-product term appearing in the expression below.
Expectation Value Calculation (\(\text{Tr}(O\sigma_i)\)): The observable \(O\) contains \(2^n \times k\) non-zero elements, a count that
reflects the combined influence of the system size (governed by \(n\), the number of qubits) and the sparsity structure (determined by \(k\), the average number of non-zeros per row). For
each of these non-zero elements, two key operations are performed: a complex multiplication, which involves 6 floating-point operations (FLOPs) due to the need to handle both real and imaginary components separately, and a subsequent addition, which
contributes 1 FLOP to accumulate the result into the overall sum. When aggregated across all non-zero elements, these operations yield a total computational cost of \(7 \times 2^n k\) FLOPs per observable.
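Before assembling the total count, the per-sample pipeline just itemized can be sketched in a few lines of code. This is a minimal illustration assuming random single-qubit Pauli-basis measurements; the helper names (`snapshot`, `expectation`), the rotation conventions, and the example call are ours.

```python
import numpy as np

# Per-sample classical-shadow post-processing, assuming random single-qubit
# Pauli-basis measurements: build rho = U^dag |b><b| U per qubit, form 3*rho - I,
# accumulate the Kronecker product, and finally evaluate Tr(O sigma).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Sdg = np.array([[1, 0], [0, -1j]], dtype=complex)
UNITARIES = {"X": H, "Y": H @ Sdg, "Z": np.eye(2, dtype=complex)}
I2 = np.eye(2, dtype=complex)

def snapshot(basis, bits):
    """One snapshot sigma = kron_j (3 * U_j^dag |b_j><b_j| U_j - I)."""
    sigma = np.array([[1.0 + 0j]])
    for p, b in zip(basis, bits):
        ket = np.zeros((2, 1), dtype=complex)
        ket[b, 0] = 1.0                                   # computational-basis outcome |b>
        rho = UNITARIES[p].conj().T @ (ket @ ket.conj().T) @ UNITARIES[p]
        sigma = np.kron(sigma, 3 * rho - I2)              # Kronecker accumulation
    return sigma

def expectation(O, snapshots):
    """Average Tr(O sigma_i) over the collected snapshots."""
    return np.mean([np.trace(O @ s).real for s in snapshots])

# Example on 2 qubits: one snapshot from measuring X on qubit 1 and Z on qubit 2.
sigma = snapshot("XZ", [0, 1])
```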
Total Floating-Point Operations Expression: For \(T\) samples and \(M\) observables, the calculation is: \[\begin{align}
\text{FLOPs} = T \left[ \underbrace{8(4^n - 1)}_{\text{Kronecker product}} + \underbrace{24n+8n}_{\text{single-qubit operations}} + \underbrace{7M \cdot 2^n k}_{\text{expectation value calculation}} \right]
\end{align}\] We already know that \(T = \left\lceil \frac{34 \cdot \left[ k\sqrt{\frac{2}{\pi}} + \sqrt{k \left( 1 - \frac{2}{\pi} \right) \cdot 2n\log 2} \right]^2 \cdot \left\lceil 2\log(2M/\delta)
\right\rceil}{\varepsilon^2} \right\rceil\)
Substituting \(T\) into the above equation yields \[\begin{align}
\text{Total FLOPs} = \left\lceil \frac{34}{\varepsilon^2} \left[ k\sqrt{\frac{2}{\pi}} + \sqrt{2kn\log 2 \left( 1 - \frac{2}{\pi} \right)} \right]^2 \cdot \left\lceil 2\log(2M/\delta) \right\rceil \right\rceil \times \left[ 8(4^n - 1) + 32n + 7M \cdot 2^n
k \right]
\end{align}\] At this point, the part concerning the classical resource estimation in Statement 3 is proven.
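A direct transcription of this expression into a small helper might look as follows; the function name and the sample values are ours, and the count simply follows the FLOP accounting used in this section.

```python
def shadow_flops(n, k, M, T):
    """Classical post-processing cost for T samples, following this section's
    accounting: Kronecker products, single-qubit operations, and expectation
    values for M observables with per-row sparsity k on n qubits."""
    return T * (8 * (4 ** n - 1) + 32 * n + 7 * M * (2 ** n) * k)

# Example with illustrative parameters; T would come from the expression above.
print(shadow_flops(n=10, k=4, M=100, T=250_000))
```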
Since \(O_m\) is a \(2^n \times 2^n\) Hermitian matrix, direct measurement implies independently executing the measurement process for each \(O_m\).
Statistical Requirements for Measurement Times: The overarching objective is to estimate the expectation values of all \(M\) observables with a specified additive error \(\epsilon\) and a failure probability \(\delta\), ensuring that the computed estimates remain within \(\epsilon\) of the true expectation values for all observables with a confidence level of \(1 - \delta\). To contextualize this goal, it is important to note the nature of the measurement results: each individual measurement of an observable \(O_m\) yields one of its eigenvalues, which is bounded within the interval \([-\|O_m\|, \|O_m\|]\). Here, \(\|O_m\|\) denotes the spectral norm of \(O_m\), defined as the largest absolute value among its eigenvalues, a quantity that characterizes the maximum possible magnitude of the observable’s measurement outcomes and thus plays a critical role in determining the precision requirements for the estimation process.
According to the Hoeffding inequality, for a random variable \(X\) (measurement result) with values in \([a, b]\), to ensure that the probability of the sample mean deviating from the
true expectation by less than \(\epsilon\) is at least \(1 - \delta'\) (where \(\delta'\) is the failure probability for a single observable), the
required number of samples \(N\) is: \[\begin{align}
N \geq \frac{(b - a)^2}{2\epsilon^2} \ln\left(\frac{2}{\delta'}\right)
\end{align}\] For \(O_m\): \(a = -\|O_m\|\), \(b = \|O_m\|\), so \(b - a = 2\|O_m\|\).
Substituting: \[\begin{align}
N_m = \left\lceil\frac{(2\|O_m\|)^2}{2\epsilon^2} \ln\left(\frac{2}{\delta'}\right)\right\rceil = \left\lceil\frac{2\|O_m\|^2}{\epsilon^2} \ln\left(\frac{2}{\delta'}\right)\right\rceil
\end{align}\]
Adjustment for Multiple Observables: Since there are \(M\) observables, we want the probability that all \(M\) estimates simultaneously satisfy the error \(\epsilon\) to be at least \(1 - \delta\). According to the union bound, setting the failure probability for each observable to \(\delta/M\) ensures the total failure probability does not exceed \(\delta\). Therefore: \(\delta' = \delta/M\). \(\ln\left(\frac{2}{\delta'}\right) = \ln\left(\frac{2M}{\delta}\right)\). The number of measurements for each observable \(O_m\): \[\begin{align} N_m = \left\lceil\frac{2\|O_m\|^2}{\epsilon^2} \ln\left(\frac{2M}{\delta}\right)\right\rceil \end{align}\]
Total Number of Measurements: Assuming that the \(M\) observables do not share measurements (i.e., not considering whether the observables can be diagonalized simultaneously), the total number of measurements is: \[\begin{align} N_{\text{total}} = \sum_{m = 1}^{M} N_m = \sum_{m = 1}^{M} \left\lceil\frac{2\|O_m\|^2}{\epsilon^2} \ln\left(\frac{2M}{\delta}\right)\right\rceil \end{align}\]
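The count just derived is easy to evaluate once the spectral norms are known; the sketch below (our helper, with hypothetical example values) applies the per-observable Hoeffding bound and the union bound exactly as above.

```python
import numpy as np

def direct_measurements(spectral_norms, eps, delta):
    """Total shots for direct measurement: a Hoeffding bound per observable
    combined with a union bound over the M observables, as derived above."""
    norms = np.asarray(spectral_norms, dtype=float)
    M = norms.size
    N_m = np.ceil(2 * norms ** 2 * np.log(2 * M / delta) / eps ** 2)
    return int(N_m.sum())

# Example: 100 observables, all with spectral norm ~5 (hypothetical values).
print(direct_measurements(spectral_norms=[5.0] * 100, eps=0.1, delta=0.01))
```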
Calculation of the Spectral Norm \(\|O_m\|\): \(O_m\) is a sparse Hermitian matrix where each row contains \(k\) non-zero elements, and these non-zero entries follow the standard normal distribution \(\mathcal{N}(0, 1)\). For such matrices, the spectral norm \(\| O_m \|\) (the maximum absolute eigenvalue, crucial for bounding measurement outcomes as discussed above) can be obtained with numerical routines such as \(\texttt{scipy.sparse.linalg.eigsh}\), which compute the largest and smallest eigenvalues of each \(O_m\); the maximum absolute value among these eigenvalues is then taken as \(\| O_m \|\). However, directly computing this spectral norm (equivalent to the largest singular value for Hermitian matrices) is computationally burdensome, especially for high-dimensional sparse matrices. Thus, an upper-bound approximation is used instead: \(\| O_m \| \leq \sqrt{\| O_m \|_1 \cdot \| O_m \|_\infty}\), where \(\| O_m \|_1\) denotes the matrix 1-norm (the maximum absolute column sum) and \(\| O_m \|_\infty\) is the matrix infinity-norm (the maximum absolute row sum).
For Hermitian matrices, \(\| O_m \|_1 = \| O_m \|_\infty\), because \(|(O_m)_{ij}| = |(O_m)_{ji}|\), so the absolute row sums and column sums coincide. We can compute the 1-norm (\(\text{norm}_1\)) and the infinity norm (\(\text{norm}_{\infty}\)) of the matrix numerically, take \[\begin{align}
\text{spectral\_bound} = \sqrt{\text{norm\_1} \cdot \text{norm\_inf}}
\end{align}\] and substitute this upper bound into the calculation formula for \(N_m\).
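A sketch of both routes, assuming SciPy is available, is given below; the function names are ours, and the exact route via \(\texttt{eigsh}\) is only practical when the matrices are moderately sized or well conditioned.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_norm_eigsh(O):
    """Largest |eigenvalue| of a sparse Hermitian matrix via Lanczos (eigsh)."""
    lmax = eigsh(O, k=1, which="LA", return_eigenvectors=False)[0]
    lmin = eigsh(O, k=1, which="SA", return_eigenvectors=False)[0]
    return max(abs(lmax), abs(lmin))

def spectral_norm_bound(O):
    """Cheaper upper bound sqrt(||O||_1 * ||O||_inf) used in the text."""
    A = sp.csr_matrix(O)
    inf_norm = abs(A).sum(axis=1).max()   # maximum absolute row sum
    one_norm = abs(A).sum(axis=0).max()   # maximum absolute column sum
    return float(np.sqrt(one_norm * inf_norm))
```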
Analysis of the Relationship Between the Spectral Norm and \(k\): As a first step, we analyze the variance of the matrix elements.
Diagonal: Real numbers drawn from the standard normal distribution \(\mathcal{N}(0, 1)\). \(O_{ii} \sim \mathcal{N}(0, 1)\), so the variance \(\mathbb{E}[O_{ii}^2] = 1\).
Off-diagonal: \(O_{ij} = a + bi \;(i \neq j)\), where \(a, b \sim \mathcal{N}(0, 1)\) are independent. Then: \[\begin{align} \mathbb{E}[|O_{ij}|^2] = \mathbb{E}[a^2 + b^2] = \mathbb{E}[a^2] + \mathbb{E}[b^2] = 1 + 1 = 2 \end{align}\]
For a sparse random matrix, the spectral norm \(\|O_m\|\) depends mainly on the off-diagonal elements: when \(k\) is large, the contribution of the diagonal elements can be neglected, since their variance is 1 while the off-diagonal elements have variance 2 and number \(k - 1 \approx k\). Overall, the average variance of each non-zero element is:
\[\begin{align}
\text{Average variance} = \frac{1\cdot\mathbb{E}[O_{ii}^2]+(k - 1)\cdot\mathbb{E}[|O_{ij}|^2]}{k}=\frac{1\cdot1+(k - 1)\cdot2}{k}=2-\frac{1}{k}\approx 2\quad\text{when}\;k\gg 1
\end{align}\] Let \(v^2 = 2\) represent the variance of the off-diagonal elements.
Next, we estimate the spectral norm using results from random matrix theory.
In random matrix theory, for a sparse Hermitian matrix in which each row has \(k\) non-zero elements with element mean 0 and variance \(v^2\), the asymptotic behavior of the spectral norm \(\|O_m\|\) is given by the following formula (see [19]):
\[\begin{align}
\|O_m\|\approx 2v\sqrt{d}
\end{align}\] where \(d\) is the average degree of the graph (that is, the number of off-diagonal elements connected to each vertex). Here:
The adjacency graph of the matrix is a random graph, where each row contains \(k\) non-zero elements (encompassing the diagonal element). Specifically, the diagonal elements correspond to self-loops within the graph structure, while the off-diagonal elements represent edges between different vertices. Given that each row has \(k\) non-zero entries, with one being the diagonal element and \(k - 1\) being off-diagonal, the degree of each vertex (accounting for self-loops) is \(k\). However, when considering only off-diagonal connections (as self-loops do not contribute to the edges between distinct vertices), the average degree of these off-diagonal edges is \(d = k - 1\).
For large \(n\) (matrix dimension \(N = 2^n\rightarrow\infty\)) and \(k\) that is fixed or grows moderately, the spectral norm is dominated mainly by the off-diagonal part. Standard results show that:
\[\begin{align} \|O_m\|\approx 2v\sqrt{d},\quad\text{where}\;d = k - 1 \end{align}\] Substitute the variance \(v^2 = 2\):
\[\begin{align} \|O_m\|\approx 2\sqrt{2}\sqrt{k - 1} \end{align}\] When \(k\gg 1\), \(\sqrt{k - 1}\approx\sqrt{k}\). Therefore:
\[\begin{align}
\|O_m\|\approx 2\sqrt{2}\sqrt{k}
\end{align}\]
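This scaling can be checked numerically. The sketch below (ours) builds a random sparse complex Hermitian matrix with roughly \(k\) non-zero entries per row (an exactly-\(k\)-per-row Hermitian construction is fiddlier, so agreement is only expected to be approximate) and compares its spectral norm to \(2\sqrt{2k}\).

```python
import numpy as np

# Rough numerical check of ||O_m|| ≈ 2*sqrt(2k) for a sparse random Hermitian
# matrix; ~k non-zeros per row on average, complex entries a + bi with
# a, b ~ N(0, 1), zero diagonal (its contribution is negligible).
rng = np.random.default_rng(1)
n, k = 9, 8
N = 2 ** n

O = np.zeros((N, N), dtype=complex)
for i in range(N - 1):
    cols = rng.choice(np.arange(i + 1, N), size=min(k // 2, N - 1 - i), replace=False)
    vals = rng.standard_normal(cols.size) + 1j * rng.standard_normal(cols.size)
    O[i, cols] = vals                  # upper triangle
    O[cols, i] = vals.conj()           # mirror to keep O Hermitian

print("numerical spectral norm:", np.max(np.abs(np.linalg.eigvalsh(O))))
print("2*sqrt(2k) prediction  :", 2 * np.sqrt(2 * k))
```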
Substitute into the Measurement Count Formula:
Substitute \(\| O_m \| \approx 2\sqrt{2}\sqrt{k}\) into \(N_m\): \[\begin{align}
N_m \approx \frac{2(2\sqrt{2}\sqrt{k})^2 \ln\left( \frac{2M}{\delta} \right)}{\epsilon^2} = \frac{16k \ln\left( \frac{2M}{\delta} \right)}{\epsilon^2}
\end{align}\] Assuming that the spectral norms of all \(M\) observables are approximately equal, the total number of measurements is: \[\begin{align}
N_{\text{total}} \approx M \cdot \frac{16k \ln\left( \frac{2M}{\delta} \right)}{\epsilon^2}
\end{align}\] With this, the proof of Statement 4 is complete.
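In practice this closed-form estimate is convenient for quick comparisons against the shadow-side count; the helper below is our illustration, using the same hypothetical parameters as the earlier shadow sketch.

```python
import numpy as np

def direct_total_shots(M, k, eps, delta):
    """N_total ≈ 16 M k ln(2M/delta) / eps^2, using the random-matrix estimate
    ||O_m|| ≈ 2*sqrt(2k) for all M observables."""
    return int(np.ceil(16 * M * k * np.log(2 * M / delta) / eps ** 2))

# Same illustrative parameters as the shadow-side sketch earlier, for comparison.
print(direct_total_shots(M=100, k=4, eps=0.1, delta=0.01))
```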
We will start by introducing some fundamental concepts.
Hermitian Matrix: A matrix \(A\) is Hermitian if it satisfies \(A = A^{\dagger}\) (i.e., it is equal to its conjugate transpose). This means that for the matrix
elements, \(a_{ij} = a_{ji}^*\), where \(*\) denotes complex conjugation. Hermitian matrices have real eigenvalues and are often used to represent observables in quantum mechanics.
Pauli Matrices: Pauli matrices are a set of \(2 \times 2\) Hermitian and unitary matrices commonly used to describe quantum systems. They include:
\(\sigma_x = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}\), \(\sigma_y = \begin{bmatrix}0 & -i \\ i & 0\end{bmatrix}\), \(\sigma_z = \begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix}\), and the identity matrix \(I = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\) (not strictly a Pauli matrix, but often included in decompositions).
In multi-qubit systems, higher-dimensional matrices are constructed through the tensor products of these matrices.
Decomposition Goal: We need to prove that any \(2^n \times 2^n\) Hermitian matrix \(A\) (corresponding to \(n\) qubits) can be decomposed into a linear combination of tensor products of Pauli matrices: \[\begin{align} A = \sum_{\mathbf{k}} c_{\mathbf{k}} (P_{k_1} \otimes P_{k_2} \otimes \cdots \otimes P_{k_n}) \end{align}\] where \(P_{k_i} \in \{I, \sigma_x, \sigma_y, \sigma_z\}\), and \(c_{\mathbf{k}}\) are real coefficients.
We now provide a detailed and rigorous proof of decomposability. To prove this, we need to show that the tensor products of Pauli matrices form a complete basis for the space of \(2^n \times 2^n\) Hermitian matrices, meaning any Hermitian matrix can be expressed as a real linear combination of them. The proof is divided into the following five steps:
The first step involves basis construction. For \(n\) qubits, the Hilbert space dimension is \(2^n\), and the matrix dimension is \(2^n \times 2^n\). The set
of tensor products of Pauli matrices is defined as: \[\begin{align} \{P_{k_1} \otimes P_{k_2} \otimes \cdots \otimes P_{k_n} \mid P_{k_i} \in \{I, \sigma_x, \sigma_y, \sigma_z\}\}
\end{align}\] Each position has 4 choices (\(I, \sigma_x, \sigma_y, \sigma_z\)), and there are \(n\) positions, so there are a total of: \(4^n =
(2^2)^n\) tensor product matrices.
Step 2 is to prove linear independence. These tensor product matrices are linearly independent, which can be proven through their trace orthogonality. For two different tensor products \(P_{\mathbf{k}} = P_{k_1} \otimes \cdots
\otimes P_{k_n}\) and \(P_{\mathbf{m}} = P_{m_1} \otimes \cdots \otimes P_{m_n}\): \[\begin{align}
\text{tr}(P_{\mathbf{k}}^{\dagger} P_{\mathbf{m}}) = \text{tr}(P_{k_1}^{\dagger} P_{m_1}) \cdot \text{tr}(P_{k_2}^{\dagger} P_{m_2}) \cdots \text{tr}(P_{k_n}^{\dagger} P_{m_n})
\end{align}\] The trace properties of single Pauli matrices are:
\(\text{tr}(I) = 2\), \(\text{tr}(\sigma_x) = \text{tr}(\sigma_y) = \text{tr}(\sigma_z) = 0\), \(\text{tr}(I \cdot I) = 2\), \(\text{tr}(\sigma_i \cdot \sigma_j) = 2\delta_{ij}\) for \(i, j = x, y, z\), \(\text{tr}(I \cdot \sigma_i) = 0\)
Therefore: \[\begin{align}
\text{tr}(P_{\mathbf{k}}^{\dagger} P_{\mathbf{m}}) = 2^n \delta_{\mathbf{k}\mathbf{m}}
\end{align}\] That is, the trace is \(2^n\) only when \(\mathbf{k} = \mathbf{m}\) (all \(k_i = m_i\)); otherwise, it is 0. This proves linear
independence.
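This orthogonality relation is also easy to verify numerically; the sketch below (our illustration) builds two random Pauli strings and checks that \(\text{tr}(P_{\mathbf{k}}^{\dagger} P_{\mathbf{m}}) = 2^n \delta_{\mathbf{k}\mathbf{m}}\).

```python
import numpy as np
from functools import reduce

# Numerical check of tr(P_k^dag P_m) = 2^n delta_{km} on random Pauli strings.
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),      # sigma_x
          np.array([[0, -1j], [1j, 0]], dtype=complex),   # sigma_y
          np.array([[1, 0], [0, -1]], dtype=complex)]     # sigma_z

def pauli_string(indices):
    """Tensor product P_{k_1} ⊗ ... ⊗ P_{k_n}, indices drawn from {0,1,2,3}."""
    return reduce(np.kron, (PAULIS[i] for i in indices))

rng = np.random.default_rng(2)
n = 4
a = tuple(rng.integers(0, 4, n))
b = tuple(rng.integers(0, 4, n))
Pa, Pb = pauli_string(a), pauli_string(b)
print(np.trace(Pa.conj().T @ Pa).real)            # 2^n = 16
print(np.trace(Pa.conj().T @ Pb).real)            # 0 unless a == b
```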
The third step is dedicated to spanning the entire matrix space. The space of \(2^n \times 2^n\) complex matrices is characterized by a dimension of \((2^n)^2 = 4^n\), a result derived from
the fact that each entry in such a matrix can be an independent complex number, leading to a total of \(2^n \times 2^n\) free parameters that define the matrix. Crucially, the set of tensor product matrices under
consideration contains exactly \(4^n\) linearly independent elements, a count that directly matches the dimension of the matrix space. This correspondence is not coincidental: in linear algebra, a set of linearly
independent vectors (or matrices, viewed as vectors in a higher-dimensional space) spans the entire space if their number equals the space’s dimension. Thus, the tensor product matrices, by virtue of having \(4^n\) linearly
independent elements, are sufficient to generate every possible \(2^n \times 2^n\) complex matrix through linear combinations. Growing \(n\) leads to exponentially large matrix spaces, which is precisely why such a structured basis is needed for tractable manipulation.
Next step we use the hermiticity constraint. Each tensor product \(P_{\mathbf{k}} = P_{k_1} \otimes \cdots \otimes P_{k_n}\) satisfies the Hermiticity constraint, meaning \(P_{\mathbf{k}}^{\dagger}
= P_{\mathbf{k}}\). This property arises from the fundamental Hermiticity of its constituent factors: each \(P_{k_i}\) is either the identity matrix \(I\) or one of the Pauli
matrices, all of which are Hermitian (i.e., \(P_{k_i}^{\dagger} = P_{k_i}\)). For tensor products, the adjoint operation distributes across the tensor product such that \((P_{k_1} \otimes \cdots
\otimes P_{k_n})^{\dagger} = P_{k_1}^{\dagger} \otimes \cdots \otimes P_{k_n}^{\dagger}\); substituting the Hermiticity of each factor confirms that the entire tensor product matrix is Hermitian. Turning to the space of \(2^n \times 2^n\) Hermitian matrices, its dimension can be derived by analyzing the degrees of freedom in such matrices: diagonal elements must be real numbers (contributing \(2^n\) independent
parameters), while off-diagonal elements exhibit conjugate symmetry (i.e., \(M_{ij} = \overline{M_{ji}}\) for \(i \neq j\)), reducing the independent parameters to \(\frac{2^n(2^n - 1)}{2}\) complex numbers (each contributing 2 real degrees of freedom). Summing these contributions gives a total of \(2^n + 2 \cdot \frac{2^n(2^n - 1)}{2} = (2^n)^2 = 4^n\)
real degrees of freedom, equal to the number \(4^n\) of Pauli tensor-product basis matrices.
Then comes the final step. Given that the basis matrices \(P_{\mathbf{k}}\) are Hermitian, any \(2^n \times 2^n\) Hermitian matrix \(A\) can be expressed as
a linear combination of these basis matrices, specifically in the form \(A = \sum_{\mathbf{k}} c_{\mathbf{k}} P_{\mathbf{k}}\), where \(c_{\mathbf{k}}\) denote the coefficients of the
combination. To ensure the Hermiticity of \(A\), we examine its adjoint: \(A^{\dagger} = \sum_{\mathbf{k}} c_{\mathbf{k}}^* P_{\mathbf{k}}^{\dagger}\). Since each \(P_{\mathbf{k}}\) is Hermitian, \(P_{\mathbf{k}}^{\dagger} = P_{\mathbf{k}}\), simplifying the expression for \(A^{\dagger}\) to \(\sum_{\mathbf{k}} c_{\mathbf{k}}^* P_{\mathbf{k}}\). For \(A\) to be Hermitian, it must satisfy \(A = A^{\dagger}\), which implies \(\sum_{\mathbf{k}} c_{\mathbf{k}} P_{\mathbf{k}} = \sum_{\mathbf{k}} c_{\mathbf{k}}^* P_{\mathbf{k}}\). Given the linear independence of the basis matrices \(P_{\mathbf{k}}\), the coefficients of
corresponding terms in the sums must be equal, leading to the condition \(c_{\mathbf{k}} = c_{\mathbf{k}}^*\) for all \(\mathbf{k}\). This equality indicates that each coefficient \(c_{\mathbf{k}}\) must be a real number, i.e., \(c_{\mathbf{k}} \in \mathbb{R}\).
To summarize, the \(4^n\) Hermitian and linearly independent Pauli tensor product matrices span the entire space of \(2^n \times 2^n\) Hermitian matrices. Therefore, any \(2^n \times 2^n\) Hermitian matrix \(A\) can be decomposed as: \[\begin{align}
A = \sum_{\mathbf{k}} c_{\mathbf{k}} P_{k_1} \otimes \cdots \otimes P_{k_n}
\end{align}\] where \(c_{\mathbf{k}}\) are real numbers.
Finally, we present the specific method for decomposition. Given a \(2^n \times 2^n\) Hermitian matrix \(A\), we can compute its decomposition coefficients using trace orthogonality.
The first step is coefficient calculation. The specific steps are as follows:
For each basis matrix \(P_{\mathbf{k}} = P_{k_1} \otimes \cdots \otimes P_{k_n}\), the coefficient \(c_{\mathbf{k}}\) is: \[\begin{align} c_{\mathbf{k}} = \frac{1}{2^n} \text{tr}(A \cdot P_{\mathbf{k}}) \end{align}\]
Reason: \[\begin{align} \text{tr}(P_{\mathbf{k}}^{\dagger} P_{\mathbf{m}}) = 2^n \delta_{\mathbf{k}\mathbf{m}} \end{align}\] If \(A = \sum_{\mathbf{m}} c_{\mathbf{m}} P_{\mathbf{m}}\), then: \[\begin{align} \text{tr}(A \cdot P_{\mathbf{k}}) = \text{tr}\left(\sum_{\mathbf{m}} c_{\mathbf{m}} P_{\mathbf{m}} \cdot P_{\mathbf{k}}\right) = \sum_{\mathbf{m}} c_{\mathbf{m}} \text{tr}(P_{\mathbf{m}} \cdot P_{\mathbf{k}}) = c_{\mathbf{k}} \cdot 2^n \end{align}\] \[\begin{align} c_{\mathbf{k}} = \frac{1}{2^n} \text{tr}(A \cdot P_{\mathbf{k}}) \end{align}\] \(\text{tr}(A \cdot P_{\mathbf{k}})\) is real (since both \(A\) and \(P_{\mathbf{k}}\) are Hermitian), ensuring \(c_{\mathbf{k}} \in \mathbb{R}\).
The second step is constructing the decomposition. We only need to substitute all coefficients: \[\begin{align} A = \sum_{\mathbf{k}} \left(\frac{1}{2^n} \text{tr}(A \cdot P_{\mathbf{k}})\right) P_{\mathbf{k}} \end{align}\] Iterate over all \(\mathbf{k} \in \{0, 1, 2, 3\}^n\) (a total of \(4^n\) terms) to obtain the complete decomposition.
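The whole procedure fits in a few lines for small \(n\); the sketch below (our illustration, brute-forcing all \(4^n\) strings) computes the coefficients via the trace formula and verifies the reconstruction on a random 2-qubit Hermitian matrix.

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),      # sigma_x
          np.array([[0, -1j], [1j, 0]], dtype=complex),   # sigma_y
          np.array([[1, 0], [0, -1]], dtype=complex)]     # sigma_z

def pauli_decompose(A):
    """Coefficients c_k = tr(A P_k) / 2^n of a 2^n x 2^n Hermitian matrix A.
    Brute force over all 4^n strings, so only practical for small n."""
    n = int(np.log2(A.shape[0]))
    coeffs = {}
    for idx in product(range(4), repeat=n):
        P = reduce(np.kron, (PAULIS[i] for i in idx))
        c = np.trace(A @ P).real / 2 ** n          # real because A and P are Hermitian
        if abs(c) > 1e-12:
            coeffs[idx] = c
    return coeffs

# Example: decompose a random 2-qubit Hermitian matrix and verify the result.
rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (X + X.conj().T) / 2
coeffs = pauli_decompose(A)
A_rec = sum(c * reduce(np.kron, (PAULIS[i] for i in idx)) for idx, c in coeffs.items())
print(np.allclose(A, A_rec))                       # True
```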