Coarse scrambling for Sobol’ and Niederreiter sequences


Abstract

We introduce coarse scrambling, a novel randomization for digital sequences that permutes blocks of digits in a mixed-radix representation. This construction is designed to preserve the powerful \((0,\boldsymbol{e},d)\)-sequence property of the underlying points. For sufficiently smooth integrands, we prove that this method achieves the canonical \(O(n^{-3+\epsilon})\) variance decay rate, matching that of standard Owen’s scrambling. Crucially, we show that its maximal gain coefficient grows only logarithmically with dimension, \(O(\log d)\), thus providing theoretical robustness against the curse of dimensionality affecting scrambled Sobol’ sequences. Numerical experiments validate these findings and illustrate a practical trade-off: while Owen’s scrambling is superior for integrands sensitive to low-dimensional projections, coarse scrambling is competitive for functions with low effective truncation dimension.

Keywords: Quasi-Monte Carlo, scrambling, digital net, gain coefficient, Sobol’ sequence

AMS subject classifications: 65C05, 65D30

1 Introduction

We consider numerical approximation of \(d\)-dimensional integrals of the form \[\label{eq:target95integral} I(f) := \int_{[0,1)^d} f(\boldsymbol{x})\,d\boldsymbol{x},\tag{1}\] for Riemann-integrable functions \(f:[0,1)^d\to\mathbb{R}\). A common practical estimator is the equally weighted quadrature rule \[\label{eq:qmc-estimator} I(f;P_n) := \frac{1}{n}\sum_{i=0}^{n-1} f(\boldsymbol{x}_i),\tag{2}\] where \(P_n=\{\boldsymbol{x}_0,\dots,\boldsymbol{x}_{n-1}\}\subset[0,1)^d\) is a collection of sample points. When the \(\boldsymbol{x}_i\) are independent uniform samples, (2) reduces to the classical Monte Carlo (MC) estimator, which is unbiased and has variance \[\label{eq:MCvar} \operatorname{Var}(I(f;P_n)) = \frac{\operatorname{Var}f}{n}, \qquad \text{where } \operatorname{Var}f := I(f^2) - I(f)^2,\tag{3}\] for \(f\in L^2([0,1)^d)\). Despite its robustness, the root mean square error of the MC estimator converges at the relatively slow rate of \(O(n^{-1/2})\), motivating the use of more structured point sets.

Quasi-Monte Carlo (QMC) methods replace independent sampling by carefully constructed low-discrepancy point sets such as lattice rules [1], [2], Halton sequences [3], and digital nets [4], [5]. For integrands of bounded variation, these constructions often achieve an error decay of \(O(n^{-1+\epsilon})\), much faster than MC. A key drawback of deterministic QMC, however, is the lack of a practical and reliable error estimate. Randomized QMC (RQMC) resolves this issue by introducing a randomization that preserves the low-discrepancy structure of the points while restoring the benefits of a stochastic method: the estimator becomes unbiased, and its variance can be estimated from independent replications, thus providing practical error bars.

Among randomized QMC schemes, Owen’s scrambling [6], [7] has become a standard method for digital nets. In Owen’s scrambling, each digit of an input point’s base-\(b\) expansion is permuted by an independently chosen random permutation (possibly dependent on the preceding digits). For a set of \(n\) points \(P_n\), Owen proved a bound of the form \[\label{eq:gain-intro2} \operatorname{Var}\bigl(I(f;\widetilde{P}_n)\bigr)\le \frac{\Gamma(P_n)}{n}\operatorname{Var}(f),\tag{4}\] where \(\Gamma(P_n)\) is the maximal gain coefficient associated with the point set. Moreover, for sufficiently smooth integrands the variance decays substantially faster, as \(O(n^{-3 + \epsilon})\) (see [8], [9]), explaining the large practical gains of scrambling over plain MC. While recent work has also focused on alternative estimators, such as the median-of-means, that achieve a robust and almost optimal order of convergence without prior knowledge of the integrand [10]–[13], the focus of this paper is on the variance properties of the classical mean-based estimator.

In practice, scrambled Sobol’ sequences [14]–[16] are arguably the most widely used RQMC points in scientific computing and industry due to their excellent empirical performance. A theoretical drawback, however, is that their maximal gain coefficient \(\Gamma(P_n)\) can grow exponentially with the dimension \(d\) [17], [18]. In very high dimensions, this can lead to a situation where scrambled QMC performs substantially worse than plain Monte Carlo, undermining its primary motivation.

Some alternative RQMC constructions yield smaller maximal gain coefficients. For example, scrambled Faure sequences in prime power base \(b\) (with \(d \leq b\)) satisfy \(\Gamma \leq e \approx 2.718\) [7], but they are generally not considered efficient. One reason is that, numerically, the benefits of scrambling appear only when \(n \gtrsim b^{d}\). Other drawbacks are that large bases are required in high dimensions, and arithmetic in base \(b\) is slower than in base \(2\). Scrambled Halton sequences also have small gain coefficients, growing only as \(O(\log d)\) [19], but they typically do not achieve the higher-order variance decay observed for scrambled digital nets.

This paper introduces coarse scrambling, a randomization technique for digital sequences designed to operate on blocks of digits rather than individual digits. The motivation for this block-wise approach stems directly from our primary goal: to preserve the powerful \((0,\boldsymbol{e},d)\)-sequence property of Sobol’ and Niederreiter sequences. Since this property is defined over mixed bases of the form \((b^{e_1},\dots,b^{e_d})\), scrambling entire blocks of digits is the natural and necessary way to maintain this structure—a feat not achieved by standard digit-wise scrambling. By preserving this crucial property, our construction achieves two critical objectives: it controls the maximal gain coefficient to grow only logarithmically with dimension, \(O(\log d)\), while simultaneously retaining the \(O(n^{-3+\epsilon})\) convergence rate for smooth functions.

Our main contributions are as follows:

  1. A Unifying Framework and Structure-Preserving Randomization: We introduce the concept of an equidistributed sequence in a mixed base, a novel theoretical framework that, for the first time, provides a unified treatment of both Halton-type sequences and digital \((0,\boldsymbol{e},d)\)-sequences like Sobol’s. Within this framework, we formally define coarse scrambling and prove it preserves the powerful \((0,\boldsymbol{e},d)\)-structure, a key property for high-performance QMC integration.

  2. Theoretical Performance Guarantees: We derive an analytical expression for the gain coefficients of coarsely scrambled sequences by extending recent results for scrambled Halton sequences [19]. Our analysis reveals two key properties: (i) for sufficiently smooth integrands, the variance converges at the canonical \(O(n^{-3+\epsilon})\) rate, matching the performance of Owen’s scrambling, and (ii) the maximal gain coefficient grows only logarithmically with dimension, \(O(\log d)\), ensuring robustness in high-dimensional settings.

  3. Empirical Validation of the Theoretical Trade-offs: We conduct numerical experiments to validate our theoretical findings and illustrate the practical trade-off between coarse and usual scrambling. The results highlight that the superior accuracy of usual scrambling is driven by its excellent performance on low-dimensional projections. Conversely, for functions with low effective dimensionality, where the total variance is concentrated in the first few coordinates, the performance of coarse scrambling is comparable.

The remainder of the paper is organized as follows. Section 2 recalls basic definitions and notation for digital nets, scrambling, and gain coefficients. In Section 3 we develop the theoretical analysis of gain coefficients for coarse scrambling. Section 4 gives a theoretical comparison of coarse scrambling with the usual (digitwise) scrambling. Section 5 reports numerical experiments.

2 Preliminaries

We use the following notation. Let \(\mathbb{N}\) be the set of positive integers, \(\mathbb{N}_0 := \mathbb{N}\cup \{0\}\), and \(\mathbb{Z}_b:= \{0,1,\dots, b-1\}\). Let \(\chi(A)\) denote the characteristic function of the event \(A\). Let \(1{:}d := \{1, \dots, d\}\). For \(u \subseteq 1{:}d\), we denote the cardinality of \(u\) by \(|u|\).

2.1 Digital Constructions and Equidistribution

Digital constructions in prime power base \(b\) were introduced by Niederreiter. We adopt the definition given in [5]. Let \(\mathbb{F}_b\) be the finite field with \(b\) elements and fix a bijection \(\phi \colon \mathbb{Z}_b\to \mathbb{F}_b\) with \(\phi(0) = 0\). If \(b\) is a prime, we identify \(\mathbb{F}_b\) with \(\mathbb{Z}_b\), with addition, subtraction, and multiplication defined modulo \(b\).

Definition 1. Let \(b\) be a prime power and \(C_1, \dots, C_d \in \mathbb{F}_b^{\infty \times \infty}\). For an integer \(k \ge 0\), denote its \(b\)-adic expansion by \(k = \sum_{i=0}^{\infty} \kappa_i b^{i}\) with \(\kappa_i \in \mathbb{Z}_b\) and let \(\vec{k} = (\phi(\kappa_0), \phi(\kappa_1), \dots)^\top\). Then the \(k\)-th point \(\boldsymbol{x}_k\in [0,1)^d\) is given by \[\begin{align} \label{eq:pt-generation} \boldsymbol{x}_k := (\psi(C_1\vec{k}), \dots, \psi(C_d\vec{k})), \end{align}\qquad{(1)}\] where the map \(\psi \colon \mathbb{F}_b^\infty \to [0,1]\) is defined by \[\label{eq:vec2val} \psi((y_1,y_2, \dots)^\top) := \sum_{i=1}^\infty \frac{\phi^{-1}(y_{i})}{b^i}.\qquad{(2)}\] The sequence of points \(\{\boldsymbol{x}_0,\boldsymbol{x}_1, \dots\}\) constructed this way is called a *digital sequence* in base \(b\) with generating matrices \(C_1,\dots,C_d\).
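To make Definition 1 concrete, the following minimal Python sketch generates one coordinate of a base-2 digital sequence from a finite generating matrix; the function name and the truncation to \(m\) digits are our illustrative choices, and taking \(C\) to be the identity recovers the van der Corput sequence.

```python
import numpy as np

def digital_point(k: int, C: np.ndarray) -> float:
    """One coordinate of a base-2 digital sequence (Definition 1).

    C is an m x m generating matrix over F_2 (entries 0/1); using m digits
    truncates the expansion to precision 2^-m.
    """
    m = C.shape[0]
    kvec = np.array([(k >> i) & 1 for i in range(m)])  # digits of k, least significant first
    yvec = C @ kvec % 2                                # y = C k over F_2
    return float(yvec @ 2.0 ** -np.arange(1, m + 1))   # psi: digit vector to a point in [0, 1)

C = np.eye(8, dtype=int)  # identity matrix: van der Corput sequence in base 2
print([digital_point(k, C) for k in range(4)])  # [0.0, 0.5, 0.25, 0.75]
```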

To assess the quality of point sets and sequences, the notions of \((t,m,d)\)-nets and \((t,d)\)-sequences are widely used to establish their equidistribution property. To introduce these objects, we first define elementary intervals.

Definition 2. Let \(b_j, k_j \in \mathbb{N}_0\) with \(b_j \ge 2\) for \(1 \le j \le d\). The *elementary \((k_1, \dots, k_d)\)-interval in base \((b_1, \dots, b_d)\)* is an interval of the form \[\prod_{j=1}^d \left[\frac{a_j}{b_j^{k_j}}, \frac{a_j+1}{b_j^{k_j}}\right) \quad \text{with } a_j \in \mathbb{N}_0,\; 0 \le a_j < b_j^{k_j}.\]

Definition 3. Let \(m,b,t \in \mathbb{N}_0\) with \(m \ge 1\) and \(b \ge 2\). Let \(P = \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots, \boldsymbol{x}_{b^m-1}\} \subset [0,1)^d\) be a set of \(b^m\) points. We call \(P\) a *\((t,m,d)\)-net in base \(b\)* if every elementary \((k_1, \dots, k_d)\)-interval in base \((b,\dots,b)\) with \(k_1 + \cdots + k_d \le m-t\) contains exactly \(b^t\) points of \(P\). A sequence \(\mathcal{S}= \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots\} \subset [0,1)^d\) is called a *\((t,d)\)-sequence in base \(b\)* if, for any \(m,r \in \mathbb{N}_0\), the point set \(\{\boldsymbol{x}_{rb^m}, \dots, \boldsymbol{x}_{(r+1)b^m-1}\}\) is a \((t,m,d)\)-net.

2.2 Generalized Niederreiter Sequences

A well-known and important class of digital sequences is given by the Sobol’ and (generalized) Niederreiter sequences. The Sobol’ sequence, defined in [14], can be regarded as a generalized Niederreiter sequence in base \(2\) whose base polynomials are chosen to be primitive (see [20] for details). The Niederreiter sequence was introduced in [21] and later fully generalized by Tezuka in [22].

Definition 4. Let \(b\) be a prime power and let the base polynomials \(p_1(x), p_2(x), \dots, p_d(x)\) be monic polynomials in \(\mathbb{F}_b[x]\) which are pairwise coprime. Let \(e_j := \deg(p_j)\). For \(k \in \mathbb{N}\) and \(1 \le j \le d\), let \(k_j = \lfloor (k-1)/e_j \rfloor\) and let \(y_{k}^{(j)}(x) \in \mathbb{F}_b[x]\) with the restriction that, for each \(t \in \mathbb{N}\), the residue polynomials \(y_{k}^{(j)}(x) \bmod p_j(x)\) for \((t-1)e_j \le k-1 < te_j\) are linearly independent over \(\mathbb{F}_b\). Consider the Laurent series expansion \[\label{eq:laurent} \frac{y_{k}^{(j)}(x)}{(p_j(x))^t} = \sum_{r=1}^{\infty}\frac{a^{(j)}(t,k,r)}{x^r}\in \mathbb{F}_b((x^{-1})).\qquad{(3)}\] The *generalized Niederreiter sequence* is defined as a digital sequence whose generating matrices are \(C_j=(c_{k,r}^{(j)})_{k,r \in \mathbb{N}}\) for \(1 \le j \le d\), where each element is given by \[\label{eq:general-Nied-seq} c_{k,r}^{(j)} = a^{(j)}\left(k_j+1, k, r\right).\tag{5}\]

Definition 5. We use the term *full Niederreiter sequence* to refer to the generalized Niederreiter sequence in which the base polynomials \(p_1(x), p_2(x), \dots\) are taken to be all the monic irreducible polynomials over \(\mathbb{F}_b\), ordered by increasing degree.

Generalized Niederreiter sequences are important examples of \((t,d)\)-sequences, as stated in the following theorem [5].

Theorem 1. Any generalized Niederreiter sequence in prime power base \(b\) with the base polynomials \(p_1(x),\dots,p_d(x)\) is a \((t,d)\)-sequence with \(t = \sum_{j=1}^d (\deg(p_j) - 1)\).

As the theorem indicates, the quality of the generalized Niederreiter sequence depends on the degrees of the base polynomials. To achieve a small \(t\)-value, one must choose base polynomials with low degrees. Table 1 summarizes the number of available irreducible and primitive polynomials over \(\mathbb{F}_2\) for a given degree. For example, when constructing a Sobol’ sequence (where primitive polynomials are typically used), if one wishes to keep the degrees of the base polynomials at most \(7\), the maximum possible dimension is \(d = 1 + \sum_{n=1}^{7} P_2(n) = 37\), where \(P_2(n)\) denotes the number of primitive polynomials of degree \(n\) over \(\mathbb{F}_2\) and the first dimension is specially chosen with \(p_1(x) = x\).

Table 1: Number of monic irreducible and primitive polynomials over \(\mathbb{F}_2\).

| \(n\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| irreducible | 2 | 1 | 2 | 3 | 6 | 9 | 18 | 30 | 56 | 99 | 186 | 335 | 630 | 1161 |
| primitive | 1 | 1 | 2 | 2 | 6 | 6 | 18 | 16 | 48 | 60 | 176 | 144 | 630 | 756 |
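The entries of Table 1 can be reproduced from two classical counting formulas (not specific to this paper): Gauss’s formula \(I_b(n) = \frac{1}{n}\sum_{m \mid n}\mu(m)\,b^{n/m}\) for monic irreducible polynomials, and \(\varphi(b^n-1)/n\) for primitive polynomials. A short, self-contained Python check, with helper names of our choosing:

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function mu(n) by trial division."""
    if n == 1:
        return 1
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:   # squared prime factor
                return 0
            result = -result
        p += 1
    if m > 1:
        result = -result
    return result

def totient(n):
    """Euler's totient phi(n) by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def num_irreducible(b, n):
    """Number of monic irreducible polynomials of degree n over F_b (Gauss)."""
    return sum(mobius(d) * b ** (n // d) for d in divisors(n)) // n

def num_primitive(b, n):
    """Number of primitive polynomials of degree n over F_b: phi(b^n - 1)/n."""
    return totient(b ** n - 1) // n

print([num_irreducible(2, n) for n in range(1, 8)])  # [2, 1, 2, 3, 6, 9, 18]
print([num_primitive(2, n) for n in range(1, 8)])    # [1, 1, 2, 2, 6, 6, 18]
```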

2.3 Stronger Equidistribution and Further Generalization

A smaller \(t\)-value is preferable, since it indicates that the point set satisfies an equidistribution property with respect to finer partitions of \([0,1)^d\). However, the ideal case \(t=0\) is unattainable in high dimensions. For instance, if \(b\) is prime, a \((0,b+1)\)-sequence in base \(b\) does not exist. To overcome this limitation, generalized notions of \((t,\boldsymbol{e},m,d)\)-nets and \((t,\boldsymbol{e},d)\)-sequences were introduced [23]. In these notions, the class of elementary intervals under consideration is restricted, which in turn allows for smaller \(t\)-values.

Definition 6. Let \(m,b,t \in \mathbb{N}_0\) with \(m \ge 1\) and \(b \ge 2\) and \(e_j \in \mathbb{N}\) for \(1 \le j \le d\). Let \(P = \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots, \boldsymbol{x}_{b^m-1}\} \subset [0,1)^d\) be a set of \(b^m\) points. We call \(P\) a *\((t,\boldsymbol{e},m,d)\)-net in base \(b\)* if every elementary \((k_1, \dots, k_d)\)-interval in base \((b^{e_1},\dots,b^{e_d})\) with \(e_1k_1 + \cdots + e_dk_d \le m-t\) contains exactly \(b^t\) points of \(P\). A sequence \(\mathcal{S}= \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots\} \subset [0,1)^d\) is called a *\((t,\boldsymbol{e},d)\)-sequence in base \(b\)* if, for any \(m,r \in \mathbb{N}_0\), the point set \(\{\boldsymbol{x}_{rb^m}, \dots, \boldsymbol{x}_{(r+1)b^m-1}\}\) is a \((t,\boldsymbol{e},m,d)\)-net.

Under this generalized framework, generalized Niederreiter sequences exhibit an ideal equidistribution property, as shown by Tezuka [23].

Theorem 2. Let \(\mathcal{S}\) be a generalized Niederreiter sequence in prime power base \(b\) with the base polynomials \(p_1(x),\dots,p_d(x)\). Let \(\boldsymbol{e}= (\deg(p_1), \dots, \deg(p_d))\). Then \(\mathcal{S}\) is a \((0,\boldsymbol{e},d)\)-sequence in base \(b\).

In this paper, we further generalize the concept of \((0,\boldsymbol{e},d)\)-sequences to arbitrary bases.

Definition 7. Let \(\mathcal{S}= \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots\} \subset [0,1)^d\) be a sequence of points. We call \(\mathcal{S}\) an *equidistributed sequence in base \(\boldsymbol{b}= (b_1, \dots, b_d)\)* if, for any \(r,k_1, \dots, k_d \in \mathbb{N}_0\), every elementary \((k_1, \dots, k_d)\)-interval in base \(\boldsymbol{b}\) contains exactly one point from \(\{\boldsymbol{x}_{rB}, \dots, \boldsymbol{x}_{(r+1)B-1}\}\), where \(B := b_1^{k_1} \cdots b_d^{k_d}\).

This framework includes well-known sequences as special cases. For example, the Halton sequence is an equidistributed sequence in base \((p_1, \dots, p_d)\), where \(p_j\) denotes the \(j\)th prime number. Furthermore, the notion of a \((0,\boldsymbol{e},d)\)-sequence in base \(b\) is identical to that of an equidistributed sequence in base \((b^{e_1}, \dots, b^{e_d})\). Therefore, Tezuka’s theorem implies that the generalized Niederreiter sequences are equidistributed sequences in base \((b^{e_1}, \dots, b^{e_d})\), where \(e_j = \deg(p_j)\).

2.4 Scrambling

Scrambling is a procedure that applies a random permutation to the digits of a point’s coordinates, aiming to improve the uniformity of the point set. We begin with a general definition of scrambling.

Definition 8. Let \(x = \xi_1 b^{-1}+\xi_2b^{-2}+\xi_3b^{-3}+\cdots\) with \(\xi_k \in \mathbb{Z}_b\). A *nested scramble* in base \(b\) is a random map \(\sigma(x) = y\) that randomizes the digits \(\xi_k\) to obtain \(y = \eta_1 b^{-1}+\eta_2b^{-2}+\eta_3b^{-3}+\cdots\), where the digits \(\eta_k\) are determined via permutations as \[\eta_1 = \pi_{\bullet}(\xi_1), \qquad \eta_k = \pi_{\bullet \xi_1 \xi_2 \dots \xi_{k-1}}(\xi_k) \quad (k \ge 2),\] where all \(\pi_{\bullet}\) and \(\pi_{\bullet\xi_1 \xi_2 \dots \xi_{k-1}}\) are permutations on \(\mathbb{Z}_b\) chosen mutually independently. For a point \(\boldsymbol{x}= (x_1, \dots, x_d) \in [0,1)^d\), a nested scramble in base \((b_1, \dots, b_d)\) is a map \(\sigma = (\sigma_1, \dots, \sigma_d)\) where each \(\sigma_j\) is an independent nested scramble in base \(b_j\) applied to \(x_j\).
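The following is a minimal Python sketch of a fully nested scramble of a single coordinate in base \(b\), truncated to \(m\) digits; the lazy permutation table keyed by digit prefixes mirrors the subscripts \(\pi_{\bullet\xi_1\dots\xi_{k-1}}\) in Definition 8 (function and parameter names are ours).

```python
import random

def nested_scramble(x: float, b: int = 2, m: int = 16, table: dict = None) -> float:
    """Fully nested (Owen) scramble of one coordinate in base b, m digits.

    Each digit xi_k is permuted by a permutation chosen independently for
    each prefix (xi_1, ..., xi_{k-1}); permutations are drawn lazily and
    cached in `table`, which must be shared across all points of one
    coordinate so that they are scrambled consistently.
    """
    if table is None:
        table = {}
    prefix, y = (), 0.0
    for k in range(1, m + 1):
        x *= b
        xi = int(x)   # next base-b digit of x
        x -= xi
        if prefix not in table:
            perm = list(range(b))
            random.shuffle(perm)
            table[prefix] = perm
        y += table[prefix][xi] * float(b) ** -k
        prefix += (xi,)
    return y

table = {}  # one shared table per coordinate
print([nested_scramble(k / 8, table=table) for k in range(8)])
```

Since the table may eventually hold up to \((b^m-1)/(b-1)\) permutations, memory grows geometrically in \(m\); this is exactly the cost that motivates the affine matrix scramble discussed next.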

For a scramble to be effective in randomized quasi-Monte Carlo, it should satisfy certain uniformity properties, which are crucial for the variance analysis.

Definition 9.

  1. A random permutation \(\pi\) on \(\mathbb{Z}_b\) has *one-fold uniformity* if, for any \(x \in \mathbb{Z}_b\), \(\pi(x)\) is uniformly distributed on \(\mathbb{Z}_b\).

  2. A random permutation \(\pi\) on \(\mathbb{Z}_b\) has *two-fold uniformity* if, for any \(x\neq y \in \mathbb{Z}_b\), the pair \((\pi(x),\pi(y))\) is uniformly distributed over the \(b(b-1)\)-element set \(\{(c,d) \in \mathbb{Z}_b\times \mathbb{Z}_b\mid c \neq d \}\).

  3. A nested scramble has *one-fold (resp. two-fold) uniformity* if all the permutations that constitute the scramble have one-fold (resp. two-fold) uniformity.

Note that two-fold uniformity implies one-fold uniformity. Several methods exist to implement a scramble with these properties. Owen [6] introduced fully nested scrambling, where the permutations are chosen independently and uniformly from the \(b!\) possible permutations on \(\mathbb{Z}_b\). This method is a conceptual ideal but can be costly both in storing the permutations and in applying them. As a fast and practical alternative, the affine matrix scramble has been proposed [24]–[26]. As in Section 2.1, we fix a bijection \(\phi \colon \mathbb{Z}_b\to \mathbb{F}_b\) with \(\phi(0)=0\). Further, for a real number \(x \in [0,1)\) with the base-\(b\) expansion \(x = \xi_1 b^{-1} + \xi_2 b^{-2} + \cdots\), where \(\xi_k \in \{0, 1, \dots, b-1\}\), we denote the corresponding vector of its digits as \[\label{eq:real2vec} \vec{x} := (\phi(\xi_{1}),\phi(\xi_{2}), \dots)^\top.\tag{6}\]

Definition 10 (Affine matrix scramble). For \(1 \leq j \leq d\), let \(\Delta_j \in \mathbb{F}_b^{\infty}\) and \(M_j \in \mathbb{F}_b^{\infty \times \infty}\). The families \(\{\Delta_j\}\) and \(\{M_j\}\) are independent, with \(\Delta_j\) having i.i.d. uniform entries in \(\mathbb{F}_b\). Each \(M_j\) is a random lower triangular matrix with diagonal entries drawn uniformly from \(\mathbb{F}_b \setminus \{0\}\) and sub-diagonal entries drawn uniformly from \(\mathbb{F}_b\). Let \(\boldsymbol{x}= (x_1, \dots, x_d) \in [0,1)^d\) and let \(\vec{x}_j\) denote the digit vector (6) of \(x_j\). The scrambled version of \(\boldsymbol{x}\), denoted \(\boldsymbol{x}'\), is given by \[\boldsymbol{x}' := \bigl(\psi(M_1 \vec{x}_1 + \Delta_1), \dots, \psi(M_d \vec{x}_d + \Delta_d)\bigr),\] where \(\psi\) is as defined in Definition 1.
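In base \(b=2\) the only nonzero diagonal entry is \(1\), so each \(M_j\) is unit lower triangular and the whole scramble reduces to bit algebra. A minimal sketch for a single coordinate, truncated to \(m\) digits (the function name and the vectorization over points are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_matrix_scramble(x: np.ndarray, m: int = 30) -> np.ndarray:
    """Affine matrix scramble (Definition 10) of one coordinate in base 2.

    M is unit lower triangular with uniform bits below the diagonal;
    Delta is a vector of i.i.d. uniform bits.
    """
    M = np.tril(rng.integers(0, 2, size=(m, m)), k=-1) + np.eye(m, dtype=int)
    delta = rng.integers(0, 2, size=m)
    # row k of `digits` holds the (k+1)-st base-2 digit of every point
    digits = np.array([np.floor(x * 2 ** (k + 1)).astype(int) % 2 for k in range(m)])
    scrambled = (M @ digits + delta[:, None]) % 2
    return 2.0 ** -np.arange(1, m + 1) @ scrambled  # psi: digits back to [0, 1)

x = np.arange(8) / 8.0
print(affine_matrix_scramble(x))  # marginally uniform; stratification of x preserved
```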

Both fully nested scrambling and affine matrix scrambling possess two-fold uniformity. A key consequence of such scrambles is that a single scrambled point \(\sigma(\boldsymbol{x})\) is uniformly distributed on \([0,1)^d\). Furthermore, scrambling preserves the equidistribution property of the entire sequence. The proof proceeds similarly to the case of \((t,d)\)-sequences [6] and is therefore omitted.

Proposition 3. Let \(\mathcal{S}= (\boldsymbol{x}_0, \boldsymbol{x}_1, \cdots)\) be an equidistributed sequence in base \(\boldsymbol{b}= (b_1, \dots, b_d)\), and let \(\sigma\) be a nested scramble in base \(\boldsymbol{b}\). Then the scrambled sequence \(\mathcal{S}' := (\sigma(\boldsymbol{x}_0), \sigma(\boldsymbol{x}_1), \cdots)\) is an equidistributed sequence in base \(\boldsymbol{b}\) with probability \(1\).

For sequences like the Sobol’ and Niederreiter sequences, which are \((0,\boldsymbol{e},d)\)-sequences in base \(b\), the strong equidistribution property is defined with respect to the mixed base \((b^{e_1},\dots,b^{e_d})\). To preserve this specific structure, it is natural to apply scrambling in this mixed base. We refer to this as a coarse scramble, a central concept of this paper, in contrast to the usual fine or digitwise scramble in base \((b,b,\dots,b)\). A coarse scramble can be realized by a fully nested scramble or by an affine matrix scramble over the fields \(\mathbb{F}_{b^{e_j}}\). Here we suggest the block affine matrix scramble, a method that generalizes the affine matrix scramble to operate on blocks of digits and directly corresponds to performing these operations in the mixed base.

Definition 11 (Block affine matrix scramble). For a prime power \(b \ge 2\), the *block affine matrix scramble* in base \((b^{e_1}, \dots, b^{e_d})\) is defined as follows. For \(1 \le j \le d\), let \(\Delta_j \in \mathbb{F}_b^{\infty}\) and \(M_j \in \mathbb{F}_b^{\infty \times \infty}\) be independent random vectors and matrices, with \(\Delta_j\) having i.i.d. uniform entries in \(\mathbb{F}_b\). Each \(M_j\) is a block lower-triangular matrix with block size \(e_j\), where the diagonal blocks are drawn uniformly from the nonsingular matrices in \(\mathbb{F}_b^{e_j \times e_j}\) and the sub-diagonal blocks are drawn uniformly from all matrices in \(\mathbb{F}_b^{e_j \times e_j}\). The scrambled point \(\boldsymbol{x}'\) is then computed as \[\boldsymbol{x}' := \bigl(\psi(M_1 \vec{x}_1 + \Delta_1), \dots, \psi(M_d \vec{x}_d + \Delta_d)\bigr).\] Note that this recovers the usual affine matrix scramble when \(e_j=1\) for all \(j\).
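A sketch of the block affine matrix scramble for one coordinate in base \(2^e\), assuming \(b=2\) and truncation to `nblocks` blocks of \(e\) digits; the rejection sampler for nonsingular blocks and all names are ours. With \(e=1\) it coincides with the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_invertible(e: int) -> np.ndarray:
    """Uniform random nonsingular e x e matrix over F_2, by rejection sampling."""
    while True:
        A = rng.integers(0, 2, size=(e, e))
        B, ok = A.copy(), True
        for col in range(e):            # Gaussian elimination over F_2: full rank?
            rows = np.nonzero(B[col:, col])[0]
            if rows.size == 0:
                ok = False
                break
            piv = col + rows[0]
            B[[col, piv]] = B[[piv, col]]
            for r in range(col + 1, e):
                if B[r, col]:
                    B[r] = (B[r] + B[col]) % 2
        if ok:
            return A

def block_affine_scramble(x: np.ndarray, e: int, nblocks: int = 10) -> np.ndarray:
    """Block affine matrix scramble (Definition 11) of one coordinate, base 2^e.

    M is block lower triangular: nonsingular diagonal blocks, uniform
    sub-diagonal blocks; Delta has i.i.d. uniform bits.
    """
    m = e * nblocks
    M = np.tril(rng.integers(0, 2, size=(m, m)), k=-1)  # uniform below the diagonal
    for i in range(nblocks):                            # overwrite the diagonal blocks
        s = i * e
        M[s:s + e, s:s + e] = random_invertible(e)
    delta = rng.integers(0, 2, size=m)
    digits = np.array([np.floor(x * 2 ** (k + 1)).astype(int) % 2 for k in range(m)])
    scrambled = (M @ digits + delta[:, None]) % 2
    return 2.0 ** -np.arange(1, m + 1) @ scrambled

x = np.arange(16) / 16.0
print(block_affine_scramble(x, e=2))  # scramble in base 4 = 2^2
```

For a \((0,\boldsymbol{e},d)\)-sequence one would apply this per coordinate with block size \(e_j\), so that the randomization respects exactly the mixed-base cells of Definition 7.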

This generalized scramble retains the essential properties of the original. The following result is an analogue of the standard case [24], [26], and its proof is omitted.

Lemma 1. For a prime power \(b\), the block affine matrix scramble in base \(\boldsymbol{b}= (b^{e_1}, \dots, b^{e_d})\) is a nested scramble in base \(\boldsymbol{b}\) and has two-fold uniformity.

2.5 Nested ANOVA decomposition and gain coefficients

To analyze the variance of an RQMC estimator, we decompose it into contributions from the integrand function and the point set. The function’s structure is captured by the nested ANOVA (Analysis of Variance) decomposition, which we introduce first.

The nested ANOVA decomposition breaks down a function \(f: [0,1)^d \to \mathbb{R}\) into a sum of orthogonal components. This decomposition is typically defined using an orthonormal basis, such as the Haar basis [7] or Walsh basis [5]; the equivalence of our formulation to such definitions is established in Appendix 6. The decomposition is constructed with respect to the mixed base \(\boldsymbol{b}\); while this dependency is often omitted from the following notation for brevity, all resulting ANOVA components implicitly depend on \(\boldsymbol{b}\). The decomposition is given by \[\label{eq:ANOVA} f(\boldsymbol{x}) = f_{\emptyset} + \sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} \beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u),\tag{7}\] where \(f_{\emptyset} = \int_{[0,1)^d} f(\boldsymbol{x}) \,d\boldsymbol{x}\), and the component \(\beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u)\) is defined via auxiliary functions \(\tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)\) as \[\begin{align} \beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u) &= \sum_{v \subseteq u} (-1)^{|u|-|v|} \tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u), \tag{8} \\ \tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u) &= m_{u,v,\boldsymbol{k}} \int_{\operatorname{BOX}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)} d\boldsymbol{y}_u \int_{[0,1)^{d-|u|}} f(\boldsymbol{y}_u,\boldsymbol{z}_{-u}) d\boldsymbol{z}_{-u},\tag{9} \end{align}\] where \(\operatorname{BOX}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)\) is the elementary interval containing \(\boldsymbol{x}_u\) defined by \[\label{eq:BOX} \operatorname{BOX}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u) = \prod_{j \in v} \left[\frac{\lfloor x_j b_j^{k_j+1} \rfloor}{b_j^{k_j+1}}, \frac{\lfloor x_j b_j^{k_j+1} \rfloor+1}{b_j^{k_j+1}} \right) \times \prod_{j \in u\setminus v} \left[\frac{\lfloor x_j b_j^{k_j} \rfloor}{b_j^{k_j}}, \frac{\lfloor x_j b_j^{k_j} \rfloor+1}{b_j^{k_j}} \right),\tag{10}\] and \((m_{u,v,\boldsymbol{k}})^{-1}\) is the volume of \(\operatorname{BOX}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)\), namely \[m_{u,v,\boldsymbol{k}} := \prod_{j \in v} b_j^{k_j+1} \prod_{j \in u \setminus v} b_j^{k_j}.\]

By definition, \(\tilde{\beta}_{u,v,\boldsymbol{k}}\) is constant on each elementary interval \(\operatorname{BOX}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)\), taking the value of the average of \(f\) over it. These components are related by the following identity: \[\label{eq:beta2} \sum_{\substack{\boldsymbol{k}' \in \mathbb{N}_0^{|u|} \\ k'_j \le k_j \, (\forall j \in u)}}\beta_{u,\boldsymbol{k}'}(\boldsymbol{x}_u) = \tilde{\beta}_{u,u,\boldsymbol{k}}(\boldsymbol{x}_u).\tag{11}\] This is also shown in Appendix 6.

The components \(\beta_{u,\boldsymbol{k}}\) are mutually orthogonal. By setting \[\sigma_{u,\boldsymbol{k}}^2 := \operatorname{Var}\beta_{u,\boldsymbol{k}},\] it follows from orthogonality (Parseval’s identity) that the total variance of the function is \[\operatorname{Var}f = \sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} \sigma_{u,\boldsymbol{k}}^2.\] From this, the variance of the standard Monte Carlo (MC) estimator with \(n\) independent random points is given by \[\label{eq:MC-variance} \frac{\operatorname{Var}f}{n} = \dfrac{1}{n}\sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} \sigma_{u,\boldsymbol{k}}^2.\tag{12}\]

Next, we introduce a tool to characterize the structure of the point set. The gain coefficients quantify how a specific \(n\)-point set \(P_n\) is expected to perform for each ANOVA component when used in an RQMC estimate. The definition for base \(b\) was given in [7], and its extension to mixed bases was introduced in [19].

Definition 12. Let \(P_n=\{\boldsymbol{x}_0,\ldots,\boldsymbol{x}_{n-1}\}\subset [0,1)^d\) be an \(n\)-element point set. For \(\emptyset \neq u \subseteq 1{:}d\) and \(\boldsymbol{k}=(k_j)_{j\in u}\in \mathbb{N}_0^{|u|}\), we define the gain coefficient of \(P_n\) in base \(\boldsymbol{b}=(b_1,\dots,b_d)\) by \[\begin{align} G_{u,\boldsymbol{k}}(P_n) &:= \frac{1}{n} \prod_{j \in u} (b_j-1)^{-1} \cdot \widetilde{G}_{u,\boldsymbol{k}}(P_n), \label{eq:G95Gtilde} \end{align}\qquad{(4)}\] where \[\begin{align} \widetilde{G}_{u,\boldsymbol{k}}(P_n) := \sum_{i=0}^{n-1}\sum_{i'=0}^{n-1} \prod_{j \in u} \left(b_j \chi(\lfloor b_j^{k_j+1}x_{i,j}\rfloor=\lfloor b_j^{k_j+1}x_{i',j}\rfloor) - \chi(\lfloor b_j^{k_j}x_{i,j}\rfloor=\lfloor b_j^{k_j}x_{i',j}\rfloor) \right) \label{eq:Gtilda95def}. \end{align}\qquad{(5)}\]

An explicit formula for the gain coefficients is known [19], \[\label{eq:Gtilda95formula} \widetilde{G}_{u,\boldsymbol{k}}(P_n) = \sum_{v \subseteq u} H_{u,v} C_{u,v,\boldsymbol{k}}(P_n),\tag{13}\] where \[\begin{align} H_{u,v} &= (-1)^{|u|-|v|} m_{u,v,\boldsymbol{0}} = \prod_{j \in v} b_j \prod_{j \in u \setminus v} (-1), \tag{14}\\ C_{u,v,\boldsymbol{k}}(P_n) &= \sum_{i=0}^{n-1}\sum_{i'=0}^{n-1} \chi\bigl(\boldsymbol{x}_i \text{ and } \boldsymbol{x}_{i'} \text{ lie in the same elementary } (\boldsymbol{k}+ \boldsymbol{1}_v)\text{-interval in base } \boldsymbol{b}\bigr). \tag{15} \end{align}\]
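These quantities are cheap to evaluate directly on small point sets. The following Python sketch computes \(G_{u,\boldsymbol{k}}(P_n)\) by brute force from the double sum in Definition 12 (an \(O(n^2|u|)\) check; the helper name is ours), and its output matches the properties established in Section 3 on van der Corput points.

```python
import numpy as np
from itertools import product

def gain_coefficient(P, bases, u, k):
    """G_{u,k}(P_n) of Definition 12 by direct evaluation of the double sum.

    P: (n, d) array of points; bases: per-coordinate bases b_j;
    u: subset of coordinates (0-based); k: multi-index (k_j) for j in u.
    """
    n = len(P)
    Gt = 0.0
    for i, ip in product(range(n), repeat=2):
        term = 1.0
        for j, kj in zip(u, k):
            b = bases[j]
            fine = int(P[i][j] * b ** (kj + 1)) == int(P[ip][j] * b ** (kj + 1))
            coarse = int(P[i][j] * b ** kj) == int(P[ip][j] * b ** kj)
            term *= b * fine - coarse
        Gt += term
    return Gt / (n * np.prod([bases[j] - 1 for j in u]))

# van der Corput points in base 2
P = np.array([[0.0], [0.5], [0.25], [0.75]])
print(gain_coefficient(P, [2], [0], [0]))      # 0.0, since 2 divides n = 4
print(gain_coefficient(P[:3], [2], [0], [0]))  # 1/3 = (r/n) G(r) with r = 1
```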

With these two tools—the ANOVA decomposition for the function and the gain coefficients for the point set—we can now state the variance of the RQMC estimator [7], [19].

Theorem 4. Let \(P_n \subset [0,1)^d\) be an \(n\)-point set, and let \(\tilde{P}_n\) be its randomly scrambled version using a nested scramble with two-fold uniformity. Then the variance of the RQMC estimator \(I(f; \tilde{P}_n) = n^{-1}\sum_{\boldsymbol{x}\in \tilde{P}_n} f(\boldsymbol{x})\) is given by \[\label{eq:variance-gain} \operatorname{Var}[I(f; \tilde{P}_n)] = \sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} \dfrac{G_{u,\boldsymbol{k}}(P_n)}{n}\sigma_{u,\boldsymbol{k}}^2.\qquad{(6)}\]

By comparing the RQMC variance in Theorem 4 with the MC variance (12), it becomes clear that the gain coefficients \(G_{u,\boldsymbol{k}}(P_n)\) measure how much the underlying point set \(P_n\) reduces the variance compared to ordinary MC sampling for each frequency component \((u, \boldsymbol{k})\) of the integrand. A smaller gain coefficient indicates a greater variance reduction for that component.

3 Gain coefficients of scrambled points

3.1 General results

The theoretical properties of gain coefficients for scrambled equidistributed sequences can be established by generalizing the recent analysis of Owen and Pan for Halton sequences [19]. The key insight enabling this generalization is that their foundational formula for the counting term \(C_{u,v,\boldsymbol{k}}(P_n)\) (re-derived here as Lemma 2) does not depend on the specific structure of the Halton sequence, but only on the defining property of an equidistributed sequence. This allows us to extend their main results to our broader framework.

Lemma 2. Let \(\mathcal{S}= \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots\} \subset [0,1)^d\) be an equidistributed sequence in base \(\boldsymbol{b}= (b_1, \dots, b_d)\). Let \(n \in \mathbb{N}\) and \(P_n = \{\boldsymbol{x}_{0}, \dots, \boldsymbol{x}_{n-1}\}\). Then, for any \(\emptyset \neq u \subseteq 1{:}d\), \(v \subseteq u\), and \(\boldsymbol{k}=(k_j)_{j\in u}\in \mathbb{N}_0^{|u|}\), we have the following.

  1. We have \[\label{eq:Cformula} C_{u,v,\boldsymbol{k}}(P_n) = n + (2n-m_{u,v,\boldsymbol{k}})\left\lfloor \dfrac{n}{m_{u,v,\boldsymbol{k}}} \right\rfloor - m_{u,v,\boldsymbol{k}} \left\lfloor \dfrac{n}{m_{u,v,\boldsymbol{k}}} \right\rfloor^2.\qquad{(7)}\]

  2. The values \(G_{u,\boldsymbol{k}}(P_n)\), \(\widetilde{G}_{u,\boldsymbol{k}}(P_n)\) and \(C_{u,v,\boldsymbol{k}}(P_n)\) are determined solely by \(n\). By a slight abuse of notation, we write \[\label{eq:abuse} G_{u,\boldsymbol{k}}(n) := G_{u,\boldsymbol{k}}(P_n), \qquad \widetilde{G}_{u,\boldsymbol{k}}(n) := \widetilde{G}_{u,\boldsymbol{k}}(P_n), \qquad C_{u,v,\boldsymbol{k}}(n) := C_{u,v,\boldsymbol{k}}(P_n)\qquad{(8)}\]

Proof. Since the second claim follows from the first together with (13) and Definition 12, it suffices to show the formula in the first claim. Let \(n = qm_{u,v,\boldsymbol{k}} + t\) with \(0 \le t < m_{u,v,\boldsymbol{k}}\) and let \(\boldsymbol{k}' := \boldsymbol{k}+ \boldsymbol{1}_v\). The number of the elementary \(\boldsymbol{k}'\)-intervals in base \(\boldsymbol{b}\) is \(m_{u,v,\boldsymbol{k}}\). From the definition of an equidistributed sequence in base \(\boldsymbol{b}\), among such intervals, \(t\) intervals contain \(q+1\) points from \(P_n\) and the other \(m_{u,v,\boldsymbol{k}} - t\) intervals contain \(q\) points. Thus (15) implies that \[C_{u,v,\boldsymbol{k}}(P_n) = t(q+1)^2 + (m_{u,v,\boldsymbol{k}} - t)q^2.\] After simplification using \(t=n-qm_{u,v,\boldsymbol{k}}\) and \(q = \lfloor n/m_{u,v,\boldsymbol{k}} \rfloor\), we obtain the claimed formula. ◻
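The closed form in Lemma 2 is easy to check numerically. The sketch below (helper names ours) compares the raw pair count (15) with the formula, using the two-dimensional Halton sequence in bases \((2,3)\), which is equidistributed in base \((2,3)\) as noted in Section 2.3.

```python
def radical_inverse(k: int, b: int) -> float:
    """van der Corput radical inverse of k in base b (one Halton coordinate)."""
    x, f = 0.0, 1.0 / b
    while k:
        x += (k % b) * f
        k //= b
        f /= b
    return x

def C_direct(P, bases, u, v, k):
    """Raw pair count (15): pairs lying in the same elementary (k + 1_v)-interval."""
    kk = [k[i] + (j in v) for i, j in enumerate(u)]
    cells = [tuple(int(p[j] * bases[j] ** kk[i]) for i, j in enumerate(u)) for p in P]
    return sum(c1 == c2 for c1 in cells for c2 in cells)

def C_formula(n, m):
    """Closed form of Lemma 2, with m = m_{u,v,k}."""
    q = n // m
    return n + (2 * n - m) * q - m * q * q

bases, u, v, k = (2, 3), (0, 1), {0}, (1, 0)
m_uvk = bases[0] ** (k[0] + 1) * bases[1] ** k[1]  # m_{u,v,k} = 2^2 * 3^0 = 4 cells
for n in (5, 12, 25):
    P = [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(n)]
    assert C_direct(P, bases, u, v, k) == C_formula(n, m_uvk)
print("closed form verified")
```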

This observation enables us to generalize results in [19], as follows.

Theorem 5. Let \(\mathcal{S}= \{\boldsymbol{x}_0, \boldsymbol{x}_1, \dots\} \subset [0,1)^d\) be an equidistributed sequence in base \((b_1, \dots, b_d)\). Let \(\emptyset \neq u \subseteq 1{:}d\) and \(\boldsymbol{k}\in \mathbb{N}_0^{|u|}\). Then the gain coefficients in base \(\boldsymbol{b}\) satisfy the following.

  1. If \(1 \le n \le m_{u,\emptyset,\boldsymbol{k}}\), \[G_{u,\boldsymbol{k}}(n) = 1.\]

  2. If \(n = r m_{u,u,\boldsymbol{k}}\) for \(r \in \mathbb{N}\), then \[G_{u,\boldsymbol{k}}(n) = 0.\]

  3. Let \(n = q m_{u,u,\boldsymbol{k}} + r\) for \(q,r \in \mathbb{N}\) with \(0 < r < m_{u,u,\boldsymbol{k}}\). Then \[G_{u,\boldsymbol{k}}(n) = \dfrac{r}{n}G_{u,\boldsymbol{k}}(r).\]

  4. For \(j \in u\), define \(\boldsymbol{k}'\) by \(k'_j = k_j+1\) and \(k'_\ell = k_\ell\) for \(\ell \in u \setminus \{j\}\). Then \[G_{u,\boldsymbol{k}'}(nb_j) = G_{u,\boldsymbol{k}'}(n).\]

  5. For \(n \in \mathbb{N}\), we have \[G_{u,\boldsymbol{k}}\left(n \prod_{j \in u} b_j\right) = G_{u,\boldsymbol{0}}(n).\]

  6. For any \(\emptyset \neq v \subseteq u\), \[\sup_{n \in \mathbb{N}} G_{v,\boldsymbol{0}}(n) \le \sup_{n \in \mathbb{N}} G_{u,\boldsymbol{0}}(n)\]

  7. Let \(j_m = \mathop{\mathrm{arg\,min}}_{j \in u} b_j\). Then \[\sup_{n \in \mathbb{N}} G_{u,\boldsymbol{0}}(n) = \sup_{n \in \mathbb{N}} G_{u,\boldsymbol{k}}(n) \le \prod_{j \in u \setminus \{j_m\} } \dfrac{b_j}{b_j-1}.\]

Proof. Except for items 1 and 3, the arguments for the corresponding results in [19] carry over verbatim, since the proofs rely only on (13) and the closed form in Lemma 2. The correspondence between our results and those of [19] is as follows: item 1 corresponds to Proposition 3.1 in [19], item 2 to Proposition 3.2, item 3 to Proposition 3.3, item 4 to Proposition 3.5, item 5 to Corollary 3.6, item 6 to Theorem 5.2, and item 7 to Theorem 5.3.

For item 1, our result extends the corresponding proposition in [19] to include the case \(n = m_{u,\emptyset,\boldsymbol{k}}\). Since \(C_{u,v,\boldsymbol{k}}(n) = n\) holds for this value of \(n\), the original proof applies directly.

For item 3, since the corresponding proposition explicitly uses a property of the Halton sequence, a separate proof is required in our more general setting. From Lemma 2, since \(m_{u,u,\boldsymbol{k}}\) is a multiple of \(m_{u,v,\boldsymbol{k}}\), for \(n = qm_{u,u,\boldsymbol{k}}+r\) we have \[\begin{align} C_{u,v,\boldsymbol{k}}(n) &= n + (2n-m_{u,v,\boldsymbol{k}})\left(\dfrac{qm_{u,u,\boldsymbol{k}}}{m_{u,v,\boldsymbol{k}}} + \left\lfloor \dfrac{r}{m_{u,v,\boldsymbol{k}}} \right\rfloor \right) - m_{u,v,\boldsymbol{k}} \left(\dfrac{qm_{u,u,\boldsymbol{k}}}{m_{u,v,\boldsymbol{k}}} + \left\lfloor \dfrac{r}{m_{u,v,\boldsymbol{k}}} \right\rfloor \right)^2\\ &= \dfrac{q^2 m_{u,u,\boldsymbol{k}}^2 + 2rq m_{u,u,\boldsymbol{k}}}{m_{u,v,\boldsymbol{k}}} + C_{u,v,\boldsymbol{k}}(r), \end{align}\] hence, using \(H_{u,v}/m_{u,v,\boldsymbol{k}} = (-1)^{|u|-|v|}\prod_{j \in u} b_j^{-k_j}\), \[\begin{align} \widetilde{G}_{u,\boldsymbol{k}}(n) = \sum_{v \subseteq u} H_{u,v} C_{u,v,\boldsymbol{k}}(n) &= \sum_{v \subseteq u} H_{u,v} \dfrac{q^2 m_{u,u,\boldsymbol{k}}^2 + 2rq m_{u,u,\boldsymbol{k}}}{m_{u,v,\boldsymbol{k}}} + \sum_{v \subseteq u} H_{u,v}C_{u,v,\boldsymbol{k}}(r)\\ &= (q^2 m_{u,u,\boldsymbol{k}}^2 + 2rq m_{u,u,\boldsymbol{k}}) \prod_{j \in u} b_j^{-k_j} \sum_{v \subseteq u} (-1)^{|u|-|v|} + \widetilde{G}_{u,\boldsymbol{k}}(r)\\ &= (q^2 m_{u,u,\boldsymbol{k}}^2 + 2rq m_{u,u,\boldsymbol{k}}) \prod_{j \in u} b_j^{-k_j} \prod_{j \in u}(1-1) + \widetilde{G}_{u,\boldsymbol{k}}(r)\\ &= \widetilde{G}_{u,\boldsymbol{k}}(r). \end{align}\] Thus item 3 follows from the normalization in Definition 12. ◻

As a corollary, the variance of the RQMC estimator using a scrambled equidistributed sequence decays as \(o(1/n)\), which is strictly better than plain Monte Carlo. The proof is identical to that in [19], with \(\boldsymbol{x}_i\) replaced by \(\tilde{\boldsymbol{x}}_i\).

Corollary 1. Let \(\mathcal{S}\subset [0,1)^d\) be an equidistributed sequence in base \(\boldsymbol{b}= (b_1, \dots, b_d)\), and let \(\{\tilde{\boldsymbol{x}}_0, \tilde{\boldsymbol{x}}_1, \tilde{\boldsymbol{x}}_2, \dots\}\) be a point set obtained by a random scrambling of \(\mathcal{S}\) in base \(\boldsymbol{b}\) that ensures two-fold uniformity. Then, for any \(f \in L^2([0,1)^d)\), we have \[\lim_{n \to \infty} n \cdot \operatorname{Var}\left(\dfrac{1}{n}\sum_{i=0}^{n-1}f(\tilde{\boldsymbol{x}}_i)\right) = 0.\]

3.2 Gain coefficients of scrambled \((0,\boldsymbol{e},d)\)-sequences

We now specialize these general results to scrambled \((0,\boldsymbol{e},d)\)-sequences, in particular Sobol’ and Niederreiter sequences. Since \((0,\boldsymbol{e},d)\)-sequences in base \(b\) are equidistributed in base \((b^{e_1}, \dots, b^{e_d})\), we will state the results for the latter.

First we consider non-asymptotic results for \(n=b^m\).

Corollary 2. Let \(\mathcal{S}\) be an equidistributed sequence in base \(\boldsymbol{b}=(b^{e_1}, \dots, b^{e_d})\). Let \(\emptyset \neq u \subseteq 1{:}d\), \(\boldsymbol{k}\in \mathbb{N}_0^{|u|}\) and \(j_m = \mathop{\mathrm{arg\,min}}_{j \in u} e_j\). Then, for \(n \in \mathbb{N}\) we have \[G_{u,\boldsymbol{k}}(n) \le \Gamma_u,\] where \[\label{eq:Gamma95u} \Gamma_u := \prod_{j \in u \setminus \{j_m\}} \frac{b^{e_j}}{b^{e_j}-1},\qquad{(9)}\] with the convention that the empty product equals \(1\). For the case \(n=b^m\) with \(m \in \mathbb{N}_0\), we have \[\begin{cases} G_{u,\boldsymbol{k}}(b^m) = 0 & \text{if } \sum_{j\in u} e_j(k_j+1) \le m, \\ G_{u,\boldsymbol{k}}(b^m) \le \Gamma_u & \text{if } \sum_{j\in u} e_j k_j < m < \sum_{j \in u} e_j(k_j+1), \\ G_{u,\boldsymbol{k}}(b^m) = 1 & \text{if } \sum_{j\in u} e_j k_j \ge m. \end{cases}\]

Proof. We use Theorem 5. In our case, we have \(m_{u,\emptyset,\boldsymbol{k}} = \prod_{j \in u} b^{e_jk_j}\) and \(m_{u,u,\boldsymbol{k}} = \prod_{j \in u} b^{e_j(k_j+1)}\). Thus the claim for general \(n\) follows from item 7, and the three claims for \(n=b^m\) follow from items 2, 7, and 1 of Theorem 5, respectively. ◻

Remark 6. By letting \(\boldsymbol{e}= (1,\dots,1)\), we obtain the gain coefficients of a \((0,d)\)-sequence in base \(b\) as \[\sup_{n \in \mathbb{N}} G_{u,\boldsymbol{k}}(n) \leq \left(\dfrac{b}{b-1}\right)^{|u|-1}.\] For \(n=b^m\), this bound for \((0,m,d)\)-nets is given in [7].

Indeed, the exact maximal gain coefficient coincides with the upper bound given in Corollary 2.

Corollary 3. Let \(\mathcal{S}\) be a \((0,\boldsymbol{e},d)\)-sequence in base \(b\). Let \(\Gamma_u\) be the constants defined in Corollary 2. Then, for any \(\emptyset \neq u \subseteq 1{:}d\) and \(\boldsymbol{k}\in \mathbb{N}_0^{|u|}\) we have \[\max_{n \in \mathbb{N}} G_{u,\boldsymbol{k}}(n) = \Gamma_u.\]

Proof. According to item 7 of Theorem 5, it suffices to show the claim for \(\boldsymbol{k}= \boldsymbol{0}\), and in particular that \(n^* := \prod_{j \in u \setminus \{j_m\}} b^{e_j}\) attains the maximum. Since \(n^*\) is a multiple of \(m_{u,v,\boldsymbol{0}}\) for \(v \subsetneq u\), Lemma 2 implies that \[C_{u,v,\boldsymbol{0}}(n^*) = n^* + (2n^*-m_{u,v,\boldsymbol{0}})\dfrac{n^*}{m_{u,v,\boldsymbol{0}}} - m_{u,v,\boldsymbol{0}} \left( \dfrac{n^*}{m_{u,v,\boldsymbol{0}}} \right)^2 = \dfrac{(n^*)^2}{m_{u,v,\boldsymbol{0}}}\] for \(v \subsetneq u\), and for \(v=u\) we have \(C_{u,u,\boldsymbol{0}}(n^*) = n^*\). Thus it follows from (13) that \[\begin{align} \widetilde{G}_{u,\boldsymbol{0}}(n^*) = \sum_{v \subseteq u} H_{u,v} C_{u,v,\boldsymbol{0}}(n^*) &= (n^*)^2\sum_{v \subsetneq u} (-1)^{|u|-|v|} + H_{u,u}n^*\\ &= (n^*)^2 \sum_{v \subseteq u} (-1)^{|u|-|v|} + H_{u,u}n^* - (n^*)^2 = H_{u,u}n^* - (n^*)^2. \end{align}\] In conjunction with the normalization in Definition 12, we obtain the desired result. ◻

From Corollary 2 and Theorem 4, we obtain the following bound on the variance of the estimator based on scrambled points.

Corollary 4. Let \(\mathcal{S}\) be a \((0,\boldsymbol{e},d)\)-sequence in base \(b\), and let \(\tilde{\boldsymbol{x}}_0, \tilde{\boldsymbol{x}}_1, \tilde{\boldsymbol{x}}_2, \dots\) be a randomly scrambled version of \(\mathcal{S}\) obtained by a nested scramble in base \(\boldsymbol{b}=(b^{e_1}, \dots, b^{e_d})\) with two-fold uniformity. Let \(\Gamma_u\) be the constants defined in Corollary 2. Then, for \(m \in \mathbb{N}\) we have \[\operatorname{Var}\left(\dfrac{1}{b^m}\sum_{i=0}^{b^m-1}f(\tilde{\boldsymbol{x}}_i) \right) \le \dfrac{1}{b^m} \sum_{\emptyset \neq u \subseteq 1{:}d} \Gamma_u \sum_{\substack{\boldsymbol{k}\in \mathbb{N}_0^{|u|}\\ \sum_{j \in u} e_j(k_j+1) > m}}\sigma_{u,\boldsymbol{k}}^2.\]

Lastly, we show the slow growth of the maximal gain coefficient of coarsely scrambled Sobol’ and Niederreiter sequences.

Corollary 5. Let \(\mathcal{S}\) be the \(d\)-dimensional Sobol’ sequence (in base \(b=2\)), or the full Niederreiter sequence in prime power base \(b\). Then the maximal gain coefficient for the coarse scramble grows as \(O(\log d)\); specifically, \[\Gamma_d := \max_{\emptyset \neq u \subseteq 1{:}d} \sup_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} \sup_{n \in \mathbb{N}} G_{u,\boldsymbol{k}}(n) \le e \lceil \log_b d + \log_b \log_b(d+b) + 2 \rceil.\]

Proof. Because Sobol’ sequences use only primitive polynomials, the degrees of their base polynomials are never smaller than those of the full Niederreiter sequence. Consequently, \(\Gamma_d\) for Sobol’ sequences does not exceed that of full Niederreiter sequences, so it suffices to establish the result for the latter.

Let \(I_b(n)\) be the number of monic irreducible polynomials over \(\mathbb{F}_b\) of degree \(n\), and \(N_b(n) := I_b(1) + \cdots + I_b(n)\). It is well known (see, e.g., [27]) that, for \(n \ge 2\), \[\label{eq:upperbound95Ib} nI_b(n) \le b^n - 1.\tag{16}\] Using this, \(I_b(1) = b\), Corollary 3, and \(\log(1+x) \le x\), the maximal gain coefficient of the \(N_b(M)\)-dimensional full Niederreiter sequence, i.e., the one using irreducible polynomials of degree up to \(M\), is bounded by \[\begin{align} \log \Gamma_{N_b(M)} &= \sum_{n=1}^M \left(I_b(n) - \chi(n=1)\right) \log \left(1 + \dfrac{1}{b^n-1} \right) \\ &\le \sum_{n=1}^M \dfrac{b^n-1}{n} \dfrac{1}{b^n-1} = \sum_{n=1}^M \dfrac{1}{n} \le 1 + \int_{1}^M \dfrac{1}{x} \,dx = 1 + \log M. \end{align}\] From [28], for any \(d \in \mathbb{N}\) we have \[\deg(p_d) \le \log_b d + \log_b \log_b(d+b) + 2,\] and thus \(d \le N_b(M^*)\) holds with \(M^* = \lceil \log_b d + \log_b \log_b(d+b) + 2 \rceil\). Hence \[\Gamma_d \le \Gamma_{N_b(M^*)} \le \exp(1 + \log M^*) = e \lceil \log_b d + \log_b \log_b(d+b) + 2 \rceil.\] Thus we are done. ◻
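To see the scale of Corollary 5, the following sketch compares the exact \(\Gamma_d = \Gamma_{1{:}d}\) from Corollary 3 with the bound above for the full Niederreiter sequence in base \(2\), using the irreducible counts of Table 1 (helper names are ours).

```python
import math

# monic irreducible polynomial counts over F_2 by degree (Table 1)
I2 = {1: 2, 2: 1, 3: 2, 4: 3, 5: 6, 6: 9, 7: 18, 8: 30, 9: 56, 10: 99, 11: 186, 12: 335}

def degrees(d: int) -> list:
    """Degrees e_1 <= ... <= e_d of the first d base polynomials of the
    full Niederreiter sequence in base 2 (ordered by increasing degree)."""
    degs, n = [], 1
    while len(degs) < d:
        degs += [n] * I2[n]
        n += 1
    return degs[:d]

def gamma_exact(d: int) -> float:
    """Gamma_d = Gamma_{1:d}: product of 2^e/(2^e - 1) over all coordinates
    except one of minimal degree (Corollary 3)."""
    prod = 1.0
    for e in degrees(d)[1:]:  # skip one minimal-degree coordinate
        prod *= 2 ** e / (2 ** e - 1)
    return prod

def gamma_bound(d: int) -> float:
    """Upper bound of Corollary 5 with b = 2."""
    return math.e * math.ceil(math.log2(d) + math.log2(math.log2(d + 2)) + 2)

for d in (10, 37, 100, 500):
    print(d, round(gamma_exact(d), 3), round(gamma_bound(d), 3))
```

The bound is conservative by a constant factor, but both quantities grow only logarithmically in \(d\).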

This logarithmic growth of the maximal gain coefficient stands in stark contrast to the exponential growth observed for standard Owen scrambling of Sobol’ sequences, highlighting the primary theoretical advantage of coarse scrambling in high dimensions, at least in the worst case.

4 Comparison of coarse and usual scrambling

In this section, we compare the coarse scrambling and the usual scrambling in base \(b\). We denote the two bases of interest as the coarse base, \(\boldsymbol{b}^{({\mathrm{C}})} := (b^{e_1}, \dots, b^{e_d}),\) and the usual base, \(\boldsymbol{b}^{({\mathrm{U}})} := (b, \dots, b).\) Hereafter, superscripts or subscripts C and U will refer to quantities related to the coarse and usual bases, respectively. We first give a relation between the nested ANOVA decomposition in coarse and usual bases.

Lemma 3. Let \(b,e_1,\dots,e_d \in \mathbb{N}\) with \(b \ge 2\). Let \(f \colon [0,1)^d \to \mathbb{R}\) be a function, and let \(\beta_{u,\boldsymbol{k}}^{({\mathrm{C}})}\) and \(\beta_{u,\boldsymbol{k}}^{({\mathrm{U}})}\) be the components of the nested ANOVA decomposition (7) of \(f\) in base \(\boldsymbol{b}^{({\mathrm{C}})}\) and \(\boldsymbol{b}^{({\mathrm{U}})}\), respectively. Then, for any \(\emptyset \neq u \subseteq 1{:}d\) and \(\boldsymbol{k}\in \mathbb{N}_0^{|u|}\) we have \[\label{eq:beta95basechange} \beta_{u,\boldsymbol{k}}^{({\mathrm{C}})}(\boldsymbol{x}) = \sum_{\substack{\boldsymbol{l}=(l_j) \in \mathbb{N}_0^{|u|} \\ e_jk_j \le l_j < e_j(k_j+1) \, (\forall j \in u)}} \beta_{u,\boldsymbol{l}}^{({\mathrm{U}})}(\boldsymbol{x})\qquad{(10)}\] and thus \[\label{eq:sigma95basechange} (\sigma_{u,\boldsymbol{k}}^{({\mathrm{C}})})^2 = \sum_{\substack{\boldsymbol{l}=(l_j) \in \mathbb{N}_0^{|u|} \\ e_jk_j \le l_j < e_j(k_j+1) \, (\forall j \in u)}} (\sigma_{u,\boldsymbol{l}}^{({\mathrm{U}})})^2.\qquad{(11)}\]

Proof. The definition of \(\operatorname{BOX}\) in (10) implies that, for any \(\boldsymbol{x}\in [0,1)^{|u|}\), \[\operatorname{BOX}_{u,u,(e_j(k_j+\chi(j \in v))-1)_j}^{({\mathrm{U}})}(\boldsymbol{x}) = \operatorname{BOX}_{u,v,\boldsymbol{k}}^{({\mathrm{C}})}(\boldsymbol{x})\] and thus \[\tilde{\beta}_{u,u,(e_j(k_j+\chi(j \in v))-1)_j}^{({\mathrm{U}})}(\boldsymbol{x}) = \tilde{\beta}_{u,v,\boldsymbol{k}}^{({\mathrm{C}})}(\boldsymbol{x}).\] Then, by the inclusion-exclusion principle, (11), the above equation, and (8), we have \[\begin{align} \sum_{\substack{\boldsymbol{l}=(l_j) \in \mathbb{N}_0^{|u|} \\ e_jk_j \le l_j < e_j(k_j+1) \, (\forall j \in u)}} \beta_{u,\boldsymbol{l}}^{({\mathrm{U}})}(\boldsymbol{x}) &= \sum_{v \subseteq u} (-1)^{|u|-|v|} \sum_{\substack{\boldsymbol{l}=(l_j) \in \mathbb{N}_0^{|u|} \\ l_j < e_j(k_j+\chi(j \in v)) \, (\forall j \in u)}} \beta_{u,\boldsymbol{l}}^{({\mathrm{U}})}(\boldsymbol{x})\\ &= \sum_{v \subseteq u} (-1)^{|u|-|v|} \tilde{\beta}_{u,u,(e_j(k_j+\chi(j \in v))-1)_j}^{({\mathrm{U}})}(\boldsymbol{x})\\ &= \sum_{v \subseteq u} (-1)^{|u|-|v|} \tilde{\beta}_{u,v,\boldsymbol{k}}^{({\mathrm{C}})}(\boldsymbol{x}_u)\\ &= \beta_{u,\boldsymbol{k}}^{({\mathrm{C}})}(\boldsymbol{x}_u). \end{align}\] Thus we have the first statement. The second one follows from the fact that the \(\beta\)’s are orthogonal. ◻

Hereafter, let \(\mathcal{S}\) be a \((0,\boldsymbol{e},d)\)-sequence in base \(b\). We denote the variance of the estimator using points scrambled in the coarse and usual bases by \(\sigma^2_{\mathrm{C}}(n)\) and \(\sigma^2_{\mathrm{U}}(n)\), respectively: \[\begin{align} \sigma^2_{\mathrm{C}}(n) &:= \operatorname{Var}\left(\frac{1}{n}\sum_{i=0}^{n-1}f(\tilde{\boldsymbol{x}}_i^{(C)})\right), \tag{17} \\ \sigma^2_{\mathrm{U}}(n) &:= \operatorname{Var}\left(\frac{1}{n}\sum_{i=0}^{n-1}f(\tilde{\boldsymbol{x}}_i^{(U)})\right). \tag{18} \end{align}\] Theorem 4 and Lemma 3 imply that \[\label{eq:sigma-seq-basechange} \sigma^2_{\mathrm{C}}(n) = \frac{1}{n}\sum_{u \subseteq 1{:}d} \sum_{\boldsymbol{k}\in \mathbb{N}_0^{|u|}} G_{u,\boldsymbol{k}}^{({\mathrm{C}})}(n) (\sigma_{u,\boldsymbol{k}}^{({\mathrm{C}})})^2 = \frac{1}{n}\sum_{u \subseteq 1{:}d} \sum_{\boldsymbol{l}\in \mathbb{N}_0^{|u|}} G_{u,(\lfloor l_j/e_j \rfloor)_{j \in u}}^{({\mathrm{C}})}(n) (\sigma_{u,\boldsymbol{l}}^{({\mathrm{U}})})^2.\tag{19}\] Furthermore, for \(n=b^m\), Lemma 3 combined with Corollary 2 and the inequality \[e_j(\lfloor l_j/e_j \rfloor + 1) \le e_j(l_j/e_j + 1) = l_j + e_j\] implies that \[\begin{align} \sigma^2_{\mathrm{C}}(b^m) &\le \frac{1}{b^m}\sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\substack{\boldsymbol{k}\in \mathbb{N}_0^{|u|} \\ \sum_{j \in u} e_j(k_j+1) > m}} G_{u,\boldsymbol{k}}^{({\mathrm{C}})}(b^m) (\sigma_{u,\boldsymbol{k}}^{({\mathrm{C}})})^2 \notag\\ &= \frac{1}{b^m}\sum_{\emptyset \neq u \subseteq 1{:}d} \sum_{\substack{\boldsymbol{l}\in \mathbb{N}_0^{|u|} \\ \sum_{j \in u} e_j(\lfloor l_j/e_j \rfloor + 1) > m}} G_{u,(\lfloor l_j/e_j \rfloor)_{j \in u}}^{({\mathrm{C}})}(b^m) (\sigma_{u,\boldsymbol{l}}^{({\mathrm{U}})})^2 \tag{20}\\ &\le \frac{1}{b^m}\sum_{\emptyset \neq u \subseteq 1{:}d} \Gamma_u \sum_{\substack{\boldsymbol{l}\in \mathbb{N}_0^{|u|} \\ \sum_{j \in u} l_j > m - \sum_{j \in u} e_j}} (\sigma_{u,\boldsymbol{l}}^{({\mathrm{U}})})^2, \tag{21} \end{align}\] where \(\Gamma_u\) is defined as in Corollary 2. Let us compare (21) with the general bound for the usual scramble of a (digital) \((t,m,d)\)-net (see [17], [18]), \[\label{eq:general95bound95dense} \sigma^2_{\mathrm{U}}(b^m) \le \frac{1}{b^m}\sum_{\emptyset \neq u \subseteq 1{:}d} b^{t_u}\left(\dfrac{b}{b-1}\right)^{|u|-1} \sum_{\substack{\boldsymbol{l}\in \mathbb{N}_0^{|u|} \\ \sum_{j \in u} l_j > m - t_u}} (\sigma_{u,\boldsymbol{l}}^{({\mathrm{U}})})^2,\tag{22}\] where \(t_u\) denotes the \(t\)-value of the projection of the digital sequence \(\mathcal{S}\) onto the coordinates in \(u\). In our case, by Theorem 1 we can take \(t_u = \sum_{j \in u}(e_j-1)\). It follows that the bound (21) has more terms than (22), but each with a smaller constant factor. We can formalize this observation in the following “meta” theorem, which can be applied to the results of [9] to obtain a corresponding corollary.

Theorem 7. Let \(\mathcal{S}\) be a \((0,\boldsymbol{e},d)\)-sequence in base \(b\). The variance bound (21) for coarse scrambling is strictly tighter than the standard variance bound (22) for usual scrambling applied to a hypothetical digital \((t,m,d)\)-net with \(t_u = \sum_{j \in u} e_j\). A direct consequence is that any convergence rate proven for usual scrambling that relies solely on a general bound of the form (22) also holds for coarse scrambling.

Corollary 6. Let \(\mathcal{S}\) be a \((0,\boldsymbol{e},d)\)-sequence in base \(b\). Assume that \(\partial^df/(\partial x_1 \cdots \partial x_d)\) exists and is Lipschitz continuous. Then the variance \(\sigma^2_{\mathrm{C}}(n)\) of the estimator based on coarse scrambling with \(n=b^m\), given in (17), decays as \(O(n^{-3} (\log n)^{d-1})\).

Let us proceed to a more detailed comparison. Comparing (21) and (22), the variance \(\sigma^2_{\mathrm{C}}(b^m)\) of the estimator based on coarse scrambling may be larger for subsets \(u\) with small \(t_u\) and small cardinality \(|u|\). In some cases, it can be shown that the precise \(t\)-value satisfies \(t_u = \sum_{j \in u} (e_j-1)\) [29], although this does not hold for all subsets.

The most important case is the one-dimensional projection, i.e., \(u = \{j\}\). For such one-dimensional projections, the Sobol’ sequence (or, more generally, a Niederreiter sequence with non-singular upper-triangular generating matrices) forms a \((0,1)\)-sequence in base \(b\), i.e., \(t_{\{j\}} = 0\). Thus the variance of the \(\{j\}\)-component for the usual scramble is given by \[\label{eq:1dim-dense} \frac{1}{b^m}\sum_{l \ge m} (\sigma_{\{j\},l}^{({\mathrm{U}})})^2.\tag{23}\] On the other hand, for the coarse scramble, Corollary 2 and (20) imply that the variance of the \(\{j\}\)-component is bounded by \[\label{eq:1dim-coarse} \frac{1}{b^m}\sum_{(\lfloor l/e_j \rfloor + 1)e_j > m} G_{\{j\},\lfloor l/e_j \rfloor}^{({\mathrm{C}})}(b^m) (\sigma_{\{j\},l}^{({\mathrm{U}})})^2 \le \frac{1}{b^m}\sum_{l \ge \lfloor m/e_j \rfloor e_j} (\sigma_{\{j\},l}^{({\mathrm{U}})})^2.\tag{24}\] Thus, since \(l \ge m\) implies \(l \ge \lfloor m/e_j \rfloor e_j\), the exact variance for the usual scramble in (23) is no greater than the bound for the coarse scramble in (24). This means that, for one-dimensional projections, the coarse scramble can be significantly worse than the usual one. To avoid this drawback, we may choose \(m\) to be a multiple of \(e_j\), in which case the two bounds coincide. This remedy is indeed observed in our numerical experiments.

In practical applications, such as those in finance, integrands often have what is broadly termed a low effective dimension [30]. This notion is typically specified in two main ways. The first is having a low superposition dimension, where the variance is dominated by interactions among a small number of variables (small \(|u|\)). The second is having a low truncation dimension, where the variance is concentrated in the first few coordinates. As our analysis shows, the trade-off between coarse and usual scrambling depends on the integrand’s effective dimension. For functions with a low superposition dimension, where low-order interactions (especially 1D projections) dominate, usual scrambling is superior due to its perfect \(t=0\) property on 1D components. For functions with a low truncation dimension in high-dimensional settings, where the influence of higher-order interactions is non-negligible, the superior \(O(\log d)\) scaling of the gain coefficient \(\Gamma_u\) for coarse scrambling can become the decisive advantage, making it a compelling alternative.

5 Numerical Experiments

To empirically validate our theoretical findings, we conduct a series of numerical integration experiments comparing the performance of coarse scrambling with usual scrambling on Sobol’ sequences.

5.1 Experimental Setup

For both scrambling strategies, we estimate the integral of test functions using an RQMC estimator. The number of sample points is set to \(n=2^m\), where \(m\) ranges from 1 to 16. To assess the estimator’s stability, we compute the root mean square error (RMSE) over \(R=200\) independent replications for each value of \(n\). For sufficiently smooth integrands, theory predicts a convergence rate of approximately \(O(n^{-1.5})\) for the RMSE. The usual and coarse scrambling methods are implemented using the affine matrix scramble and the block affine matrix scramble, respectively.
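For reference, a condensed version of this protocol for the usual-scrambling baseline can be written with scipy, whose Sobol’ generator applies linear matrix scrambling with a digital shift (a digitwise scramble) when `scramble=True`; the coarse curve would instead use the block affine matrix scramble of Definition 11. The function name, the reduced replication count, and the use of the 37-dimensional linear test function of Section 5.2.1 are our illustrative choices.

```python
import numpy as np
from scipy.stats import qmc

def rmse_usual(m: int, d: int = 37, reps: int = 50, seed: int = 0) -> float:
    """RMSE of the RQMC estimate of f(x) = sum_j x_j (exact integral d/2)
    over independent scramblings of the first 2^m Sobol' points."""
    exact = d / 2.0
    errs = []
    for r in range(reps):
        sobol = qmc.Sobol(d=d, scramble=True, seed=seed + r)
        X = sobol.random_base2(m)          # n = 2^m scrambled Sobol' points
        errs.append(X.sum(axis=1).mean() - exact)
    return float(np.sqrt(np.mean(np.square(errs))))

for m in range(4, 15, 2):
    print(m, rmse_usual(m))
```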

We use two distinct test functions designed to probe different aspects of the scrambling methods:

  1. A 37-dimensional linear function. Its variance is entirely contained in one-dimensional projections (\(|u|=1\)), so it has the lowest possible superposition dimension.

  2. A 100-dimensional weighted smooth function. This function is designed to have a low truncation dimension.

The results, presented as log-log plots of the RMSE against the number of points \(n\), are shown in Figure 1.


Figure 1: Log-log plot of the number of points vs. the root mean square error for coarse (block) and usual affine scrambling. Reference lines for \(O(n^{-1})\) and \(O(n^{-1.5})\) are included. (a) Linear function (\(d=37\)); (b) weighted smooth function (\(d=100\)).

5.2 Results and Discussion

5.2.1 Linear Function

Our first test integrand is the 37-dimensional linear function: \[f(\boldsymbol{x}) = \sum_{j=1}^{37} x_j.\] The dimension \(d=37\) was chosen because for the Sobol’ sequence, the corresponding generator polynomial degrees result in block sizes \(e_j\) up to 7 (see Table 1).

As shown in Figure 1 (a), the results align perfectly with the theoretical analysis in Section 4. The usual scrambling method consistently outperforms coarse scrambling, exhibiting a smooth convergence rate close to \(O(n^{-1.5})\). In contrast, the coarse scrambling method shows a stepped convergence. The error convergence follows a baseline rate of approximately \(O(n^{-1})\), superimposed with significant drops when \(m\) is a multiple of 7 (i.e., at \(n=2^7\) and \(n=2^{14}\)). This confirms our theoretical prediction: for one-dimensional projections, the error bound for coarse scrambling only matches that of usual scrambling when the number of points is chosen appropriately relative to the block sizes. Since many dimensions have a block size of \(e_j=7\), choosing \(m\) as a multiple of 7 synchronizes the error reduction across these dimensions, leading to the observed drops.

5.2.2 Weighted Smooth Function

Next, we consider the 100-dimensional weighted function: \[g(\boldsymbol{x}) = \prod_{j=1}^{100} \left(1+\frac{1}{j^2}(x_j e^{x_j}-1)\right).\] This integrand is dominated by its first few dimensions. For \(d=100\), roughly half of the dimensions in the Sobol’ sequence correspond to a block size of \(e_j=9\).

The results in Figure 1 (b) show that both scrambling methods perform comparably, with convergence rates faster than \(O(n^{-1})\) but slightly slower than the ideal \(O(n^{-1.5})\). In this scenario, the potential drawbacks of coarse scrambling are negligible. The function’s variance is concentrated in the low-index dimensions, which have small block sizes (\(e_j=1, 2, \dots\)). The performance degradation from the larger block sizes associated with high-index dimensions is insignificant because these dimensions contribute very little to the total variance. This suggests that for functions with low truncation dimension, coarse scrambling provides a comparable level of accuracy to usual scrambling.

6 Equivalence of ANOVA Formulations

This appendix serves two purposes. First, we demonstrate that our formulation of the ANOVA decomposition components is equivalent to the standard formulation via the Walsh basis. Second, we prove the identity (11). For simplicity, we consider the case of a common base, i.e., \(\boldsymbol{b}= (b,b,\dots,b)\), as the generalization to the mixed-base case is straightforward.

Let \(\mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x})\) be the \(\boldsymbol{h}\)-th Walsh function and \(\widehat{f}(\boldsymbol{h})\) be the corresponding Walsh coefficient of \(f\), as defined in [5]. The ANOVA components in [5], which we denote by \(\beta'_{\boldsymbol{l}}(\boldsymbol{x})\), are defined for \(\boldsymbol{l}\in \mathbb{N}_0^d\) as \[\label{eq:beta-dash} \beta'_{\boldsymbol{l}}(\boldsymbol{x}) = \sum_{\boldsymbol{h}\in L_{\boldsymbol{l}}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}),\tag{25}\] where the index set \(L_{\boldsymbol{l}}\) is given by \[L_{\boldsymbol{l}} = \{ \boldsymbol{h}= (h_1,\dots,h_d) \in \mathbb{N}_0^d \mid b^{l_j-1} \le h_j < b^{l_j} \text{ if } l_j>0, \text{ and } h_j=0 \text{ if } l_j=0 \}.\] Our goal is to show that this definition coincides with ours. Specifically, for an index \(\boldsymbol{l}\in \mathbb{N}_0^d\), let \(u := \{j \in 1{:}d \mid l_j > 0 \}\) and let \(\boldsymbol{k}= (l_j-1)_{j \in u}\). We will show that \(\beta'_{\boldsymbol{l}}(\boldsymbol{x}) = \beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u)\), where \(\beta_{u,\boldsymbol{k}}\) is defined as in (8).

First, note that for any \(j \notin u\), we have \(l_j=0\) and thus \(h_j=0\) for all \(\boldsymbol{h}\in L_{\boldsymbol{l}}\). Since \(\mathrm{wal}_0(x_j) = 1\) and \(\mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}) = \prod_{j=1}^d \mathrm{wal}_{h_j}(x_j)\), the term \(\mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x})\) does not depend on \(x_j\) for \(j \notin u\). Consequently, \(\beta'_{\boldsymbol{l}}(\boldsymbol{x})\) is a function of only \(\boldsymbol{x}_u\).

The set \(L_{\boldsymbol{l}}\) can be expressed using the inclusion-exclusion principle. Let \[\tilde{L}_{\boldsymbol{l},v} := \{ \boldsymbol{h}\in \mathbb{N}_0^d \mid h_j < b^{l_j-\chi(j \in u \setminus v)} \text{ for } 1 \le j \le d\}\] for any \(v \subseteq u\). The indicator function for \(\boldsymbol{h}\in L_{\boldsymbol{l}}\) can then be written as \[\chi(\boldsymbol{h}\in L_{\boldsymbol{l}}) = \sum_{v \subseteq u} (-1)^{|u|-|v|} \chi(\boldsymbol{h}\in \tilde{L}_{\boldsymbol{l},v}).\] Substituting this into 25 yields \[\label{eq:betadash-IE} \beta'_{\boldsymbol{l}}(\boldsymbol{x}) = \sum_{v \subseteq u}(-1)^{|u|-|v|} \left( \sum_{\boldsymbol{h}\in \tilde{L}_{\boldsymbol{l},v}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}) \right).\tag{26}\] The inner sum over \(\boldsymbol{h}\in \tilde{L}_{\boldsymbol{l},v}\) is a partial Walsh series of \(f\). As shown in the proof of [5], this sum has an integral representation: \[\begin{align} \sum_{\boldsymbol{h}\in \tilde{L}_{\boldsymbol{l},v}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}) &= \int_{[0,1)^d} f(\boldsymbol{y}) \prod_{j=1}^d b^{l_j-\chi(j \in u \setminus v)}\chi\big(x_j \ominus y_j \in [0,b^{-(l_j-\chi(j \in u \setminus v))})\big) \,d\boldsymbol{y}\notag \\ &= \tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u), \label{eq:beta-betadash} \end{align}\tag{27}\] where \(\ominus\) is the \(b\)-adic digit-wise subtraction. The integral expression in the first line corresponds precisely to the definition of \(\tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u)\) in our framework (see 9 ).
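It may help to see the inclusion-exclusion identity above in its simplest instance (a worked case we add for illustration). When \(u=\{j\}\) is a singleton with \(l_j>0\), the identity reduces to the elementary telescoping \[\chi\bigl(b^{l_j-1}\le h_j < b^{l_j}\bigr) = \chi\bigl(h_j < b^{l_j}\bigr) - \chi\bigl(h_j < b^{l_j-1}\bigr),\] where the two terms on the right correspond to \(v=\{j\}\) and \(v=\emptyset\), respectively.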

By substituting 27 into 26 , we obtain \[\beta'_{\boldsymbol{l}}(\boldsymbol{x}) = \sum_{v \subseteq u}(-1)^{|u|-|v|} \tilde{\beta}_{u,v,\boldsymbol{k}}(\boldsymbol{x}_u).\] The right-hand side is, by definition in 8 , equal to \(\beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u)\). This establishes the equivalence \(\beta'_{\boldsymbol{l}}(\boldsymbol{x}) = \beta_{u,\boldsymbol{k}}(\boldsymbol{x}_u)\) and completes the proof.

Finally, we prove the identity in 11 . For any \(u \subseteq 1{:}d\) and \(\boldsymbol{k}\in \mathbb{N}_0^{|u|}\), we consider the sum of our ANOVA components \(\beta_{u,\boldsymbol{k}'}\) over all \(\boldsymbol{k}' \le \boldsymbol{k}\) (component-wise). Using the established equivalence \(\beta_{u,\boldsymbol{k}'}(\boldsymbol{x}_u) = \beta'_{\boldsymbol{l}(u,\boldsymbol{k}')}(\boldsymbol{x})\), where \(\boldsymbol{l}(u,\boldsymbol{k}')\) is defined such that \(l_j = k'_j+1\) for \(j \in u\) and \(l_j=0\) otherwise, we have \[\begin{align} \sum_{\substack{\boldsymbol{k}' \in \mathbb{N}_0^{|u|} \\ \boldsymbol{k}' \le \boldsymbol{k}}} \beta_{u,\boldsymbol{k}'}(\boldsymbol{x}_u) &= \sum_{\substack{\boldsymbol{k}' \in \mathbb{N}_0^{|u|} \\ \boldsymbol{k}' \le \boldsymbol{k}}} \beta'_{\boldsymbol{l}(u,\boldsymbol{k}')}(\boldsymbol{x}) \\ &= \sum_{\substack{\boldsymbol{k}' \in \mathbb{N}_0^{|u|} \\ \boldsymbol{k}' \le \boldsymbol{k}}} \sum_{\boldsymbol{h}\in L_{\boldsymbol{l}(u,\boldsymbol{k}')}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}). \end{align}\] The union of the disjoint sets \(L_{\boldsymbol{l}(u,\boldsymbol{k}')}\) over all \(\boldsymbol{k}' \le \boldsymbol{k}\) covers exactly the set of all Walsh indices \(\boldsymbol{h}\) where \(h_j < b^{k_j+1}\) for all \(j \in u\) and \(h_j=0\) for \(j \notin u\). This corresponds precisely to the index set we previously denoted as \(\tilde{L}_{\boldsymbol{l}(u,\boldsymbol{k}),u}\). Therefore, we can combine the summations: \[\begin{align} \sum_{\substack{\boldsymbol{k}' \in \mathbb{N}_0^{|u|} \\ \boldsymbol{k}' \le \boldsymbol{k}}} \sum_{\boldsymbol{h}\in L_{\boldsymbol{l}(u,\boldsymbol{k}')}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}) = \sum_{\boldsymbol{h}\in \tilde{L}_{\boldsymbol{l}(u,\boldsymbol{k}),u}} \widehat{f}(\boldsymbol{h}) \mathrm{wal}_{\boldsymbol{h}}(\boldsymbol{x}) = \tilde{\beta}_{u,u,\boldsymbol{k}}(\boldsymbol{x}_u). \end{align}\] The final equality follows directly from 27 . This confirms 11 .

References↩︎

[1]
I. H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Oxford University Press, New York, 1994.
[2]
J. Dick, P. Kritzer, and F. Pillichshammer. Lattice Rules: Numerical Integration, Approximation, and Discrepancy. Springer, Cham, 2022.
[3]
J. H. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math., 2:84–90, 1960.
[4]
H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, Philadelphia, 1992.
[5]
J. Dick and F. Pillichshammer. Digital Nets and Sequences: Discrepancy Theory and Quasi–Monte Carlo Integration. Cambridge University Press, Cambridge, 2010.
[6]
A. B. Owen. Randomly permuted \((t,m,s)\)-nets and \((t,s)\)-sequences. In H. Niederreiter and P. J.-S. Shiue, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, pages 299–317, New York, 1995. Springer.
[7]
A. B. Owen. Monte Carlo variance of scrambled net quadrature. SIAM J. Numer. Anal., 34(5):1884–1910, 1997.
[8]
A. B. Owen. Scrambled net variance for integrals of smooth functions. Ann. Stat., 25(4):1541–1562, 1997.
[9]
A. B. Owen. Scrambling Sobol’ and Niederreiter–Xing points. J. Complex., 14(4):466–489, 1998.
[10]
Z. Pan and A. B. Owen. Super-polynomial accuracy of one dimensional randomized nets using the median of means. Math. Comput., 92(340):805–837, 2023.
[11]
Z. Pan and A. B. Owen. Super-polynomial accuracy of multidimensional randomized nets using the median-of-means. Math. Comput., 93(349):2265–2289, 2024.
[12]
T. Goda, K. Suzuki, and M. Matsumoto. A universal median quasi–Monte Carlo integration. SIAM J. Numer. Anal., 62(1):533–566, 2024.
[13]
Z. Pan. Automatic optimal-rate convergence of randomized nets using median-of-means. Math. Comput., to appear, 2025.
[14]
I. M. Sobol’. On the distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz., 7(4):784–802, 1967.
[15]
H. S. Hong and F. J. Hickernell. Algorithm 823: Implementing scrambled digital sequences. ACM Trans. Math. Softw., 29(2):95–109, 2003.
[16]
S. Joe and F. Y. Kuo. Constructing Sobol’ sequences with better two-dimensional projections. SIAM J. Sci. Comput., 30(5):2635–2654, 2008.
[17]
Z. Pan and A. B. Owen. The nonzero gain coefficients of Sobol’ sequences are always powers of two. J. Complex., 75:101700, 2023.
[18]
T. Goda and K. Suzuki. Improved bounds on the gain coefficients for digital nets in prime power base. J. Complex., 76:101722, 2023.
[19]
A. B. Owen and Z. Pan. Gain coefficients for scrambled Halton points. SIAM J. Numer. Anal., 62(3):1021–1038, 2024.
[20]
H. Faure and C. Lemieux. Irreducible Sobol’ sequences in prime power bases. Acta Arith., 173(1):59–80, 2016.
[21]
H. Niederreiter. Low-discrepancy and low-dispersion sequences. J. Number Theory, 30(1):51–70, 1988.
[22]
S. Tezuka. Polynomial arithmetic analogue of Halton sequences. ACM Trans. Model. Comput. Simul., 3(2):99–107, 1993.
[23]
S. Tezuka. On the discrepancy of generalized Niederreiter sequences. J. Complex., 29(3-4):240–247, 2013.
[24]
J. Matoušek. On the \({L}_2\)-discrepancy for anchored boxes. J. Complex., 14(4):527–556, 1998.
[25]
S. Tezuka. A generalization of Faure sequences and its efficient implementation. Res. Rep. IBM, RT0105:1–10, 1994.
[26]
A. B. Owen. Variance with alternative scramblings of digital nets. ACM Trans. Model. Comput. Simul., 13(4):363–378, 2003.
[27]
G. L. Mullen and D. Panario. Handbook of Finite Fields, volume 17. CRC Press, Boca Raton, 2013.
[28]
X. Wang. Strong tractability of multivariate integration using quasi-Monte Carlo algorithms. Math. Comput., 72(242):823–838, 2003.
[29]
J. Dick and H. Niederreiter. On the exact \(t\)-value of Niederreiter and Sobol’ sequences. J. Complex., 24(5-6):572–581, 2008.
[30]
C. Lemieux. Monte Carlo and quasi-Monte Carlo sampling. Springer Series in Statistics. Springer, New York, 2009.

  1. Faculty of Science, Yamagata University, 1-4-12 Kojirakawa-machi, Yamagata, 990-8560, Japan (kosuke-suzuki@sci.kj.yamagata-u.ac.jp)↩︎

  2. The work was supported by JSPS KAKENHI Grant Number 24K06857.↩︎