January 09, 2025
Observations of the clustering properties of galaxies on cosmological scales from wide-area spectroscopic galaxy surveys provide a powerful way to constrain cosmological parameters, explore the physics of inflation, and test gravity theories on cosmological scales [1]–[3]. There are various existing, ongoing and planned galaxy redshift surveys: the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS) [4], the SDSS-IV extended Baryon Oscillation Spectroscopic Survey (eBOSS) [5], the Subaru Prime Focus Spectrograph (PFS) [6], the Dark Energy Spectroscopic Instrument (DESI) [7], the ESA Euclid, and the NASA Nancy Grace Roman Space Telescope [8].
A standard approach to galaxy clustering analysis involves the use of the two-point correlation function or its Fourier-transformed counterpart, the power spectrum. The baryonic acoustic oscillations (BAO) [9] and the apparent anisotropic clustering due to the use of an incorrect cosmological model, i.e. the Alcock-Paczynski (AP) effect [10], in the measured two-point statistics provide a powerful geometrical probe of cosmological distances [11], [12]. In addition, peculiar velocities of galaxies cause characteristic anisotropies in the redshift-space distribution of galaxies, i.e. the redshift-space distortion (RSD) effect [13], and can be used to probe the growth of structure, free of galaxy bias uncertainty [14], [15]. More recently, advancements in the theory and methods of galaxy clustering analysis have enabled a more comprehensive approach, known as the “full-shape” analysis [16], [17]. This method incorporates BAO, AP, and RSD information as well as the shape information of the underlying matter power spectrum up to quasi-nonlinear scales within an assumed theoretical framework, such as the \(\Lambda\)CDM model.
However, the unknown relationship between the galaxy and dark matter distributions in large-scale structure, known as galaxy bias uncertainty, is a main obstacle to galaxy clustering cosmology. Galaxy bias varies among different types of galaxies, depending on factors such as stellar mass, host halo mass, and star formation history, in a complex manner [18]. Given this fact, it is very difficult or nearly impossible to construct a homogeneous sample of galaxies from galaxy survey data according to the same selection cuts of observables such as their stellar masses across a range of redshifts. For example, for galaxies at higher redshifts, it is easier to observe intrinsically brighter galaxies for the same exposure time. In addition, cuts on colors and apparent magnitudes are often applied to select target galaxies for spectroscopic observations [19], [20], where galaxy colors and magnitudes are redshift-dependent. These result in a non-trivial and inhomogeneous selection of spectroscopic galaxies across redshifts, which we hereafter refer to as the “selection effect”. Although the selection effect is unavoidable in the data, most cosmological analyses adopt theoretical templates of single tracers, effectively ignoring the selection effect, to estimate cosmological parameters by comparing with the measurements.
The purpose of this paper is therefore to investigate the impact of the selection effect on the redshift-space power spectrum. To this end, we consider a case where an assumed sample of galaxies follows a redshift distribution, denoted as \(n(z)\) or \(n(\chi)\), representing the number density of galaxies per unit redshift interval or per unit radial comoving distance interval. To focus on the impact of the redshift-dependent selection effect, we do not consider the redshift evolution of galaxy clustering (structure formation). Hence, in our setting, a constant comoving number density, \(n(\chi)\mathrm{d}V(\chi)={\rm const.}\), corresponds to a homogeneous sample, i.e. the absence of the selection effect. We first use linear theory to show that the selection function generally causes a bias in the multipole moments of the redshift-space galaxy power spectrum measured with the Feldman-Kaiser-Peacock estimator [21]. We then use halo catalogs from the cosmological N-body simulations AbacusSummit [22], [23] to construct mock catalogs of galaxies (halos), selecting halos based on a redshift-dependent mass threshold such that the resulting redshift distribution of the selected halos follows the target \(n(z)\) for SDSS BOSS LOWZ- or CMASS-like galaxies. We measure the multipole moments of the redshift-space power spectrum from the mock catalogs and then compare them with those obtained from halo catalogs of a single mass threshold, i.e. single tracers, in the same simulations. This comparison quantifies the impact of the selection effect on the redshift-space power spectrum.
The structure of this paper is as follows. In Section 2 we give the details of the halo catalogs used, and describe our methods for generating galaxy mock catalogs as well as for measuring the redshift-space power spectrum from each mock catalog. We also describe how we evaluate the galaxy selection effect. Section 3 gives the main results of this paper. We evaluate the selection effect for several types of samples, and investigate a possible bias in the cosmological parameter estimation that arises when the selection effect is ignored. In Section 4 we give our conclusions and discussion.
In this section, we describe the methodology used in this paper.
We use the halo catalogs constructed from N-body simulation data for the Planck 2018 \(\Lambda\)CDM model, provided by AbacusSummit [22], [23]. The N-body simulations were performed using \(6912^3\) particles in a comoving cubic box with a side length of \(2h^{-1}\mathrm{Gpc}\). The particle mass was \(2\times 10^{9} h^{-1}\mathrm{M_{\odot}}\) and the force softening scale was \(7.2h^{-1}\mathrm{kpc}\). In this paper we use only the halo catalogs and do not use the N-body particle data. In particular, we use the halo catalogs at a single redshift output, \(z=0.5\): we deliberately exclude the effect of redshift evolution so as to isolate the impact of the selection effect on clustering measurements. Halos in each simulation realization were identified using the CompaSO halo finder [24]. In this paper we use only central halos (labeled “L0” in the released catalogs). For the position and velocity of each halo we use the center of mass and the center-of-mass velocity of its member particles, respectively, as provided in the halo catalogs. We use 25 realizations of the halo catalogs to reduce the statistical errors in our results.
We construct two types of mock galaxy catalogs: one includes the selection effect, and the other is free from it. We measure the power spectrum from each mock catalog and take the ratio of the two power spectra to quantify the impact of the selection effect. In the following, we describe how to create these two types of mock catalogs from the halo catalogs.
To mimic actual galaxy surveys, we consider the LOWZ and CMASS galaxy samples of the Sloan Digital Sky Survey Baryon Oscillation Spectroscopic Survey (BOSS) Data Release 11 (DR11); we consider LOWZ galaxies at \(z>0.09\) and CMASS galaxies at \(z>0.40\), as shown in Fig. 1. In addition to these cases, for the generality of our discussion, we also consider a power-law selection function, \(n(z)\propto z^{-\alpha}\). We investigate three cases, \(\alpha=0.5,1.0,2.0\), and determine their normalizations so that the resulting averaged number density in the radial distance range of \([0,1000]h^{-1}\)Mpc, which roughly corresponds to the range of LOWZ, becomes the same as in the LOWZ case (see the right panel in Fig. 1). Throughout this paper we assume the distant-observer approximation, and take the \(z\)-axis of each simulation box to be along the line-of-sight direction.
Figure 1: The number density of LOWZ (left panel) and CMASS (middle panel) galaxies as a function of redshift, denoted as \(n(z)\), taken from the SDSS DR11 catalog. Note that \(n(z)\) is the number density per unit redshift interval, and we here plot \(n(z)\) as a function of the comoving radial distance using the inverse relation of redshift and the comoving radial distance, \(z=z^{-1}(\chi)\), for Planck 2018 cosmology. The LOWZ galaxies are taken from the redshift range \(z\simeq[0.09,0.5]\), while the CMASS galaxies are from the range \(z\simeq[0.4,0.8]\). For the sake of illustration, we set the comoving radial distance at the lowest redshift to \(\chi=0\) for each galaxy sample. We also consider a case of \(n(z)\) following a power-law scaling given as \(n(z)\propto z^{-\alpha}\) (right panel) to study the impact of galaxy selection effect as a general case.
The LOWZ and CMASS galaxies are selected based on color and magnitude cuts [19], [20], [25], [26]. It is very difficult, if not impossible, to exactly mimic the actual color-magnitude cuts using the halo catalog, because doing so requires simulating galaxies while accounting for the complex physics of galaxy formation and evolution. In this paper, we employ the following simplified selection function as a working example. For a given \(n(z)\) for the sample of galaxies considered, we select halos with masses above the mass threshold \(M_{\rm th}(z)\) in each simulation realization so that the resulting number density matches \(n(z)\) in each redshift bin \([\chi,\chi+\mathrm{d}\chi]\) (since the comoving radial distance is given by \(\chi=\chi(z)\)). We determine the mass threshold \(M_{\rm th}(z)\) in each redshift bin from the average of the mass functions from the 25 realizations of the halo catalog, to minimize the sample variance effect. Since heavier halos tend to host more massive and therefore brighter galaxies [27], this mass-threshold selection at each redshift mimics, to some extent, a flux-limited sample of galaxies. In this paper, we refer to this method as the abundance matching (AM) method, and denote the power spectrum measured from this catalog as \(P_{\rm AM}(k)\). Fig. 2 shows the mass thresholds as a function of redshift, \(M_{\rm th}(z)\), which we use to construct mock catalogs for the LOWZ, CMASS and power-law samples, respectively.
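The mass-threshold selection described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the results in this paper: the function names are hypothetical, the target density is assumed to be supplied as a comoving number density per radial bin, and the thresholds are derived from a single catalog rather than from the mass function averaged over 25 realizations.

```python
import numpy as np

def mass_thresholds(halo_masses, volume, target_density):
    """Return M_th for each radial bin such that n(>M_th) matches the target.

    halo_masses    : 1D array of halo masses in the full box (single snapshot)
    volume         : box volume, in units consistent with target_density
    target_density : 1D array of target comoving number densities, one per bin
    """
    sorted_masses = np.sort(halo_masses)[::-1]  # heaviest halo first
    # Number of halos in the whole box that corresponds to each target density.
    n_needed = np.rint(np.asarray(target_density) * volume).astype(int)
    if n_needed.max() > sorted_masses.size or n_needed.min() < 1:
        raise ValueError("target density outside the range the catalog can supply")
    # The threshold is the mass of the n_needed-th heaviest halo.
    return sorted_masses[n_needed - 1]

def select_am_sample(masses, chi, bin_edges, m_th):
    """Boolean mask of halos whose mass is >= M_th of the radial bin they occupy."""
    idx = np.digitize(chi, bin_edges) - 1
    inside = (idx >= 0) & (idx < len(m_th))
    mask = np.zeros(masses.shape, dtype=bool)
    mask[inside] = masses[inside] >= m_th[idx[inside]]
    return mask
```

Because the snapshot is at a single redshift, the cumulative mass function is the same in every radial bin, so one sorted mass array is enough to look up all thresholds.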
The mock galaxy catalogs are distributed in a rectangular region of \(2^2\times 1~(h^{-1}{\rm Gpc})^3\) volume. We generate two mocks from the halo catalog in each N-body simulation of \(8~(h^{-1}{\rm Gpc})^3\) volume. We apply zero padding to the halo distribution to perform the Fast Fourier Transform in a \(2^3 ~(h^{-1}{\rm Gpc})^3\) cubic box and then estimate the power spectrum using the Feldman-Kaiser-Peacock (FKP) estimator, as described in detail in Section 2.3. Since the mock catalog violates the periodic boundary condition, the estimated power spectrum is affected by the window convolution. For the following results, we use 50 realizations of the mock catalogs for each galaxy sample.
Halos of different masses have different clustering properties; heavier halos show greater clustering amplitudes and stronger nonlinearities, but have a lower number density [28]. Thus, the power spectrum measured from the above mock catalogs arises from a superposition of power spectra of halos of different masses.
Figure 2: Halo mass thresholds as a function of redshift, \(M_{\rm th}(z)\), for the LOWZ (left panel) and CMASS (middle panel) samples, respectively. We select halos with mass above \(M_{\rm th}(z)\) in each redshift bin, so that the resulting number density of halos reproduces the number density of each sample, \(n(z)\), in Fig. 1 (also see Section 2.2.1). We determine the mass threshold function based on the average of 25 realizations of the halo catalog. The three solid lines in the right panel show the mass threshold for each of the three power-law density cases in Fig. 1.
To quantify the galaxy selection effect, we make a mock catalog which has no galaxy selection effect compared to the AM catalog. We create such a galaxy catalog as follows:
(i) We construct different halo catalogs as a function of varying mass thresholds \(M_{\rm th}\) in each realization.
(ii) For each halo catalog of a given mass threshold, we randomly select halos in each redshift bin (or along the \(z\)-axis direction in our setting) so that the resulting redshift distribution of the number density follows \(n(z)\propto \bar{n}_{\rm g}(z)\), where \(\bar{n}_{\rm g}(z)\) is the redshift distribution for the LOWZ, CMASS or power-law galaxy sample, respectively, as shown in Fig. 1. The power spectrum estimated using the FKP estimator (see below) retains the original power spectrum of the single mass threshold, and also incorporates a survey window effect similar to that in \(P_{\rm AM}(k)\) for each of the three galaxy samples.
(iii) We find the halo catalog of a certain mass threshold such that its (real space) power spectrum matches the target spectrum \(P_{\rm AM}(k)\) for each of the three galaxy samples on the linear scales in the range \(k=[0.02,0.1]~h{\rm Mpc}^{-1}\). In this step we perform the \(\chi^2\) fitting taking into account the sample variance error in each \(k\) bin estimated from the variance of the 50 realizations.
In the following we call this halo catalog a single mass-threshold sample, which is free from the selection effect. We will compare the two power spectra measured from the AM sample and the single mass-threshold sample to quantify the impact of the galaxy selection effect.
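The random down-sampling and the \(\chi^2\) matching of the mass threshold can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for this paper: the function names, the per-bin volumes, and the array layout of the candidate spectra are all hypothetical.

```python
import numpy as np

def downsample_to_nz(chi, bin_edges, target_density, bin_volume, beta, rng):
    """Randomly keep halos so that each radial bin holds beta * n_g * V halos.

    chi is the radial coordinate of halos already cut at a single M_th;
    beta is the proportionality factor between the kept sample and n_g.
    """
    idx = np.digitize(chi, bin_edges) - 1
    keep = np.zeros(chi.size, dtype=bool)
    for b in range(len(bin_edges) - 1):
        members = np.flatnonzero(idx == b)
        n_keep = int(round(beta * target_density[b] * bin_volume[b]))
        if n_keep > members.size:
            raise ValueError(f"bin {b}: beta too large for this mass threshold")
        keep[rng.choice(members, size=n_keep, replace=False)] = True
    return keep

def best_threshold(p_am, p_candidates, sigma):
    """Index of the candidate spectrum minimizing chi^2 against P_AM.

    p_candidates has shape (n_thresholds, n_k); p_am and sigma have shape (n_k,)
    and should be restricted to the linear-scale k bins used for the fit.
    """
    chi2 = np.sum(((p_candidates - p_am) / sigma) ** 2, axis=1)
    return int(np.argmin(chi2)), chi2
```

Sampling without replacement inside each bin preserves the clustering of the parent single-threshold catalog while imposing the radial profile of the target sample, at the cost of a higher shot noise.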
The orange line in each panel of Fig. 3 shows \(n(z)\) of a single mass-threshold sample, constructed using the above method, for the LOWZ or CMASS galaxy sample. The dashed line in each panel shows the number density of halos corresponding to a certain mass threshold, which by definition has a constant amplitude in the \(z\) direction because we are using the halo catalogs at the single redshift output (\(z=0.5\)). We then randomly select halos in each radial distance bin to satisfy \(n(z)\propto n_{\rm LOWZ}(z)\) or \(n_{\rm CMASS}(z)\), respectively. The proportionality factor is determined by the average over all the redshift bins. The power spectrum measured from this single mass-threshold sample includes the survey window effect in the same way as that for the AM galaxy sample (see Section 2.3). Note that the shot noise contamination estimated by the FKP estimator differs between the AM and single mass-threshold samples, and we subtract it from the measured power spectrum. The single mass-threshold sample in Fig. 3 has the power spectrum with the closest amplitude to the spectrum from the AM sample on the linear scales (see the above step (iii)), for each of the galaxy samples. For the LOWZ sample, the single mass threshold is \(M_{\rm th}=9.14\times 10^{12}h^{-1}{\rm M_{\odot}}\), and the proportionality factor in \(n(z)\propto n_{\rm g}(z)\) is about \(1/4\). For the CMASS sample, the single mass threshold is \(M_{\rm th}=1.39\times 10^{13}h^{-1}{\rm M_{\odot}}\), and the proportionality factor is about \(1/2\).
In the following we denote the power spectrum measured from the single mass-threshold sample as \(P_{M_{\rm th}}(k)\).
Figure 3: Illustration of the method for incorporating the window effect in the single mass-threshold sample, used to quantify the impact of selection effects in comparison to the results of the AM catalog. For the single mass-threshold sample, we randomly select halos above the single mass threshold in each redshift bin such that the redshift dependence of the number density is proportional to that for the LOWZ, CMASS or power-law sample in Fig. 1 (here we show results only for the LOWZ and CMASS samples) (see Section 2.2.2 for details on the single mass-threshold sample). This single mass-threshold sample has the same window function as that for the AM sample.
We use the Feldman-Kaiser-Peacock (FKP) estimator [21] to measure the power spectrum from each mock catalog. We define the galaxy number density field \(F({\boldsymbol{x}})\) as \[F(\mathbf{x}) = \frac{w(\mathbf{x})}{A_{\rm norm}^{1/2}}[n_{\rm g}(\mathbf{x}) - \alpha n_{\rm s}(\mathbf{x})] \label{Fx},\tag{1}\] where \(n_{\rm g}(\mathbf{x})\) and \(n_{\rm s}(\mathbf{x})\) are the number density fields for galaxies and randoms, respectively. \(\alpha\) is a factor that adjusts the local mean number density of randoms to that of galaxies, defined as \(\alpha = N_{\rm g}/N_{\rm s}\), where \(N_{\rm g}\) is the total number of galaxy particles and \(N_{\rm s}\) is that of random particles. Throughout this paper, we adopt \(\alpha=1/50\). The normalization factor is \(A_{\rm norm} = \int\!\mathrm{d}^3{\boldsymbol{x}}~ \bar{n}^2(\mathbf{x})w^2(\mathbf{x})\), and \(w(\mathbf{x})\) is the so-called FKP weight, which improves the signal-to-noise ratio of the power spectrum measurement in the shot noise regime: \[w(\mathbf{x}) = \frac{1}{1+\bar{n}(\mathbf{x})P_0}, \label{FKP95weight}\tag{2}\] where \(\bar{n}(\mathbf{x})\) is given by the redshift distribution \(\bar{n}_{\rm g}(z)\) (Fig. 1) (our mocks do not have any variations in the mean number density in the \(x,y\)-directions), and we adopt \(P_0=10^4~h^{-3}\rm{Mpc}^{3}\) for the AM sample. For the single mass-threshold sample, we set \(P_0=10^4/\beta~h^{-3}\rm{Mpc}^{3}\) in Eq. (2), where \(\beta\) is the proportionality factor of \(\bar{n}_{M_{\rm th}}(z)\propto\bar{n}_{\rm g}(z)\) in Section 2.2.2.
We then perform the fast Fourier transform of \(F(\mathbf{x})\) and estimate the power spectrum from \[\hat{P}(\mathbf{k}) \equiv |F(\mathbf{k})|^2 - P_{\rm sn}\label{P95est},\tag{3}\] where \(P_{\rm sn}\) is the shot noise contamination, estimated by \[\begin{align} P_{\rm sn}&=\frac{(1+\alpha)\int \mathrm{d}^3{\boldsymbol{x}}~ \bar{n}(\mathbf{x})w^2(\mathbf{x})}{\int \mathrm{d}^3{\boldsymbol{x}}~ \bar{n}^2(\mathbf{x})w^2(\mathbf{x})}. \label{eq:Psn95def} \end{align}\tag{4}\] We use \(1024^3\) grids in a cubic volume with a side length of \(2~h^{-1}{\rm Gpc}\) that covers the entire region of each mock catalog. Note that we use zero padding in grids where the data does not exist. In this paper, we employ Cloud In Cell (CIC) assignment scheme to generate grid-based data from the particle (galaxy and random) data. We correct for the CIC kernel effect when measuring the power spectrum, which is significant on small scales relevant to the grid scale [29]. The FKP method gives an estimate of the underlying power spectrum including the convolution of the survey window effect: \[\begin{align} \braket{\hat{P}(\mathbf{k})}&=\braket{|F(\mathbf{k})|^2} -P_{\rm sn} \notag \\ &= \int \frac{\mathrm{d}^3{\boldsymbol{k}}^{'}}{(2\pi)^3} P(\mathbf{k^{'}})|G(\mathbf{k}-\mathbf{k^{'}})|^2 \label{expectatoin95value}, \end{align}\tag{5}\] where \(G(\mathbf{k})\equiv[{\int \mathrm{d}^3{\boldsymbol{x}}~ \bar{n}(\mathbf{x})w(\mathbf{x})e^{i\mathbf{k}\cdot\mathbf{x}}}]/[{[\int \mathrm{d}^3{\boldsymbol{x}}~ \bar{n}^2(\mathbf{x})w^2(\mathbf{x})]^{1/2}}]\) is a window function [30], [31]. For \(|{\boldsymbol{k}}|\gg 1/L\), where \(L\) is a size of a survey window, \(\langle \hat{P}(k)\rangle\simeq P(k)\) [32], i.e. the estimator gives an unbiased estimate of the underlying power spectrum. 
Since the AM mocks and the single mass-threshold mocks have the same redshift dependence of halo distribution, \(n_{\rm AM}(z)\propto n_{M_{\rm th}}(z)\), by design, the power spectra measured from these two mocks have the same effect of window convolution if we keep their \(w(\mathbf{x})\) the same.
In practice, we evaluate the normalization factor \(A_{\rm norm}\) in Eq. (1) and the shot noise term \(P_{\rm sn}\) in Eq. (4) as discrete sums over the random and galaxy particles: \[\begin{align} A_{\rm norm} &\to \alpha \sum_{i=1}^{N_{\rm s}}\bar{n}(z_i)w(z_i)^2 ,\\ P_{\rm sn} &\to \frac{1}{A_{\rm norm}} \Big(\sum_{i=1}^{N_{\rm g}} + \alpha^2 \sum_{i=1}^{N_{\rm s}} \Big)w(z_i)^2. \label{P95shot95est} \end{align}\tag{6}\] We also use the interlacing scheme in Ref. [33] to mitigate the aliasing effect. We show the results measured from 50 realizations of the mock catalog for each galaxy sample.
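As an illustration of Eqs. (2) and (6), the following Python sketch evaluates the FKP weight, the normalization, and the shot noise from per-particle values of \(\bar{n}\). The function names and argument layout are hypothetical, and the sketch assumes that \(\bar{n}\) has already been evaluated at the position of every galaxy and random particle.

```python
import numpy as np

def fkp_weight(nbar, p0=1.0e4):
    """FKP weight w = 1 / (1 + nbar * P0), as in Eq. (2)."""
    return 1.0 / (1.0 + nbar * p0)

def fkp_norm_and_shot_noise(nbar_gal, nbar_ran, alpha, p0=1.0e4):
    """Discrete estimates of A_norm and P_sn, following Eq. (6).

    nbar_gal : mean density evaluated at each galaxy position
    nbar_ran : mean density evaluated at each random position
    alpha    : ratio N_g / N_s of galaxy to random particle counts
    """
    w_gal = fkp_weight(nbar_gal, p0)
    w_ran = fkp_weight(nbar_ran, p0)
    # A_norm is summed over randoms only, rescaled by alpha.
    a_norm = alpha * np.sum(nbar_ran * w_ran**2)
    # Shot noise receives contributions from galaxies and from randoms.
    p_sn = (np.sum(w_gal**2) + alpha**2 * np.sum(w_ran**2)) / a_norm
    return a_norm, p_sn
```

For a homogeneous sample with constant \(\bar{n}\), these sums reduce to \(P_{\rm sn}=(1+\alpha)/\bar{n}\), in agreement with Eq. (4).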
As we discussed, in an actual galaxy survey, it is impossible to have a homogeneous sample of the underlying halos, and rather we have an inhomogeneous sample of halos with respect to halo masses along the radial direction, i.e. affected by the selection effect. This means that the underlying power spectrum becomes position-dependent: \(P({\boldsymbol{k}};{\boldsymbol{x}})\). We can evaluate the expectation value of Eq. (3 ) for the AM mock catalog as \[\begin{align} \braket{\hat{P}_{\rm AM}(\mathbf{k})}=\frac{\int \mathrm{d}^3{\boldsymbol{x}} ~ \bar{n}^2(\mathbf{x})w^2(\mathbf{x})P_{\rm hh}(\mathbf{k};M_{\rm th}(\mathbf{x}))}{\int \mathrm{d}^3{\boldsymbol{x}} ~ \bar{n}^2(\mathbf{x})w^2(\mathbf{x})}=\frac{\int \mathrm{d}x_3 ~ \bar{n}^2(x_3)w^2(x_3)P_{\rm hh}(\mathbf{k};M_{\rm th}(x_3))}{\int \mathrm{d}x_3 ~ \bar{n}^2(x_3)w^2(x_3)}\label{powerspectrum95selection}, \end{align}\tag{7}\] where we ignored the window effect for simplicity, and we set the \(x_3\) direction to be along the redshift (\(z\)) direction. We have dropped the \(x_1x_2\) integrals in the second equality of the above equation, since we employ a homogeneous window in the \(x_1x_2\) plane under the distant observer approximation, and \(M_{\rm th}(x_3)\) is the halo mass threshold in each \(x_3\) bin as shown in Fig. 2. Eq. (7 ) shows that, in the presence of the selection effect, the power spectrum should be given by the integral of power spectra for halos of different masses along the redshift direction.
Before presenting the results, it is useful to consider the linear limit of Eq. (7). In the real-space case, we can express the power spectrum at the position \(x_3\) as \(P_{\rm hh}(k; M_{\rm th}(x_3))=b_1^2(M_{\rm th}(x_3))P_{\rm mm}^{\rm L}(k)\) in the linear regime, where \(b_1(M_{\rm th})\) is the linear bias parameter of halos above mass threshold \(M_{\rm th}\) and \(P^{\rm L}_{\rm mm}\) is the linear matter power spectrum. Note that, since we use the single redshift output of simulation, we have ignored the redshift dependence of the growth factor here and in the following. Then the real-space power spectrum measured by the FKP estimator from the AM samples in the linear regime is given as \[\begin{align} \braket{\hat{P}^{\rm R}_{\rm AM}(k)}\simeq \overline{(b_1)^2}P^{\rm L}_{\rm mm}(k), \label{eq:pkreal95linearlimit} \end{align}\tag{8}\] with \[\begin{align} \overline{(b_1)^2}\equiv \frac{1}{\int\!\mathrm{d}x_3~\bar{n}^2(x_3)w^2(x_3)} \int\!\mathrm{d}x_3~ \bar{n}^2(x_3)w^2(x_3) [b_1\!\left(M_{\rm th}\!(x_3)\right)]^2. \label{eq:ave95bh95squared} \end{align}\tag{9}\] Thus, the overall amplitude of \(\hat{P}^{\rm R}_{\rm AM}(k)\) in the linear regime is proportional to the radial-distance average of \(b_1^2(M_{\rm th}(x_3))\) weighted by \(\bar{n}^2(x_3)w^2(x_3)\).
Similarly, in redshift space, we can express the local redshift-space power spectrum at position \(x_3\) as \(P_{\rm hh}^{\rm S}(\mathbf{k};M_{\rm th}(x_3))=[b_1^2(M_{\rm th}(x_3))+2f\mu^2 b_1(M_{\rm th}(x_3))+f^2\mu^4]P^{\rm L}_{\rm mm}(k)\) in the linear regime, based on the Kaiser formula [13], where \(f\) is the linear growth rate and \(\mu\) is the cosine of the angle between the line-of-sight direction (\(x_3\)) and the wavevector. Therefore the redshift-space power spectrum measured by the FKP estimator for the AM samples is \[\begin{align} \braket{\hat{P}^{\rm S}_{\rm AM}(k,\mu)}\simeq \left[\overline{(b_1)^2}+2f\mu^2\overline{b_1}+f^2\mu^4\right]P^{\rm L}_{\rm mm}(k), \end{align}\] where \[\begin{align} \overline{b_1}\equiv \frac{1}{\int\!\mathrm{d}x_3~\bar{n}^2(x_3)w^2(x_3)} \int\!\mathrm{d}x_3~ \bar{n}^2(x_3)w^2(x_3) b_1\!\left(M_{\rm th}\!(x_3)\right). \label{eq:ave95bh} \end{align}\tag{10}\] It should be noted that \(\overline{(b_1)^2}\) is generally different from \((\overline{b_1})^2\) for a sample of halos of different masses. This means that, even if we define the single mass-threshold sample that reproduces the amplitude of the real-space power spectrum for the AM sample on linear scales, i.e. \(P_{M_{\rm th}}^{\rm R}(k)\simeq P_{\rm AM}^{\rm R}(k)\), the redshift-space power spectra for the two samples do not necessarily match at small \(k\) in the linear regime. Strictly speaking, therefore, the redshift-space power spectrum in the presence of the selection effect cannot generally be interpreted in terms of the power spectrum of single biased tracers, as is usually done in analyses. In the following, we quantify the impact of the selection effect.
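The difference between \(\overline{(b_1)^2}\) and \((\overline{b_1})^2\) can be made concrete with a short numerical sketch. The function names are hypothetical, the radial integrals of Eqs. (9) and (10) are approximated by sums over equal-width \(x_3\) bins, and the monopole and quadrupole coefficients are the standard Legendre projections of the Kaiser formula.

```python
import numpy as np

def weighted_bias_moments(nbar, w, b1):
    """Return (mean of b1, mean of b1^2), both weighted by nbar^2 w^2.

    All inputs are 1D arrays over equal-width radial (x_3) bins.
    """
    weight = nbar**2 * w**2
    b1_mean = np.sum(weight * b1) / np.sum(weight)
    b1sq_mean = np.sum(weight * b1**2) / np.sum(weight)
    return b1_mean, b1sq_mean

def kaiser_multipole_factors(b1_mean, b1sq_mean, f):
    """Coefficients multiplying the linear matter power spectrum.

    Monopole:   <b1^2> + (2/3) f <b1> + f^2 / 5
    Quadrupole: (4/3) f <b1> + (4/7) f^2
    """
    mono = b1sq_mean + (2.0 / 3.0) * f * b1_mean + f**2 / 5.0
    quad = (4.0 / 3.0) * f * b1_mean + (4.0 / 7.0) * f**2
    return mono, quad
```

A single tracer matched in real space has \(b=\sqrt{\overline{(b_1)^2}}\), and its coefficients follow from `kaiser_multipole_factors(b, b**2, f)`. Since \(\overline{b_1}\le\sqrt{\overline{(b_1)^2}}\), the quadrupole coefficient of the matched single tracer is never smaller than that of the mixed sample.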
A similar argument holds on nonlinear scales. On quasi-nonlinear scales, one can use the perturbative bias expansion to model the galaxy density field, e.g. using the effective field theory of large-scale structure (EFTofLSS) [2], [34]–[36]. The linear and higher-order bias parameters and the EFTofLSS counter terms are known to depend on halo mass or properties of galaxies [2], [37], [38]. This means that, when the selection effect given by \(n(z)\) exists, it causes non-trivial redshift-dependent effects in the clustering properties, which therefore cannot be described by single tracers. On very nonlinear, small scales, phenomenological approaches such as the halo occupation distribution (HOD) method [17] and the subhalo abundance matching method [39] can be used to model clustering properties of galaxies. In these approaches, galaxies are populated in their host halos according to the given galaxy-halo connection, and therefore the selection effect inevitably leads to redshift-dependent effects in the clustering properties.
From these considerations, we conclude that the selection effect causes redshift-dependent systematic effects in the clustering properties of galaxies that cannot be described by single tracers, on all scales from the linear to nonlinear scales. The question then arises of how large the systematic effects are, which is the main focus of this paper.
In this section, we show the main results of this paper.
Figure 4: A comparison of the two power spectra, \(P_{\rm AM}\) and \(P_{M_{\rm th}}\), for the abundance-matching (AM) sample and the single mass-threshold (\(M_{\rm th}\)) sample, both of which reproduce \(n(z)\) of the LOWZ sample. We show the ratio for the real-space power spectrum (\(\textit{left panel}\)), and the monopole (\(\textit{middle panel}\)) and quadrupole (\(\textit{right panel}\)) moments of the redshift-space power spectrum. The data points in each panel are the mean of the \(50\) realizations and the error bars are the \(1\sigma\) error on the mean, estimated by dividing the standard deviation by \(\sqrt{50}\). The black dashed vertical line in each panel indicates half of the Nyquist wavenumber, and the solid line is the ratio computed using the Dark Emulator in Kobayashi et al. [17] (see text for the details).
Fig. 4 shows the results for the LOWZ galaxy sample. We measure the ratio \(P_{M_{\rm th}}/P_{\rm AM}\) from the 50 realizations. We adopt \(M_{\rm th}=9.14\times 10^{12}h^{-1}\rm M_{\odot}\) to define the single mass-threshold sample, which minimizes the \(\chi^2\) difference between the real-space power spectra \(P_{\rm AM}\) and \(P_{M_{\rm th}}\) in the range \(k=[0.02,0.1] ~ h\rm Mpc^{-1}\) as shown in the left panel. The blue or orange points in each panel are the mean of the 50 realizations, and the error bars are the \(1\sigma\) errors on the mean, estimated by dividing the standard deviation by \(\sqrt{50}\). The black dashed line indicates half of the Nyquist wavenumber.
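The mean ratio and its error can be computed as in the following sketch. It assumes that the ratio is formed realization by realization before averaging and that the sample standard deviation uses the unbiased (`ddof=1`) convention; neither choice is specified in the text, and the function name is hypothetical.

```python
import numpy as np

def ratio_with_error(p_single, p_am):
    """Mean and 1-sigma error on the mean of P_Mth / P_AM.

    p_single, p_am : arrays of shape (n_realizations, n_k)
    """
    ratio = p_single / p_am                 # per-realization ratio in each k bin
    n_real = ratio.shape[0]
    mean = ratio.mean(axis=0)
    # Standard deviation across realizations divided by sqrt(N).
    err = ratio.std(axis=0, ddof=1) / np.sqrt(n_real)
    return mean, err
```

Forming the ratio within each realization cancels much of the sample variance, because both mocks are drawn from the same underlying simulation.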
The left panel of Fig. 4 shows the ratio of the real-space power spectra for the AM and the single mass-threshold mock catalogs. The ratio agrees with unity to within 0.5% up to \(k\simeq 0.5~h{\rm Mpc}^{-1}\), but starts to deviate from unity in the larger \(k\) bins. This good agreement also holds for the monopole and quadrupole moments of the redshift-space power spectrum in the middle and right panels, respectively. The rapid change around \(k\sim 0.6~h{\rm Mpc}^{-1}\) for the quadrupole moment arises because the moment has a zero crossing around that scale. Thus, we conclude that the selection effect is small for the LOWZ-like sample.
Figure 5: Similar to the previous figure, but the results for the CMASS sample.
Fig. 5 shows the corresponding results for the CMASS galaxy sample. We adopt \(M_{\rm th}=1.39\times 10^{13}h^{-1}\rm M_{\odot}\) for the single mass-threshold sample. As in Fig. 4, we find a similar degree of agreement for the real- and redshift-space power spectra.
We now give a theoretical interpretation of the results in Figs. 4 and 5. For this purpose, we use the Dark Emulator developed in Refs. [28], [40], which is a simulation-based emulator that allows a fast, accurate computation of the real- or redshift-space power spectrum of halos for input parameters (cosmological parameters, redshift, and halo masses). For the redshift-space power spectrum, the emulator outputs \(P(k,\mu)\), where \(\mu\) is the cosine of the angle between the wavenumber vector and the line-of-sight direction. Hence, we can use the emulator output to compute the monopole and quadrupole moments. The red solid curves in Figs. 4 and 5 show the ratio, \(P_{M_{\rm th}}/P_{\rm AM}\), computed using Dark Emulator. We compute \(P_{\rm AM}\) based on Eq. (7), substituting the emulator’s output for the local power spectrum, while we compute \(P_{M_{\rm th}}\) directly using the emulator prediction for a single mass threshold. We determine this mass threshold so that \(P_{M_{\rm th}}\) matches \(P_{\rm AM}\) at \(k=0.02~h\rm Mpc^{-1}\). These emulator predictions reproduce the simulation results fairly well, including the \(k\) dependence. Thus, these results provide independent support for the finding that the selection effect is small. Note that the emulator predictions for the monopole and quadrupole moments have fractional accuracies of about 1% and 5%, respectively, for halos of \(\sim 10^{13}h^{-1}M_\odot\) up to \(k\simeq 0.60~h{\rm Mpc}^{-1}\) for cosmologies around the fiducial \(\Lambda\)CDM model, as carefully studied in Ref. [40].
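The evaluation of Eq. (7) with a model for the local power spectrum can be sketched as below. The callable `p_hh` is a hypothetical stand-in for any model that returns the halo power spectrum for a given mass threshold (for example, a wrapper around an emulator); no specific emulator interface is assumed, and the radial integral is approximated by a sum over equal-width \(x_3\) bins.

```python
import numpy as np

def p_am_from_local_spectra(k, nbar, w, m_th, p_hh):
    """Approximate Eq. (7): the nbar^2 w^2 weighted average of local spectra.

    k     : 1D array of wavenumbers
    nbar  : mean number density in each radial bin
    w     : FKP weight in each radial bin
    m_th  : mass threshold in each radial bin
    p_hh  : callable (k, m_th) -> power spectrum array with the shape of k
    """
    weight = nbar**2 * w**2
    spectra = np.array([p_hh(k, m) for m in m_th])  # shape (n_bins, n_k)
    return weight @ spectra / weight.sum()
```

The same function applies to each multipole separately if `p_hh` returns the monopole or quadrupole of the local redshift-space spectrum.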
We can also interpret the upturn in the high \(k\) bins in Figs. 4 and 5 qualitatively as follows. The average mass of halos in the AM catalog is lower than that in the single mass-threshold catalog for each realization. The nonlinear clustering of lower-mass halos is weaker than that of heavier halos. For example, lighter halos have a smaller nonlinear bias parameter (\(b_2\)) than heavier halos do [2], [41]. As a result, the amplitude of the power spectrum at higher \(k\) is smaller for lighter halos than for heavier halos, even when the linear-scale amplitudes of the power spectra are matched (see Eq. 9). In addition, lighter halos have larger random motions in the large-scale structure than heavier halos do, leading to a larger suppression in the redshift-space power spectrum amplitudes for lighter halos via the Finger-of-God like effect, as shown in Fig. 3 of [42]. These effects lead to the upturn at high \(k\) in the ratio \(P_{M_{\rm th}}/P_{\rm AM}\).
Figure 6: Similar to Fig. 4, but the results for the power-law samples in the right panel of Fig. 1. The panels in the top, middle and bottom row show the results for \(\alpha=0.5, 1\) and \(2\) of \(n(z)\propto z^{-\alpha}\), respectively.
For generality, we also study the impact of the selection effect for a galaxy sample with \(\bar{n}(z)\propto z^{-\alpha}\) (\(\alpha=0.5, 1\) or \(2\)) as shown in the right panel of Fig. 1. Fig. 6 shows the results. For the cases of \(\alpha=0.5,1\), we find that the ratio is close to unity up to \(k\simeq 0.2~h{\rm Mpc}^{-1}\), but starts to show an upturn in the high \(k\) bins as in Figs. 4 and 5. We also find that the emulator results (red curves) reproduce the simulation results fairly well. However, for the case of \(\alpha=2\), which corresponds to the largest selection effect in our study, the deviation of the ratio from unity can be significant: the monopole moment shows a small deviation, up to \(1\%\) at \(k\lesssim 0.3~h{\rm Mpc}^{-1}\), whereas the quadrupole exhibits a large deviation of up to \(8\%\), for the reason explained in Section 2.3.1. This implies that if \(n_{\rm g}(z)\) has a strong redshift dependence, the selection effect could be large and not negligible.
Figure 7: A possible bias in the \(\sigma_8\) estimation from hypothetical measurements of the monopole and quadrupole moments of the redshift-space power spectrum for the LOWZ- and CMASS-like galaxy samples, if the selection effect is ignored. We estimate the bias using the Fisher method (Eq. 11) and present the bias as a function of the maximum wavenumber \(k_{\rm max}\) included in the analysis. The error bar corresponds to a volume of \(2~(h^{-1}{\rm Gpc})^3\) and includes marginalization over other parameters (see Section 3.3 for the details). We slightly shift the symbols along the \(x\)-axis for illustration.
In this section, we assess the impact of the selection effect on cosmological parameter estimation. Since the selection effect biases the power spectrum amplitude but does not shift the scale of the baryon acoustic oscillations (BAO), as can be seen from Figs. 4–6, we evaluate its impact on the estimation of \(\sigma_8\), the present-day rms linear mass density fluctuations within a top-hat sphere of \(8~h^{-1}{\rm Mpc}\) radius, which is one of the most important parameters estimated from galaxy clustering analysis [17], [43]. We use the Fisher matrix formalism [43], [44] to estimate a bias
in cosmological parameters as \[\begin{align}
\delta p_{\alpha} &= \sum_{\ell \ell'}\sum_{\beta} \Big( F^{\ell \ell'} \Big)^{-1}_{\alpha \beta} \sum_{ij} \Big[ P_{\ell}^{\rm selection}(k_i) - P_{\ell}^{\rm w/o~selection}(k_i) \Big] {\rm Cov}^{-1} \Big[ \hat{P}_{\ell}(k_i), \hat{P}_{\ell'}(k_j) \Big] \frac{\partial P_{\ell'}(k_j)}{\partial p_{\beta}} \notag \\
&\simeq \sum_{\ell \ell'}\sum_{\beta} \Big( F^{\ell \ell'} \Big)^{-1}_{\alpha \beta} \sum_{ij} \Big( \frac{P_{{\rm AM},\ell}(k_i)}{ P_{M_{\rm th},\ell}(k_i)} -1 \Big) P_{\ell}(k_i) {\rm Cov}^{-1} \Big[ \hat{P}_{\ell}(k_i), \hat{P}_{\ell'}(k_j) \Big] \frac{\partial P_{\ell'}(k_j)}{\partial p_{\beta}},
\end{align}\tag{11}\] where \(P^{\rm selection}\) and \(P^{\rm w/o~selection}\) are the power spectra with and without the selection effect, which we assume to be given by \(P_{\rm AM}\) and \(P_{M_{\rm th}}\), respectively; \(P_{M_{\rm th},\ell}\) is the \(\ell\)-th multipole moment of the power spectrum, \({\rm Cov}^{-1}\) is the inverse of the covariance matrix, and \(F^{\ell\ell'}\) is the Fisher matrix. We assume a Gaussian covariance matrix and use the method in Refs. [43], [45] to analytically compute the auto- and cross-covariance matrices for the multipole moments of the power spectrum. We use Dark Emulator to compute \(P_\ell\), the covariance matrix, and \(F^{\ell\ell'}\). We include the information of the monopole and quadrupole moments, i.e. up to \(\ell=2\), because the hexadecapole moment (\(\ell=4\)) is sub-dominant in the cosmological information content, as shown in [43]. The quantity \(\delta p_{\alpha}\) estimated from the above equation quantifies the bias in the parameter \(p_{\alpha}\) due to the selection effect when the model prediction does not include it. Here we adopt \(\mathbf{p} = \{\Omega_m,~\sigma_8,~\alpha_{\parallel},~\alpha_{\perp},~M_{\rm th},~P_{\rm res}\}\) as the set of model parameters, where \(P_{\rm res}\) is the residual shot noise [46] and \(\alpha_{\parallel,~\perp}\) are the parameters that model the Alcock-Paczynski (AP) effect [47], [48]. By treating \(\alpha_{\parallel,~\perp}\) as free parameters in the Fisher analysis, we can marginalize over the geometrical information, mainly from the BAO features, to estimate the impact of the selection effect on \(\sigma_8\), which is constrained by the amplitude information of the power spectrum. Throughout this analysis, the redshift is \(0.5\), the same as that of the halo catalog snapshot we use in Section 2.1. The fiducial values of \(M_{\rm th}\) needed for the calculation of Eq. (11) are the thresholds we adopted to draw the red solid lines in Figs. 4 and 5. We set the fiducial values of the other parameters to those of the Planck 2018 cosmology, and assume survey volumes of \(1.98\) and \(2.26~(h^{-1}{\rm Gpc})^3\) for the LOWZ and CMASS cases, respectively.
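The linear algebra behind Eq. (11) can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used in this paper: it assumes the monopole and quadrupole bins are stacked into a single data vector, so that the sums over \(\ell, \ell'\) and \(i, j\) become matrix products, and it uses random toy numbers in place of Dark Emulator outputs. The function name `fisher_bias` and all inputs are hypothetical.

```python
import numpy as np


def fisher_bias(derivs, cov, delta_data):
    """Return (bias, marginalized_sigma) for a Gaussian likelihood.

    derivs     : (n_data, n_params) array, dP/dp at the fiducial model
    cov        : (n_data, n_data) data covariance (assumed symmetric)
    delta_data : (n_data,) unmodelled shift, P_selection - P_no_selection
    """
    derivs = np.asarray(derivs, dtype=float)
    cov = np.asarray(cov, dtype=float)
    delta_data = np.asarray(delta_data, dtype=float)

    cinv_derivs = np.linalg.solve(cov, derivs)   # Cov^{-1} dP/dp
    fisher = derivs.T @ cinv_derivs              # F = D^T Cov^{-1} D
    rhs = cinv_derivs.T @ delta_data             # D^T Cov^{-1} dP (Cov symmetric)
    bias = np.linalg.solve(fisher, rhs)          # delta p = F^{-1} rhs
    sigma = np.sqrt(np.diag(np.linalg.inv(fisher)))  # marginalized 1-sigma errors
    return bias, sigma


# Toy numbers, purely illustrative (not taken from the paper).
rng = np.random.default_rng(0)
n_k, n_params = 10, 3
derivs = rng.normal(size=(2 * n_k, n_params))    # monopole bins, then quadrupole bins
cov = np.diag(rng.uniform(0.5, 2.0, size=2 * n_k))
true_shift = np.array([0.01, -0.02, 0.005])
delta_data = derivs @ true_shift                 # a shift the model can absorb exactly

bias, sigma = fisher_bias(derivs, cov, delta_data)
print(bias)          # recovered parameter shifts
print(bias / sigma)  # bias in units of the marginalized error
```

In this toy setup the injected shift lies in the span of the derivatives, so the estimator recovers it exactly. A shift that does not lie in that span is instead projected onto the parameters, which is the case relevant to the selection effect.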
Fig. 7 shows the results for a possible bias in the estimation of \(\sigma_8\) for the LOWZ- and CMASS-like samples. The biases for other cosmological parameters, such as \(\Omega_m\), \(\alpha_{\parallel}\) and \(\alpha_{\perp}\), are similar to the one for \(\sigma_8\), so we omit them from the figure. The figure shows that the selection effect does not bias these parameters by more than the marginalized error, even when the maximum wavenumber of the analysis is \(k_{\rm max}=0.3~h{\rm Mpc}^{-1}\). Therefore, we conclude that the selection effect is unlikely to be significant for samples whose \(n_{\rm g}(z)\) resembles that of the LOWZ and CMASS samples.
In this paper, we have studied the impact of the selection effect of galaxies on the power spectrum. To evaluate the galaxy selection effect, we mimicked the selection function by selecting halos above a redshift-dependent mass threshold in the N-body simulation, such that the resulting redshift distribution of the number density reproduces the target \(n(z)\) for LOWZ- and CMASS-like galaxies. We demonstrated analytically that the selection effect inevitably introduces a bias in the redshift-space power spectrum (see Section 2.3.1). We then quantified the impact of the selection effect by comparing the power spectra measured from the galaxy mock catalogs with those from the catalogs of single mass-threshold (and therefore single-tracer) samples. We found that the selection effect causes fractional changes of up to \(1\%\) and \(2\%\) in the monopole and quadrupole moments of the redshift-space power spectrum, respectively, up to \(k=0.3~h{\rm Mpc}^{-1}\), for the LOWZ- and CMASS-like galaxies. Thus, our results imply that, for the given cosmological model, the halo mass dependencies of linear and nonlinear halo bias, nonlinear clustering, and the RSD effect are unlikely to cause a significant bias in the redshift-space power spectrum, as long as the selection effect, quantified by \(n(z)\), does not have a significant redshift dependence. We estimated a possible bias in the \(\sigma_8\) estimation due to the neglect of the selection effect on the redshift-space power spectrum using the Fisher method, and showed that this effect does not cause a significant bias in \(\sigma_8\) compared to the statistical error for a volume of \(2~(h^{-1}{\rm Gpc})^3\). This is good news for current galaxy surveys and provides useful guidance for designing wide-area spectroscopic galaxy surveys. However, the selection effect might need to be taken into account for ultimate galaxy surveys covering volumes larger than \(\sim 50~(h^{-1}{\rm Gpc})^3\), such as the DESI and Euclid surveys.
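One way to see why the survey volume matters is the following scaling argument. It is a sketch under the Gaussian-covariance assumption of the Fisher analysis and at fixed number density, not an additional result of this paper. In that case the covariance scales as \(1/V\), so the Fisher matrix scales as \(V\); the bias \(\delta p_\alpha\) in Eq. (11) is then independent of \(V\), while the marginalized error \(\sigma(p_\alpha)\) shrinks as \(V^{-1/2}\). Here \(V_0\) denotes a reference volume, a symbol introduced only for this illustration: \[\begin{align}
\frac{\delta p_{\alpha}}{\sigma(p_{\alpha})}\bigg|_{V} \simeq \frac{\delta p_{\alpha}}{\sigma(p_{\alpha})}\bigg|_{V_0} \sqrt{\frac{V}{V_0}}, \qquad \sqrt{\frac{50~(h^{-1}{\rm Gpc})^3}{2~(h^{-1}{\rm Gpc})^3}} = 5 .
\end{align}\] Under these assumptions, a bias that is a given fraction of the statistical error at \(2~(h^{-1}{\rm Gpc})^3\) becomes five times larger relative to the error at \(50~(h^{-1}{\rm Gpc})^3\).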
In this paper we worked with halo catalogs, not galaxies. However, the results of this paper are qualitatively applicable to galaxy clustering, because galaxies form in halos and therefore galaxy clustering is given by a weighted sum of halo clustering over different halo masses, e.g. using the halo occupation distribution in the halo model approach [17], [28], [43], [49], [50]. Determining the exact magnitude of the bias in the galaxy clustering amplitude due to the selection effect requires a realistic mock galaxy catalog for the sample under consideration. We believe that the method developed in this paper would be useful for such a study, and can also be used when designing future galaxy surveys.
We would like to thank Yosuke Kobayashi for useful discussion and for allowing us to use the Dark Emulator for the redshift-space power spectrum of halos in this paper. We also thank Toshiki Kurita for useful discussion. This work was supported in part by World Premier International Research Center Initiative (WPI Initiative), MEXT, Japan, and JSPS KAKENHI Grant Numbers 19H00677, 20H05850, 20H05855, 23KJ0747, and 24H00215. K.N. also sincerely acknowledges the financial support from the research assistant program in AI & Climate Data-Driven ELSI-RRI Study led by Prof. Hiromi Yokoyama.
https://www.esa.int/Science_Exploration/Space_Science/Euclid
https://abacussummit.readthedocs.io/en/latest/citation.html
For the actual computation of the local power spectrum in Eq. (7), we input the cumulative halo number density \(\bar{n}(x_3)\) (Fig. 1) into the emulator, instead of the halo mass threshold \(M_{\rm th}(x_3)\) (Fig. 2).