Robust Measurement of Stellar Streams Around the Milky Way: Correcting Spatially Variable Observational Selection Effects in Optical Imaging Surveys


Abstract

Observations of density variations in stellar streams are a promising probe of low-mass dark matter substructure in the Milky Way. However, survey systematics such as variations in seeing and sky brightness can also induce artificial fluctuations in the observed densities of known stellar streams. These variations arise because survey conditions affect both object detection and star–galaxy misclassification rates. To mitigate these effects, we use Balrog synthetic source injections in the Dark Energy Survey (DES) Y3 data to calculate detection rate variations and classification rates as functions of survey properties. We show that these rates are nearly separable with respect to survey properties and can be estimated with sufficient statistics from the synthetic catalogs. Applying these corrections reduces the standard deviation of relative detection rates across the DES footprint by a factor of five, and our corrections significantly change the inferred linear density of the Phoenix stream when including faint objects. Additionally, for artificial streams with DES-like survey properties we are able to recover density power spectra with reduced bias. We also find that uncorrected power-spectrum results for LSST-like data can be around five times more biased, highlighting the need for such corrections in future ground-based surveys.

FERMILAB-PUB-25-0649-PPD
DES-2025-0905

1 Introduction

The fundamental nature of dark matter is an outstanding question in physics and astronomy. The standard model of dark energy plus cold dark matter (\(\Lambda\)CDM) fits data on large scales and predicts the existence of smaller dark matter sub-halos in our Galaxy [1][4]. At these smaller scales the population of Milky Way satellites provides insight into the nature of dark matter down to the threshold of star formation (\(M\sim 10^8M_\odot\)), helping to constrain the microphysics of dark matter models and the abundance of low-mass subhalos. Currently, these observations are consistent with the standard model [5][8]. To continue to stress test \(\Lambda\)CDM and detect the impact of fully dark subhalos, we must look to additional probes. In the far-field, galaxy-scale gravitational lenses are expected to be sensitive to these dark subhalos down to a mass of \(\sim 10^7 M_\odot\), either through flux-ratio anomalies, image positions, or time delays [9], [10]. Additionally, a promising near-field probe of dark subhalos comes from stellar streams around the Milky Way.

Stellar streams are the tidally disrupting remnants of star clusters and satellite galaxies [11]. The discovery and characterization of these systems around the Milky Way has been enabled by wide-field astronomical surveys, initially through matched-filter searches of photometric data (e.g., SDSS; [12], Pan-STARRs; [13], [14], DES; [15]), and subsequently using combinations of photometric, astrometric and spectroscopic data (e.g., Gaia; [16], S5; [17]). Currently, there are more than 120 identified stellar streams around our Galaxy [18], [19].

Within an individual stream, the stellar positions and velocities are probes of the local acceleration field experienced by the stream stars as they orbit in the Milky Way’s gravitational potential [20]. In a Milky Way potential free of dark substructure, streams are therefore expected to be relatively smooth, although some density structure will still be present due to variations in the stripping rate and the formation of epicyclic overdensities [21]. In contrast, dark subhalos passing by a stream will induce perturbations in the on-sky positions, density, and velocity distribution of stream stars. This can result in the formation of gaps and other small-scale deviations in the stream track (e.g., [22], [23], [24], [25], [26]).

The population of kinematically cold stellar streams (i.e., ones with globular cluster progenitors) comprises the best probe of this effect due to their small intrinsic velocity dispersions and stream widths [27]. One such kinematically cold stellar stream is the Phoenix stream. This substructure, originally discovered in early DES data [28], has a length of 4.6 kpc (\(\sim15^{\circ}\)), a width of \(\sim0.14^{\circ}\), and a measured velocity dispersion of \(\sigma_{rv}=2.66 \,\mathrm{km/s}\) [15], [29]. Interestingly, this stream also shows small-scale density fluctuations, making it a promising candidate for probing the low-mass regime of the subhalo mass function. Due to these characteristics, we use this stream as an example application in our analysis. For thin streams in general, there has been a large effort to follow up discoveries using deeper imaging surveys to access fainter stars and provide the best constraints on the track and density variations [30][35], while proper motion measurements [36] and targeted spectroscopy of brighter members [37], [38] can be used to refine the stellar samples, constrain dynamics, and further characterize the interaction history of these substructures.

One common approach to quantify the observable effects of gravitational interactions with dark matter is through power-spectrum-type analyses of density variations (stars/deg) along a stream [24], [26]. Such inferences rely on accurate measurements of the density and track along the stream. At small spatial scales, shot noise from the limited number of stars dominates the stream density power spectrum, thus it is most important to reduce bias on measurements at larger scales [20].

In photometric imaging data, systematic biases can be produced by unaccounted differences between intrinsic and observed source populations, i.e., the observational selection function. The observational selection function depends on intrinsic source properties (e.g., flux in multiple photometric bands, surface brightness) together with external survey properties (see Table 1) that include both astrophysical effects (e.g., projected source density, interstellar extinction) and observational details (e.g., integrated exposure time) that generally vary over the survey footprint. Streams covering tens of degrees on the sky will span multiple telescope fields of view, and are likely to be observed with a variety of survey properties.

In addition, for most optical surveys, the observations are taken over multiple epochs under different observing conditions. This produces variations in detection rates for both stars and galaxies in imaging data. It also causes variations in how often stars and galaxies are incorrectly classified. Already with DES Year 3 data, we find that spatially variable observational selection effects across the survey footprint induce statistically significant effects (Section 4). As our data and analyses become more sophisticated, percent-level variations in the survey-transfer function can induce systematic errors that are much larger than the statistical errors and therefore limit the power of analyses [39].

Traditionally, to avoid these systematic biases, a selection for high signal-to-noise objects is applied to minimize variations in observational selection effects. For example, in the DES DR1 stream search, the stellar selection was limited to \(g < 23.5\) [15]. However, at the magnitudes that will be accessible to future surveys, such as the NSF-DOE Vera C. Rubin Observatory’s Legacy Survey of Space and Time [40], galaxies are much more common, and even small variations in galaxy misclassification rates could produce large effects on the stellar sample [41]. Therefore, new tools are needed to fully leverage the potential of LSST-like data.

In the context of large-scale structure analyses for cosmology, galaxy weight maps have been derived to account for spatially variable observational selection effects that would otherwise produce systematic errors in galaxy clustering measurements. These weight maps are derived using the ansatz that galaxies are isotropically distributed on large angular scales, and that empirical relations between observed galaxy densities and survey properties (e.g., seeing, integrated exposure time) can be used to learn the observational selection function for galaxies across the survey footprint [42]. This approach is not possible for stellar samples, because the intrinsic distribution of stars across the survey footprint is highly non-isotropic.

In this work, we explore the use of synthetic source injection (SSI) in combination with survey metadata (i.e., survey property maps) to correct stellar samples for spatially variable observational selection effects. The SSI pipeline inserts realistic artificial stars and galaxies directly into pixel-level image data and re-runs source detection and measurement algorithms to effectively sample the observational selection function at locations across the survey footprint. Specifically, we use the Balrog implementation of SSI for DES Y3 as a testing ground to develop the methodology [39].

The paper is organized as follows: in Section 2 we discuss the data from DES Y3 and Balrog that will be used in our corrections. In Section 3 we discuss the calculations involved in our corrections. In Section 4 we apply our corrections to the full DES Y3 footprint and compare these results to pre-corrected data. In Section 5 we test the overall corrective power of our algorithm, how this power changes with larger training sets, and how repeatable our algorithm is. In Section 6 we apply our corrections to observations of the Phoenix stream and investigate the resulting changes in linear density. We then apply our corrections to simulated stellar streams and compare density power spectra. Finally, we conclude and motivate future work in Section 7. All code used in this project is available on GitHub1.

2 Data

In this section we describe the observations, DES Y3 Gold catalog (Section 2.1), and synthetic source injection runs, Balrog (Section 2.2), used in our analysis.

2.1 DES DR1 & Y3 GOLD

Table 1: The survey properties used in the analysis, along with their units and the statistics used. Each property has maps in the \(griz\) bands, with the exception of stellar_dens (a single map) and skyvar_sqrt (which lacks an \(r\)-band map). Summing the statistics and bands across all quantities gives a total of 92 maps.
Quantity | Units | Statistics
airmass | | WMEAN, MIN, MAX
fwhm | arcsec | WMEAN, MIN, MAX
fwhm_fluxrad | arcsec | WMEAN, MIN, MAX
exptime | seconds | SUM
t_eff | | WMEAN, MIN, MAX
t_eff_exptime | seconds | SUM
skybrite | e\(^-\)/CCD pix | WMEAN
skyvar | (e\(^-\)/CCD pix)\(^2\) | WMEAN, MIN, MAX
skyvar_sqrt | e\(^-\)/CCD pix | WMEAN
skyvar_uncertainty | e\(^-\)/s\(\cdot\)coadd pix |
sigma_mag_zero | mag | QSUM
fgcm_gry | mag | WMEAN, MIN
stellar_dens | stars/\(\deg^2\) |
Figure 1: Distributions of two of the survey properties used, stellar density (left) and exposure time sum in the i-band (right).

We use data products from the first data release of DES [43] based on three years of observations using the Dark Energy Camera [44] mounted on the Blanco 4m telescope at the Cerro Tololo Inter-American Observatory (CTIO). The survey data covers an area of \(\sim 5000\,\,\text{deg}^2\) in five broadband filters, \(grizY\). With 38,850 total exposures in DES DR1, each location in the footprint typically contains \(4-6\) exposures in each band [45], corresponding to a median depth of \(i\sim 23.3\) at \(S/N = 10\) for unresolved sources.

We also use object classifications, quality flags, and survey property maps from the value-added DES Y3 GOLD release [46]. The survey property maps consist of spatial survey property values stored as HEALPix2 [47] pixel (hereafter healpixel) maps. These survey property maps track spatial variations of observing conditions at a high resolution (NSIDE = 4096) and are described in more detail in Section 7.3 and Appendix E of [46]. Table 1 lists the survey properties that are used in this analysis. Examples of survey property maps for stellar density and exposure time in the \(i\)-band are shown in Figure 1.

For the flags, to obtain a high quality sample of objects we base our selection on [39] (see their Section 4) and place the following cuts on all Y3 GOLD objects:

\[\begin{align} &\texttt{FLAGS\_FOREGROUND} = 0\nonumber\\ \text{AND}\;\; & \texttt{FLAGS\_BADREGIONS} < 2 \nonumber\\ \text{AND}\;\; & \texttt{FLAGS\_FOOTPRINT} = 1 \nonumber. \end{align}\]

For the above cuts, the footprint cut selects regions with good coverage in multiple observing bands, the foreground cut removes regions near bright objects, and the bad regions cut removes regions with high densities of objects with anomalous colors as well as tiles where the multi-object fitting pipeline failed to finish. More details on the above cuts can be found in Sections 7.1 and 7.2 of [46].
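These cuts amount to a simple boolean mask on the catalog. A minimal sketch, assuming the catalog has been loaded into a pandas DataFrame with the GOLD column names:

```python
import pandas as pd

def apply_quality_cuts(cat: pd.DataFrame) -> pd.DataFrame:
    """Apply the Y3 GOLD quality cuts used in this work."""
    mask = (
        (cat["FLAGS_FOREGROUND"] == 0)
        & (cat["FLAGS_BADREGIONS"] < 2)
        & (cat["FLAGS_FOOTPRINT"] == 1)
    )
    return cat[mask]

# Minimal example with two objects; only the first passes all cuts.
cat = pd.DataFrame({
    "FLAGS_FOREGROUND": [0, 1],
    "FLAGS_BADREGIONS": [0, 0],
    "FLAGS_FOOTPRINT": [1, 1],
})
good = apply_quality_cuts(cat)
```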

Object classification is done using the single object fitting (SOF) classifier \(\texttt{EXTENDED\_CLASS\_SOF}\), which is the only classification provided within our synthetic source catalog. Stars are defined as objects with \(0 \leq \texttt{EXTENDED\_CLASS\_SOF} \leq 1\), while galaxies are objects with \(2 \leq \texttt{EXTENDED\_CLASS\_SOF} \leq 3\).

Object magnitudes are also incorporated in our methodology; in all cases we use SOF magnitudes, which are single-object multi-epoch measurements described in [46]. Point spread function (\(\texttt{SOF\_PSF\_MAG}\)) magnitudes are used for objects classified as stars, while composite model (\(\texttt{SOF\_CM\_MAG}\)) magnitudes [48] are used for objects classified as galaxies.
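A minimal sketch of this class-dependent magnitude choice, assuming a pandas catalog; the per-band column names (\(\texttt{SOF\_PSF\_MAG\_R}\), \(\texttt{SOF\_CM\_MAG\_R}\)) are our assumption for illustration.

```python
import numpy as np
import pandas as pd

def classify_and_select_mag(cat: pd.DataFrame) -> pd.DataFrame:
    """Assign star/galaxy labels from EXTENDED_CLASS_SOF and pick the
    class-appropriate SOF magnitude (r-band shown here)."""
    ext = cat["EXTENDED_CLASS_SOF"]
    is_star = (ext >= 0) & (ext <= 1)
    is_gal = (ext >= 2) & (ext <= 3)
    out = cat.copy()
    out["CLASS"] = np.where(is_star, "star",
                            np.where(is_gal, "galaxy", "unknown"))
    # PSF magnitudes for stars, composite-model magnitudes otherwise.
    out["MAG_R"] = np.where(is_star, cat["SOF_PSF_MAG_R"], cat["SOF_CM_MAG_R"])
    return out

# Toy catalog: one star, one galaxy, one unclassifiable object.
cat = pd.DataFrame({
    "EXTENDED_CLASS_SOF": [0, 3, -9],
    "SOF_PSF_MAG_R": [20.0, 21.0, 22.0],
    "SOF_CM_MAG_R": [20.5, 21.5, 22.5],
})
out = classify_and_select_mag(cat)
```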

2.2 Balrog Synthetic Sources

To complement the Y3 GOLD data, we use the synthetic source catalog generated by Balrog [39], [49]. Balrog is a software package that injects synthetic sources into individual DES images before coaddition and processes them with the same DESDM pipelines as the observations. The name Balrog refers to both the software and the synthetic object catalog; for the remainder of this work we use it only for the catalog. The injected sources were chosen to be empirical populations of artificial galaxies and stars taken from the deeper DES Deep Field observations [50]. Additionally, delta-function stars (delta functions convolved with a local PSF) are injected, which we use for our synthetic stellar sample to avoid any potential galaxy contamination from incorrect Deep Field classifications. These synthetic objects are subject to the same systematic impacts from survey properties as physical objects. The 7.4M Balrog objects used for this work (90% Deep Field objects, 10% delta-stars) were injected on a uniform grid on 2,041 randomly chosen tiles out of the 10,338 total Y3 tiles (for a coverage map see Figure 6 of [39]; more information on the injected objects can be found in their Section 3).

In this work, Balrog objects are subject to the same flag cuts as described previously for Y3 GOLD objects. Classification cutoffs in terms of \(\texttt{EXTENDED\_CLASS\_SOF}\) are also the same. Likewise, \(\texttt{PSF}\) magnitudes are used for objects classified as stars while \(\texttt{CM}\) magnitudes are used for objects classified as galaxies. To be consistent with Y3 GOLD objects, measured magnitudes are used for objects instead of true magnitudes. Finally, a cut of \(\texttt{MATCH\_FLAG\_1.5\_ASEC} < 2\) is applied, which reduces the risk of ambiguous matching to Y3 GOLD objects within \(1.5\) arcseconds. More details on this flag cut can be found in Section 3.5 of [39]. With this synthetic source catalog, we are able to perform corrections on DES data, which we turn to next.

3 Methods

3.1 Overview and Notation

Table 2: Notation and terminology used throughout this paper. Less frequently used notation is defined where it first appears.
Notation | Terminology | Definition
\(\_S\) | | Subscript denoting stars.
\(\_G\) | | Subscript denoting galaxies.
\(I\_\) | | Prefix denoting injected objects.
\(T_S\) | True Stars | True number of stars in a given area.
\(O_S\) | Observed Stars | Count of detected true stars in a given area, regardless of assigned classification.
\(C_S\) | Classified Stars | Count of objects classified as stars in a given area.
\(P\left( C_S|O_S\right)\) | | Probability of classifying an observed star as a star.
\(RO_S\) | Recovered Observed Stars | Recovered count of observed stars after applying the maximum likelihood separation.
\(D_R\left( C_S, O_S\right)\) | | Relative detection rate of observed stars classified as stars.
\(FC_S\) | (Final) Corrected Stars | Final star counts after applying corrections.
Spatial variations in survey properties lead to correlated variations in the stellar selection function in two ways: (i) variations in the correct classification rate of detected objects and (ii) variations in the object detection rate. The notation used in this work is summarized in Table 2. In particular, \(T_S\) is the true number of stars, detected and undetected, in any area of the sky. \(O_S\) is the number of these true stars which are detected, and \(C_S\) is the number of objects that are classified as stars (this includes misclassified galaxies).

Our algorithm uses SSI objects and survey property maps to derive a relation between the survey properties and the likelihood that a given object will be detected and classified correctly. This will allow us to take catalog level data (\(C_S\) and \(C_G\)) and correct it based on the survey properties at a given on-sky location. To get this correction, we first use SSI objects to obtain the probability that a detected SSI star/galaxy will be classified correctly (\(P\left( IC_S|IO_S\right)\) and \(P\left( IC_G|IO_G\right)\)). We assume that the probability that a detected star (galaxy) will be classified correctly is the same as that probability for an SSI star (galaxy):

\[\label{eq:prob95match} P\left( C_S|O_S\right) = P\left( IC_S|IO_S\right)\tag{1}\]

These probabilities allow us to estimate the number of true stars (galaxies) that were detected \(RO_S\) (\(RO_G\)) at each position. After obtaining \(RO_S\) and \(RO_G\), we use the synthetic sources to estimate relative detection rates (for correctly classified stars \(D_R\left( C_S, O_S\right)\) is the rate at which true stars \(T_S\) enter the stellar sample \(C_S\) relative to the average rate across the footprint). We find that correctly classified stars are subject to distinct variations in the relative detection rate compared to misclassified galaxies (see Appendix 10), which necessitates the calculation of both for a full correction. Therefore, we calculate four relative detection rates: \(D_R\left( IC_S,IO_S\right)\), \(D_R\left( IC_G,IO_S\right)\), \(D_R\left( IC_S,IO_G\right)\), \(D_R\left( IC_G,IO_G\right)\). As in Eq.(1 ), we assume the relative detection rates for real objects in these groups are the same as for SSI objects.

Classification probabilities and relative detection rates are used to obtain final corrected counts at each position (\(FC_S\) & \(FC_G\)). We present two different types of corrections: one estimates what \(C_S\) would have been with uniform survey properties, and one estimates what \(O_S\) would have been with uniform survey properties. More details are given in Section 3.3.

For large surveys, the stellar magnitude distribution is not uniform over the full footprint. To account for this, we bin all sources based on magnitudes. All probabilities, detection rates, and corrections are calculated for each magnitude bin.

3.2 Probability Calculations

Figure 2: Balrog delta star correct-classification rates before (blue) and after (orange) corrections as a function of the effective exposure time sum in the \(i\)-band. The green histogram shows the distribution of effective exposure time in the \(i\)-band across the DES footprint. Training was performed on an 80% subsample of the Balrog objects, with the remaining sources used for testing. In addition to magnitude cuts, a color cut based on the Phoenix isochrone is applied (see Section 4.1). The drop in standard deviation after corrections shows that some of the variations caused by this survey property have been mitigated.

This section outlines the calculation of \(P\left(C_S|O_S\right)\) and \(D_R\left(C_S, O_S\right)\). A more detailed description can be found in Appendices 8 and 9.

As a specific example to illustrate our method, we present \(P\left(IC_S|IO_S\right)\) in our faintest magnitude bin as a function of a single survey property in Figure 2. The green histogram shows the distribution of effective exposure time in the \(i\)-band for DES Y3. The blue (orange) line shows the ratio before (after) corrections of correctly classified SSI stars to all detected SSI stars as a function of the survey property value (Table 1). Since we only care about the variations in this rate, we divide each line by its average to center them at \(1\). For corrections, this ratio of correctly classified SSI stars to all detected SSI stars is computed for all survey properties. We then take the survey property with the largest variation in the ratio and apply an empirical correction. This process is repeated iteratively until one of our termination criteria is met: either \(\sigma < 0.01\) for all survey properties or, to avoid overfitting and/or long runtimes, after 150 iterations (see Section 5.3). To ensure that we are not just removing noise, we take 80% of our SSI stars to be a training sample and the remaining 20% to be a testing sample. All corrections were calculated on the training sample but then applied to the testing sample to make the plot. In this way, the drop in standard deviation from \(\sigma=0.07\) to \(\sigma = 0.02\) indicates that our correction pipeline has reduced the impact variations in this survey property have on the stellar classification rate. We see \(\sigma > 0.01\) after corrections since the test set is independent from the training set.
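The iterative procedure described above can be sketched as follows. This is an illustrative reimplementation, not the pipeline itself: the quantile binning, the per-object multiplicative weights, and all function and argument names are our assumptions.

```python
import numpy as np

def iterate_corrections(props, detected, classified_ok,
                        n_bins=10, sigma_tol=0.01, max_iter=150):
    """Iteratively flatten the correct-classification rate against
    survey properties (a sketch of the greedy procedure in the text).

    props:         dict of property name -> per-object property values
    detected:      boolean array, True for detected injected stars
    classified_ok: boolean array, True if a detected star was classified as a star
    Returns per-object multiplicative weights that flatten the rate.
    """
    w = np.ones(len(detected))
    for _ in range(max_iter):
        worst = (0.0, None, None)  # (sigma, relative rates, bin indices)
        for name, x in props.items():
            # Equal-occupancy bins in this survey property.
            edges = np.quantile(x[detected], np.linspace(0, 1, n_bins + 1))
            idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
            # Weighted classification rate per bin, relative to the mean.
            rate = np.array([
                w[(idx == b) & classified_ok].sum()
                / max(w[(idx == b) & detected].sum(), 1e-12)
                for b in range(n_bins)
            ])
            rel = np.clip(rate / rate.mean(), 1e-3, None)
            if rel.std() > worst[0]:
                worst = (rel.std(), rel, idx)
        if worst[1] is None or worst[0] < sigma_tol:
            break  # all properties flat to within tolerance
        _, rel, idx = worst
        # Divide out the trend against the worst-offending property.
        w[classified_ok] /= rel[idx[classified_ok]]
    return w
```

In the actual pipeline the same loop would run over all 92 survey property maps, with the termination criteria (\(\sigma < 0.01\) or 150 iterations) as described in the text.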

Relative detection rates are calculated in a similar way to classification probabilities, and details are left to Appendix 8. We find that variations in relative detection rates are larger than variations in classification probabilities. Before corrections, Figure 2 shows that among all bins \(P\left(C_S|O_S\right)\) varies from the average by a maximum of \(\sim 12\)%. For these same survey property bins, \(D_R\left(C_S, O_S\right)\) varies from the average by a maximum of \(\sim 39\)%, so we allow training to continue for 300 cycles before stopping if convergence has not been reached.

3.3 Algorithm Design

With classification probabilities and relative detection rates established, we now detail the correction algorithms introduced in Section 3.1, which share identical steps except for their final stage.

The first step uses \(P\left(C_S|O_S\right)\), \(P\left(C_G|O_G\right)\), \(C_S\), and \(C_G\) to estimate the number of true stars (galaxies) that were detected, \(RO_S\) (\(RO_G\)). We use a maximum likelihood approach to estimate \(RO_S\) and \(RO_G\), which gives:

\[\label{eq:ROS} RO_S = \frac{C_SP\left(C_G|O_G\right) + C_G\left[ P\left(C_G|O_G\right) - 1\right]}{P\left(C_S|O_S\right) + P\left(C_G|O_G\right) - 1}\tag{2}\]

This value can be negative or larger than \(C_S+C_G\), neither of which is physical, so we clip \(RO_S\) to lie between zero and \(C_S+C_G\). Using conservation of counts, we obtain \(RO_G\) by demanding that \(RO_S+RO_G=C_S+C_G\).
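In code, Eq. (2) together with the physical clipping and conservation of counts reads as follows (a minimal sketch; the function and argument names are ours):

```python
import numpy as np

def recover_observed_counts(C_S, C_G, P_S, P_G):
    """Maximum-likelihood estimate of detected true stars/galaxies (Eq. 2).

    P_S = P(C_S|O_S), P_G = P(C_G|O_G). Counts may be scalars or
    arrays over healpixels.
    """
    total = C_S + C_G
    RO_S = (C_S * P_G + C_G * (P_G - 1.0)) / (P_S + P_G - 1.0)
    RO_S = np.clip(RO_S, 0.0, total)  # enforce physical bounds
    RO_G = total - RO_S               # conservation of counts
    return RO_S, RO_G
```

With perfect classification (\(P_S = P_G = 1\)) the estimate reduces to the observed counts themselves.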

All relative detection rates behave distinctly, so each one must be used to make corrections to specific subsets of the objects. For more details, refer to Appendix 10. Due to this, our next step is getting estimates on the number of objects in the groups \(C_S\cap RO_S\), \(C_S\cap RO_G\), \(C_G\cap RO_S\), and \(C_G\cap RO_G\) (this is just the classification distribution for \(RO_S\) and \(RO_G\)), given by:

\[\label{eq:CSROS} C_S\cap RO_S = RO_S P\left(C_S|O_S\right)\tag{3}\]

Other combinations are calculated analogously. Correcting any of these four sets of objects for variable detection rates is done by dividing by their respective relative detection rate, for example: \[\label{eq:corr95CSROS} \left( C_S\cap RO_S\right)_\text{Corr} = \frac{C_S\cap RO_S}{D_R\left( C_S,O_S\right)}\tag{4}\]

Obtaining \(FC_S\) (and \(FC_G\)) is now a matter of algorithmic design. We could take \(FC_S\) to be the sum of the corrected counts for \(C_S\cap RO_S\) and \(C_S\cap RO_G\), as in Eq. (5); this design tries to remove variations induced by survey properties in \(C_S\). Alternatively, we could take \(FC_S\) to be the sum of the corrected counts for \(C_S\cap RO_S\) and \(C_G\cap RO_S\), as in Eq. (6); this design tries to remove variations induced by survey properties in \(O_S\).

\[\label{eq:FCS} FC_S = \left(C_S\cap RO_S\right)_\text{Corr} + \left(C_S\cap RO_G\right)_\text{Corr}\tag{5}\]

\[\label{eq:FCS2} FC_S = \left(C_S\cap RO_S\right)_\text{Corr} + \left(C_G\cap RO_S\right)_\text{Corr}\tag{6}\]

In this work, we use the first of these algorithms, Eq. (5), which is justified in Appendix 11.
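The full stellar correction of Eqs. (2)-(5) can be sketched as a single function. This is an illustrative composition with argument names of our choosing; \(D_{SS}\) and \(D_{SG}\) stand for \(D_R(C_S,O_S)\) and \(D_R(C_S,O_G)\).

```python
import numpy as np

def corrected_star_counts(C_S, C_G, P_S, P_G, D_SS, D_SG):
    """Final corrected star counts FC_S via Eqs. (2)-(5).

    P_S = P(C_S|O_S), P_G = P(C_G|O_G).
    D_SS: relative detection rate of true stars classified as stars.
    D_SG: relative detection rate of galaxies misclassified as stars.
    """
    # Eq. (2): maximum-likelihood separation, with physical clipping.
    total = C_S + C_G
    RO_S = np.clip((C_S * P_G + C_G * (P_G - 1.0)) / (P_S + P_G - 1.0),
                   0.0, total)
    RO_G = total - RO_S
    # Eq. (3) and its analog: split each recovered class by classification.
    CS_ROS = RO_S * P_S          # C_S ∩ RO_S: correctly classified stars
    CS_ROG = RO_G * (1.0 - P_G)  # C_S ∩ RO_G: galaxies misclassified as stars
    # Eqs. (4)-(5): divide by relative detection rates and sum.
    return CS_ROS / D_SS + CS_ROG / D_SG
```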

4 Correction of the DES Data

In this section we present the results of applying our correction algorithm to DES data. To select a realistic dataset for this pipeline that will be applicable to a stream, we follow the companion study of the Phoenix stream. Phoenix is a \(15^{\circ}\) long, \(0.16^{\circ}\) wide, and dynamically cold stellar stream in the Southern Hemisphere [28], [34], [51]. Phoenix shows density fluctuations on small scales, making it a well-suited candidate for studying potential perturbations. In addition to its intrinsic properties, the stream lies in the middle of the DES footprint, near a prominent survey depth feature at a right ascension (RA) of \(\sim 30\) deg (Figure 1). This feature has increased depth and is one of the most readily apparent survey property features in the entire DES footprint. Therefore, we can use observations of the Phoenix stream to demonstrate the application of our pipeline and its effect on the generation of density maps.

Initially, we define a color-magnitude-based matched filter data selection used to obtain a realistic set of catalog objects. Then, we discuss the adjustments needed to apply our method to this dataset and generate corrections for both stars and galaxies.

4.1 Matched Filter Data Selection

To derive corrections for a given stellar or galactic density map we want to place the same observational selection criteria on the injected objects as we would place on observations. In the case of the Phoenix stream, we use the matched filter from the companion study. This filter is described in detail in that analysis; in brief, it uses a synthetic isochrone to generate a selection region in color-magnitude space. The size of this region is defined by the expected width of the stellar population convolved with observational uncertainties. We use an isochrone from [52] as implemented in ugali [53], [54] with an age of \(12.8\) Gyr, metallicity of [Fe/H] \(=-2.5\), and distance of \(17.4\) kpc.

We also use a magnitude limit of \(r < 24.5\), where completeness falls to \(\sim20 \%\) (see [39] Figure 9). This can be compared to [15], which uses the same observations (DES Y3) but places a limit around one magnitude brighter (\(g < 23.5\)). It is also \(0.3\) mag fainter than the \(r < 24.2\) limit used in the companion study, which uses DES Y6 data (roughly twice as many observations as DES Y3).

As mentioned in Section 3.1, our correction algorithm applies corrections in magnitude bins. Our bins are as follows: \(r \leq 22.9\), \(22.9 < r \leq 23.9\), and \(23.9 < r \leq 24.5\). These bins were chosen because they had similar relative detection rate variations in testing sets when these variations were extrapolated out to expected values for a more current Balrog run. For more details, refer to Appendix 12. These selections are in addition to the ones described in Section 2.

4.2 Algorithm Adjustments for Real Data

Due to noise concerns, when applying our corrections to observational data we perform them at a healpixel resolution of NSIDE = 512. A FRACDET map [46] is used for a weighted resolution degradation and for a first-order correction to counts. We keep only healpixels with a FRACDET value \(> 0.5\).

In this work, Deep Field object classifications are taken as truth. By spatially matching objects between the DES Deep Fields and Wide Fields and comparing the classifications assigned in each, we can measure the true correct-classification rates. Comparing these measured rates against the classification rates predicted by our training in the same areas revealed systematic offsets. To correct this, we rescaled our classification probabilities by constant factors to match the rates measured from the Deep Fields.

4.3 DES Corrections

Figure 3: Star counts before and after corrections are applied. The matched filter described in Section 4.1 is used to select on color and magnitude. Masked regions are primarily due to bright foreground objects, which is also true for future plots. For more physical units, at the above NSIDE = 512 each pixel has an area of \(\sim 47.2\,\text{arcmin}^2\). Notable improvements are the suppression of the depth feature at RA \(\sim 30^\circ\) and the better continuity at the edge of the footprint, both of which are more obvious in the galaxy plot below.
Figure 4: Galaxy counts before and after corrections are applied.
Figure 5: The effective weight map (corrected / uncorrected), smoothed with a \(0.15^{\circ}\) kernel, for stellar objects in our faintest magnitude bin. This map is used for the validation tests in Section 5.

In Figure 3 we show the observed stellar map on the left and our corrected map on the right. For all plots in this section we applied a Gaussian smoothing with a kernel of \(0.15^{\circ}\). The Phoenix isochrone was used to select the objects in all plots in this section. The stellar density map is non-uniform on large scales, with counts increasing near the Galactic plane (along the left and right edges of the footprint) and near the Sagittarius stream located at RA \(\sim 30^{\circ}\) and declination (Dec.) \(\sim -5^{\circ}\). As well as these physical variations, features correlated with survey properties (such as the RA \(\approx30^{\circ}\) vertical stripe and the artifacts at the edge of the footprint) are present in the initial map and mitigated in the corrected map.

Galaxy counts can provide a probe as to how well the algorithm is working since they should be uniform on large scales across the DES footprint. Figure 4 shows initial counts on the left and the corrected map on the right. As with stars, in the original galaxy map a number of features correlated with survey properties can be seen. These include large scale variations in the average number of objects (e.g., lower right of map), vertical stripes at constant RA (e.g., RA \(\approx 30^{\circ}\)) due to increased depth, and artifacts at the edge of the footprint. The corrected map does not exhibit these features and is more uniform, indicating the observational selection function has been mitigated.

Before applying these corrections to the Phoenix stellar stream specifically, we validate our methodology (Section 5). For our tests, we construct an effective weight map for stellar objects by taking corrected divided by original counts in Figure 3. The smoothed effective weight map is shown in Figure 5 with the boundary removed as it suffers from additional systematics. Bins of this map are used to sample a large range of impacts on stars from survey properties.

5 Validation of Methodology

This section tests the accuracy of the probabilities calculated in Section 3. We test the total corrective power of our relative detection rates (Section 5.1), how this changes with the number of objects used to train (Section 5.2), and how consistent our final corrections are as a function of the number of objects used for training (Section 5.3).

5.1 Overall Corrective Power↩︎

Figure 6: Detection rates relative to the average for a 20% testing subset of the Balrog objects before (blue) and after (orange) corrections. The plot titles describe which relative detection rates are being shown. A drop in variance such as the one observed here shows that selection effects are being mitigated.

Initially we test the calculations of \(D_R\left( C_S,O_S\right)\) and \(D_R\left( C_S,O_G\right)\) in our faintest magnitude bin since these are the two relative detection rates necessary for a stellar correction. We focus on relative detection rates as they have larger variations than classification probabilities. Assuming uniform injections, the relative detection rate of a type of object (such as true stars classified as stars) is calculated as the number of detected objects of this type per healpixel divided by the overall average. If the corrections are effective we would expect this ratio to become independent of the amount of correction applied (effective weight).

Subsets of the data are created by binning the healpixels based on their effective weight from Figure 5 into 10 equally populated bins. From each bin, we select 80% of the Balrog objects as our training sample, which is used to calculate corrections applied to the remaining test set. Figure 6 shows relative detection rates on the test set across the effective weight bins. Comparing rates before (blue) and after (orange) corrections shows that variations in relative detection rates for both stars and galaxies drop by a factor of \(\sim 5\).
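
The equal-population binning and before/after comparison can be sketched as follows (a minimal illustration with hypothetical names; the real test uses Balrog detections per healpixel):

```python
import numpy as np

def binned_relative_rates(effective_weight, detections, n_bins=10):
    """Split healpixels into equal-population bins of effective weight and
    return the mean detections per pixel in each bin divided by the overall
    mean. A flat result (all ratios near 1) means selection effects are
    mitigated."""
    w = np.asarray(effective_weight, float)
    d = np.asarray(detections, float)
    order = np.argsort(w)
    bins = np.array_split(order, n_bins)   # equally populated bins
    overall = d.mean()
    return np.array([d[b].mean() for b in bins]) / overall
```

Comparing the spread of the returned ratios computed from raw versus corrected detection counts reproduces the kind of variance reduction shown in Figure 6.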

5.2 Convergence from Object Counts↩︎

Figure 7: The convergence of corrections as more objects are used for training. A constant 20% subset of Balrog objects is withheld for testing; the remaining 80% is then cropped to different levels, with the 0% level indicating that no corrections have been performed. Three area bins are tested: blue, orange, and green lines represent area bins with above-average, average, and below-average detection rates respectively. Red dotted lines show an exponential fit to the outer envelope. The plot titles describe which relative detection rates are being shown.

Characterizing how corrective power scales with the number of SSI objects will help inform SSI strategies for future analyses. To investigate this, we perform the test from Section 5.1 while varying the size of the training sample. Due to noise concerns, we reduce the number of effective weight bins to 3. Our results are shown in Figure 7, where we plot the detection rates for correctly classified stars (left) and incorrectly classified galaxies (right) as a function of the size of the training dataset. Blue (orange, green) points represent an effective weight bin that started with above-average (approximately average, below-average) relative detection rates. To illustrate the general behavior we include an exponential decay envelope in red. This envelope shows that convergence is reached with \(\sim\)​40% of the training set in both cases; beyond this point we are limited by the Poisson noise of the test dataset and cannot probe further convergence.

5.3 Repeatability of Results↩︎

Figure 8: Left: standard deviation of the effective weight map residuals as a function of percentage of training objects used. Right: residuals of the total effective weight map for two runs using 50% of the Balrog objects.

Given a training set, the derived corrections are completely deterministic. What remains to quantify is how the corrections change as the training sample itself is varied, since this reveals the inherent noise in our corrections as a function of training set size.

To test this, we vary the objects used to derive corrections. At a fixed training set size, we create two disjoint groups of training objects. For each subset, we derive corrections and effective weight maps. We then compute the difference in effective weight, and the variance of this residual gives an estimate of our algorithm’s repeatability. This process is repeated for nine training set sizes ranging from 10% to 50% of the total sample (\(\sim 26\) million total objects). Residuals for the two 50% runs are shown in the right panel of Figure 8, and the left panel shows the standard deviations of the residuals as a function of training set size. We found our results were well fit by a power law \(y \sim x^{-0.45}\).
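
The power-law fit can be reproduced with a least-squares fit in log-log space (a sketch; `fit_power_law` is a hypothetical helper and the residual values used below are illustrative, not the measured ones):

```python
import numpy as np

def fit_power_law(train_frac, resid_std):
    """Fit resid_std ~ c * train_frac**alpha by least squares in log space."""
    alpha, log_c = np.polyfit(np.log(train_frac), np.log(resid_std), 1)
    return alpha, np.exp(log_c)
```

Under the fitted scaling \(y \sim x^{-0.45}\), increasing the injection sample by a factor of \(\sim 5\) reduces the residual standard deviation by \(5^{-0.45} \approx 0.48\).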

With Balrog Y6 covering the entire DES footprint instead of just 20% [55], we can use our fit to predict the consistency of results when \(\sim 5\) times more objects are injected. In this case, we expect two full runs of Balrog Y6 to have stellar effective weight map residuals with a standard deviation of \(\sim 0.03\), compared to \(\sim 0.06\) for Balrog Y3. Other differences in Balrog Y6 mentioned in [39] and [56] such as no longer using delta stars will impact the accuracy of this prediction.

Residuals can also be used to validate our cycle limit for convergence as mentioned in Section 3. We compare our calculated effective weight map to an effective weight map where we put no limit on the number of cycles in the iterative training. The standard deviation for the residuals of these maps was 0.003, showing that the cycle limit we have chosen has allowed results to converge.

6 Application to Stellar Streams↩︎

This section applies the corrections from Section 4 to observations of stellar streams. First we look at the Phoenix stellar stream and show that our corrections cause statistically significant shifts in its linear density (Section 6.1). We then look at synthetic streams and show that our correction algorithm properly suppresses artificial power at length scales larger than the 3 sq. deg. DECam field of view (FOV) (Section 6.2). This suppression of power is detectable on individual stream realizations at length scales relevant for subhalo interactions [57].

6.1 Phoenix Stellar Stream↩︎

Figure 9: Phoenix stellar stream before and after corrections are applied. A Gaussian smoothing kernel of \(0.15^{\circ}\) is used to smooth these maps. The (RA, Dec.) endpoints in degrees of this stream are (20.1, -55.3) and (27.9, -42.7) [51].
Figure 10: Linear densities of stars per degree of Phoenix minus background. Results are shown for the uncorrected stream, the corrected stream, and the uncorrected stream with the more conservative magnitude cut used in sec. ¿sec:Phoenix?. Statistical uncertainties on densities are given by error bars. \(\phi_1\) is the angular stream track coordinate. Whether or not these changes are beneficial will be addressed later; for now we note only that they are statistically significant.

Taking the matched-filter map from Section 4.1 we show the original (top) and corrected (bottom) counts for the on-sky region around the Phoenix stream in Figure 9. It is immediately obvious that the depth feature along RA \(\sim 27^\circ\) is removed in the corrected map, enhancing the signal of the stellar stream. For a quantitative measurement of this effect, we compute the number of excess stars along the stream relative to the background.

We select a background region along the stream but offset by \(1^\circ\). We then define the linear density as the on-stream star counts minus the background counts, per degree along the stream. The results are shown in Figure 10. The blue (orange) line shows the stream before (after) corrections, and the green line shows the stream with the more conservative (\(r\leq 24.2\)) magnitude cut from sec. ¿sec:Phoenix?. Error bars represent the statistical uncertainty of each point, showing that the corrections made to the Phoenix stellar stream lead to statistically significant changes in the linear density of the stream. We note that this is not meant as a direct comparison to sec. ¿sec:Phoenix?, which uses DES Y6 data, but instead is meant to show the dependence of the linear density on magnitude cutoffs.
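
The background subtraction amounts to differencing two histograms along the stream-track coordinate (a sketch with hypothetical names; the on-stream and background selections are assumed to cover equal areas):

```python
import numpy as np

def linear_density(phi1_on, phi1_bg, bin_edges):
    """Background-subtracted linear density (stars per degree) along phi_1.
    phi1_on: phi_1 values of stars in the on-stream selection;
    phi1_bg: phi_1 values of stars in the offset background region."""
    on, _ = np.histogram(phi1_on, bins=bin_edges)
    bg, _ = np.histogram(phi1_bg, bins=bin_edges)
    width = np.diff(bin_edges)
    density = (on - bg) / width
    err = np.sqrt(on + bg) / width   # Poisson uncertainty on the difference
    return density, err
```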

The two most prominent changes resulting from these corrections are the reduced “gap" centered at \(\phi_1 \sim -0.5\) deg and the fluctuations seen at the local peak centered around \(\phi_1 \sim -3\) deg. Conversely, the density peak at \(\phi_1 \sim 1.5\) deg, the hump at \(2 < \phi_1 < 6\) deg, and the under-density at \(\phi_1 \sim -5\) deg remain generally the same in all cases. This analysis also suggests that it would be interesting to apply this correction technique to the deeper Y6 data analyzed in sec. ¿sec:Phoenix?. In the next section, we turn to tests on the power spectra of simulated streams to assess the impact of these corrections.

6.2 Synthetic Stellar Streams↩︎

Figure 11: Power spectra for synthetic streams. Shown are the baseline for the true stream (red), the observed stream before corrections (blue), and the observed stream after corrections (orange). We also show the baseline and corrected streams with a more conservative \(r<23.9\) cut, denoted with dashed lines. The true stream has perfect classification and no variations in detection rates. Solid lines represent the median power spectra at each point for 5000 realizations. Shaded regions represent the 16th and 84th percentiles among the 5000 simulations for each point along the power spectra. Power spectra are shown for a uniform density stream on the left and a sinusoidal density stream, with a period of \(4.4^{\circ}\) (two times the DECam FOV), on the right.

To probe the effect of our corrections on the 1D density auto-correlation function, we compute the density power spectrum for 5000 simulated streams before and after corrections. These simulated streams are injected into an arbitrary region of the sky at high Galactic latitude with large variations in effective weight (the area is fixed between realizations). For each realization, we assume a uniform distribution for the true number of stars and galaxies plus a stream population of stars. Since we do not know the absolute detection rate, we fix the number of detected objects for each realization. We then use relative detection rates to create realistic spatial distributions for all of our mocks. Our stream is defined to have, on average, a 25% excess of stars compared to the background stellar population. Corrections are then performed on the data based on the relative detection rates and correct classification probabilities as prescribed in Section 3. For more details, refer to Appendix 11.
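
The density power spectrum used in this test can be computed from binned star counts along the stream (a minimal FFT-based sketch; the paper's exact estimator and normalization may differ):

```python
import numpy as np

def density_power_spectrum(counts, bin_width):
    """Power spectrum of the fractional density fluctuation along the stream.
    counts: star counts in equal-width bins of the stream-track coordinate;
    bin_width: bin width in degrees (frequencies are in cycles per degree)."""
    delta = counts / counts.mean() - 1.0           # fractional overdensity
    power = np.abs(np.fft.rfft(delta)) ** 2 / delta.size
    freq = np.fft.rfftfreq(delta.size, d=bin_width)
    return freq, power
```

For a sinusoidal stream, the recovered power peaks at the injected frequency, mirroring the right panel of Figure 11.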

To compare these uncorrected and corrected results to a baseline, we independently perform a run with perfect classification and no variations in detection rates. Two runs were performed: one with a uniform density stream, and one with a sinusoidal density stream with period \(4.4^\circ\), two times the DECam FOV. Figure 11 shows the results of this exercise. The lines are the median power spectra for the baseline (red), corrected (orange), and uncorrected (blue) results, and shaded regions represent the \(16\)th and \(84\)th percentiles among the 5000 simulations for each point along the power spectra. For the uniform density stream (left), the corrected stream recovers a nearly constant power spectrum, while the uncorrected stream gains extra power at scales larger than the DECam FOV. For the sinusoidal density stream (right), the corrected stream does recover signal power at the expected scales, although its amplitude is lowered by the contamination of galaxies. The differences from the uncorrected stream can be seen even in individual runs, and as shown in [57], these are the same angular scales expected to be sensitive to dark matter subhalo interactions. Dashed lines in Figure 11 represent results with a more conservative magnitude cut of \(r<23.9\). Noise levels increase as expected, and in the sinusoidal case the signal-to-noise ratio of the peak is reduced by \(\sim 10\%\) compared to the full \(r<24.5\) data, showing the advantage of using fainter objects.

We can estimate the importance of proper corrections at even fainter magnitudes, where the ratio of galaxies to stars becomes even larger. This is particularly relevant for future surveys such as the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) [40]. To test this we use the DES deep fields to obtain a realistic distribution of object counts at fainter magnitudes (\(24.5<r\leq 25\)). Assuming survey properties will have similar impacts as in DES, we obtain a lower bound on detection and classification variations by setting them equal to the variations from our previous faintest bin (\(23.9<r\leq 24.5\)). Running the same power spectrum test as before, the multiplicative difference between the baseline and uncorrected power spectra increased by a factor of 5 at the largest tested scales, showing that corrections will be even more necessary to fully leverage LSST-like data.

7 Conclusion↩︎

Accurate stellar stream density measurements offer a promising probe of dark matter substructure in the Milky Way. While variable selection effects from survey properties can introduce artificial density variations, synthetic source injection can be used to correct for these effects. In this work we describe a method to correct for artificial density variations that will allow for a more accurate characterization of stellar streams and their parameters.

Section 4 shows our corrections applied to DES DR1 data. After these corrections were applied, galaxy counts became more uniform on large scales. Features correlated with survey properties such as the depth feature at RA \(\sim 30^\circ\) and edge artifacts are also visually diminished after corrections.

Validation of our algorithm is performed in Section 5. For overall corrective power, we find that key relative detection rates in our faintest magnitude bin had their standard deviations drop by a factor of at least five after corrections. Running this same test with a variable training set size showed that relative detection rates improved in uniformity when more training objects were used, but only up to a limit. Finally, we find that the repeatability of our results improved as more training objects were used. These improvements persisted over the entire range of training set sizes that we were able to test.

We apply our corrections to stellar streams in Section 6. Here we find that the changes in the linear density of Phoenix due to our corrections are statistically significant, which calls into question some of the apparent density variations reported in the literature. For uniform density simulated streams, our corrections mitigated all signals from survey property variations. For sinusoidal density simulated streams, our corrections saw a signal at the proper frequency, although the amplitude of the corrected signal was reduced.

One clear next step is to derive corrections for the DES Y6 run of Balrog [55]. This SSI run has injections across the entire DES footprint, compared to the 20% coverage of Balrog Y3. One large advantage will come in the form of testing set size for the tests performed in Section 5: the number of sources in a 20% testing set of the Balrog Y6 data would be on the order of the entire Balrog Y3 catalog. This will make Balrog Y6 less susceptible to potential bias in estimates of relative detection rates that are more likely to arise with smaller sample sizes, and could move the relative detection rates in Figure 7 closer to one if the current limits on accuracy are due to biased testing sets.

Additionally, this framework could easily be applied to data from the DECam Local Volume Exploration survey (DELVE; [58]) processed through the DECam All Data Everywhere (DECADE; [59]) campaign. This data set uses the same instrument with the same data reduction pipeline, but contains much more heterogeneous survey properties.

An interesting problem to investigate is how well these corrections transfer to other color-magnitude regions. Our color-magnitude cuts were specific to the isochrone of Phoenix. If viable, less restrictive color cuts would allow one training session to provide corrections for multiple stellar streams with different isochrones, and allow more objects to be used in training. The potential cost is how well the Balrog sources would actually model the stars in a stellar stream located in a different area of color-magnitude space.

This work demonstrates that the use of synthetic sources to correct for variable selection effects is a viable approach. It also provides a groundwork methodology for applying these corrections in future surveys such as the LSST, which will cover the DES footprint (and extend beyond) and use a similar photometric system. Rudimentary tests show that we can expect the impacts of survey properties to increase by a factor of at least 5 when fainter magnitude bins are used in LSST. Therefore, accurate corrections and a deep understanding of the observational selection function will be even more important in making unbiased measurements of density variations in stellar streams.

Acknowledgments↩︎

PSF acknowledges support from the DIRAC Institute in the Department of Astronomy at the University of Washington. The DIRAC Institute is supported through generous gifts from the Charles and Lisa Simonyi Fund for Arts and Sciences, and the Washington Research Foundation.

Contribution statement: KB conducted the analysis, created all plots and tables presented here and led the writing. PSF provided direct supervision of the research, guidance on the analysis and helped with writing. MT provided technical expertise in accessing and correctly using the SSI data products. KBechtol guided major analysis and content decisions as well as co-supervising the research. ADW and CMV internally reviewed the paper. TYC and BMP provided valuable comments that improved the paper’s clarity and quality. The remaining authors contributed to this work through the construction of DECam and other aspects of data collection; data processing and calibration; developing widely used methods, codes, and simulations; running pipelines and validation tests; and promoting the science analysis.

Funding for the DES Projects has been provided by the U.S. Department of Energy, the U.S. National Science Foundation, the Ministry of Science and Education of Spain, the Science and Technology Facilities Council of the United Kingdom, the Higher Education Funding Council for England, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the Kavli Institute of Cosmological Physics at the University of Chicago, the Center for Cosmology and Astro-Particle Physics at the Ohio State University, the Mitchell Institute for Fundamental Physics and Astronomy at Texas A&M University, Financiadora de Estudos e Projetos, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Científico e Tecnológico and the Ministério da Ciência, Tecnologia e Inovação, the Deutsche Forschungsgemeinschaft and the Collaborating Institutions in the Dark Energy Survey.

The Collaborating Institutions are Argonne National Laboratory, the University of California at Santa Cruz, the University of Cambridge, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas-Madrid, the University of Chicago, University College London, the DES-Brazil Consortium, the University of Edinburgh, the Eidgenössische Technische Hochschule (ETH) Zürich, Fermi National Accelerator Laboratory, the University of Illinois at Urbana-Champaign, the Institut de Ciències de l’Espai (IEEC/CSIC), the Institut de Física d’Altes Energies, Lawrence Berkeley National Laboratory, the Ludwig-Maximilians Universität München and the associated Excellence Cluster Universe, the University of Michigan, NSF NOIRLab, the University of Nottingham, The Ohio State University, the University of Pennsylvania, the University of Portsmouth, SLAC National Accelerator Laboratory, Stanford University, the University of Sussex, Texas A&M University, and the OzDES Membership Consortium.

Based in part on observations at NSF Cerro Tololo Inter-American Observatory at NSF NOIRLab (NOIRLab Prop. ID 2012B-0001; PI: J. Frieman), which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation.

The DES data management system is supported by the National Science Foundation under Grant Numbers AST-1138766 and AST-1536171. The DES participants from Spanish institutions are partially supported by MICINN under grants PID2021-123012, PID2021-128989 PID2022-141079, SEV-2016-0588, CEX2020-001058-M and CEX2020-001007-S, some of which include ERDF funds from the European Union. IFAE is partially funded by the CERCA program of the Generalitat de Catalunya.

We acknowledge support from the Brazilian Instituto Nacional de Ciência e Tecnologia (INCT) do e-Universo (CNPq grant 465376/2014-2).

This document was prepared by the DES Collaboration using the resources of the Fermi National Accelerator Laboratory (Fermilab), a U.S. Department of Energy, Office of Science, Office of High Energy Physics HEP User Facility. Fermilab is managed by Fermi Forward Discovery Group, LLC, acting under Contract No. 89243024CSC000002.

Some of the results in this paper have been derived using the healpy and HEALPix packages.

8 Calculating Classification Probabilities and Relative Detection Rates↩︎

This section discusses the details of calculating classification probabilities and relative detection rates. The two calculations are nearly identical, so more focus will be given to classification probabilities. The notation is the same as given in Table 2. As additional notation, \(SP\) will refer to a general survey property.

8.1 Classification Probabilities↩︎

This section will focus on calculating \(P\left(C_S|O_S\right)\) for some magnitude bin using Balrog data. The calculation of \(P\left(C_G|O_G\right)\) is completely analogous after switching out the Balrog delta star sample for the Balrog galaxies. For this subsection, valid healpixels will refer to healpixels with valid values for each survey property which also have a detected Balrog delta star within the magnitude bin of interest.

After applying measured magnitude cuts, two numbers are stored for each valid healpixel. First is the number of detected Balrog delta stars in the healpixel, \(IO_S\). Second is the number of correctly classified Balrog delta stars in the healpixel, \(IC_S \cap IO_S\). Both of these counts are subjected to quality and magnitude cuts. The sum over all valid healpixels of \(IC_S \cap IO_S\) divided by the sum of \(IO_S\) can be thought of as the average correct classification probability:

\[\label{eq:ave95classification} \langle P\left(IC_S|IO_S\right)\rangle = \frac{\sum IC_S \cap IO_S}{\sum IO_S}\tag{7}\]

Next we compute the relation between each survey property and the correct classification probability. For each survey property, the valid healpixels are binned according to the survey property; for this analysis, we use 10 bins. In each bin, the average survey property value is calculated, as well as the correct classification rate relative to the full DES footprint. These are calculated using Eqs.(8 ) and (9 ) respectively. Here, summing over a bin means taking the sum over the healpixels within the bin.

\[\label{eq:ave95sp} \langle SP\rangle_{\text{Bin}} = \frac{\sum_{\text{Bin}} SP}{\sum_{\text{Bin}} 1}\tag{8}\]

\[\label{eq:rel95class} \frac{\langle P\left(IC_S|IO_S\right)\rangle_{\text{Bin}}}{\langle P\left(IC_S|IO_S\right)\rangle} = \frac{\left(\sum_{\text{Bin}} IC_S \cap IO_S\right) / \left(\sum_{\text{Bin}} IO_S\right)}{\langle P\left(IC_S|IO_S\right)\rangle}\tag{9}\]

These ten ordered pairs show the impact the survey property has on classification probabilities. With these individual dependencies in hand, we must construct the classification probability as a function of every survey property. This is done in an iterative process. First we select whichever survey property causes the most variance in \(P\left(IC_S|IO_S\right)\) among its bins; for the \(\text{n}^\text{th}\) iteration, we refer to this survey property as \(SP_\text{Max,n}\). Once this survey property is chosen, we correct for the dependency it causes. This is repeated iteratively until a termination condition is met. Notably, nothing prevents a correction for one survey property from introducing a dependency on another survey property that has already been corrected for. As a result, the same survey property can be corrected multiple times during the training procedure; for example, we could very well have \(SP_\text{Max,1} = SP_\text{Max,50}\).

To begin training, we must determine which survey property to correct first. Eq.(9 ) is calculated for each bin of each survey property, and we use the least-squares deviation from one to determine which survey property has the largest impact on classification rates. This is shown in expression (10 ), which is calculated for each survey property.

\[\label{eq:class95metric} \sum_{\text{Bin} = 1}^{10} \left(\frac{\langle P\left(IC_S|IO_S\right)\rangle_{\text{Bin}}}{\langle P\left(IC_S|IO_S\right)\rangle} - 1\right)^2\tag{10}\]

Using the notation defined earlier, we will let \(SP_\text{Max,1}\) be whichever survey property has the largest value for expression (10 ). With this survey property chosen, we can use the ten ordered pairs defined in eqs.(8 ) and (9 ) to construct a function from \(SP_\text{Max,1}\) values to a relative classification rate. To achieve this we use a linear interpolation function \(f_1\) defined such that the following holds for each bin:

\[\label{eq:inter95func} f_1\left( \langle SP_\text{Max,1}\rangle_{\text{Bin}}\right) = \frac{\langle P\left(IC_S|IO_S\right)\rangle_{\text{Bin}}}{\langle P\left(IC_S|IO_S\right)\rangle}\tag{11}\] We note that a constant value is used for extrapolation.

To correct the Balrog data for the dependency on \(SP_\text{Max,1}\), we evaluate \(f_1\) at the \(SP_\text{Max,1}\) value of each valid healpixel. The output of \(f_1\) can be interpreted as how likely a star on a given healpixel is to be classified correctly, relative to the average correct classification rate. Therefore, to remove the \(SP_\text{Max,1}\) dependency, at each valid healpixel the number of correct classifications is divided by the output of the interpolation function. This is shown explicitly in Eq.(12 ), where each value shown is assumed to be the value on one specific healpixel.

\[\label{eq:class95correction} \left( IC_S \cap IO_S\right)_\text{New} = \frac{\left( IC_S \cap IO_S\right)_\text{Old}}{f_1\left( SP_\text{Max,1}\right)}\tag{12}\]

Substituting these new values on each valid healpixel into Eq.(7 ), we can start this process over to make another correction for a survey property dependency. This process is repeated until the values from Eq.(9 ) fall between \(0.99\) and \(1.01\) for each bin of each survey property. For notation purposes, let the survey property and interpolation function used on the \(\text{n}^\text{th}\) iteration be denoted as \(SP_\text{Max,n}\) and \(f_\text{n}\) respectively. As mentioned briefly in Section 2.2, training is stopped after 150 cycles even if convergence has not been reached, to avoid over-fitting the data.
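
The training loop above can be sketched end-to-end as follows (a simplified, assumption-laden implementation: equal-population bins, numpy linear interpolation with constant extrapolation, and a convergence check over every bin of every survey property; `train_corrections` is our name, not the pipeline's):

```python
import numpy as np

def train_corrections(n_det, n_correct, survey_props, n_bins=10,
                      tol=0.01, max_cycles=150):
    """Iteratively remove survey-property dependencies from per-healpixel
    correct-classification counts (a sketch of Eqs. 7-12).
    n_det: detected objects per healpixel (IO_S);
    n_correct: correctly classified objects per healpixel (IC_S and IO_S);
    survey_props: dict of name -> per-healpixel survey-property values.
    Returns the ordered list of (name, bin centers, relative rates)
    corrections and the corrected n_correct array."""
    n_det = np.asarray(n_det, float)
    corrected = np.asarray(n_correct, float).copy()
    # Equal-population bin membership for each survey property.
    membership = {}
    for name, sp in survey_props.items():
        order = np.argsort(np.asarray(sp, float))
        idx = np.empty(order.size, int)
        for b, pix in enumerate(np.array_split(order, n_bins)):
            idx[pix] = b
        membership[name] = idx
    corrections = []
    for _ in range(max_cycles):
        avg = corrected.sum() / n_det.sum()                      # Eq. (7)
        stats = {}
        for name, sp in survey_props.items():
            sp = np.asarray(sp, float)
            idx = membership[name]
            mean_sp = np.array([sp[idx == b].mean()
                                for b in range(n_bins)])          # Eq. (8)
            rel = np.array([corrected[idx == b].sum() / n_det[idx == b].sum()
                            for b in range(n_bins)]) / avg        # Eq. (9)
            stats[name] = (mean_sp, rel, float(np.sum((rel - 1.0) ** 2)))
        if all(np.abs(r - 1.0).max() < tol for _, r, _ in stats.values()):
            break                                # converged in every bin
        worst = max(stats, key=lambda k: stats[k][2])            # Eq. (10)
        mean_sp, rel, _ = stats[worst]
        # Linear interpolation with constant extrapolation, Eq. (11),
        # then divide out the dependency, Eq. (12).
        f = np.interp(np.asarray(survey_props[worst], float), mean_sp, rel)
        corrected = corrected / f
        corrections.append((worst, mean_sp, rel))
    return corrections, corrected
```

The returned list of (property, bin centers, relative rates) triples records the order of corrections, which is what the reconstruction in Eqs. (13)-(15) replays.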

With training completed, \(P\left(IC_S|IO_S\right)\) must now be calculated for all healpixels with valid survey properties so it can be applied to data. This is done iteratively. To start, we assign each of these healpixels the average classification rate, shown in Eq.(13 ). We note that \(\langle P\left(IC_S|IO_S\right)\rangle\) here refers to the initial value before any corrections were made to \(IC_S\cap IO_S\), which could change the value slightly. In practice, the specific starting point is not important, as the average is recalibrated (see Section 4.2).

\[\label{eq:init95class} P_0\left(IC_S|IO_S\right) = \langle P\left(IC_S|IO_S\right)\rangle\tag{13}\]

After this, for each correction to \(IC_S \cap IO_S\) applied during training in Eq.(12 ), we must apply a corresponding correction to our classification rate. For concreteness, suppose there were a total of \(\text{N}\) cycles used in training. Then, for \(\text{n}\) between \(1\) and \(\text{N}\) we define the recursive relation given in Eq.(14 ), which is calculated on each healpixel. Once all iterations have been completed, we have our overall correct classification probability. Combining that with the relation in Eq.(1 ) gives us Eq.(15 ).

\[\label{eq:iter95class} P_\text{n}\left(IC_S|IO_S\right) = P_{\text{n} - 1}\left(IC_S|IO_S\right) f_\text{n}\left( SP_\text{Max,n}\right)\tag{14}\]

\[\label{eq:final95class} P\left(C_S|O_S\right) = P_\text{0}\left(IC_S|IO_S\right) \prod_{\text{n} = 1}^{\text{N}} f_\text{n}\left( SP_\text{Max,n}\right)\tag{15}\]

To avoid a non-physical probability, this is capped at \(1\) (negative values cannot appear by construction).
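
Under the same representation used above (each training cycle stored as a (property, bin centers, relative rates) triple, an assumption of this sketch), the reconstruction in Eqs. (13)-(15) is a running product with a final cap at one:

```python
import numpy as np

def classification_probability(avg_p, corrections, survey_props):
    """Eqs. (13)-(15): start every healpixel at the average rate, multiply
    in the interpolated factor from each training cycle in order, then cap
    at 1 to keep the probability physical."""
    npix = len(next(iter(survey_props.values())))
    p = np.full(npix, avg_p, float)                       # Eq. (13)
    for name, centers, rel in corrections:                # Eq. (14)
        p *= np.interp(np.asarray(survey_props[name], float), centers, rel)
    return np.minimum(p, 1.0)                             # capped Eq. (15)
```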

8.2 Relative Detection Rates↩︎

In this section we calculate four relative detection rates as functions of survey properties for a particular magnitude bin: the rates at which Balrog delta stars are classified as stars \(D_R\left( IC_S,IO_S\right)\), Balrog delta stars are classified as galaxies \(D_R\left( IC_G,IO_S\right)\), Balrog galaxies are classified as stars \(D_R\left( IC_S,IO_G\right)\), and Balrog galaxies are classified as galaxies \(D_R\left( IC_G,IO_G\right)\). For now we focus specifically on \(D_R\left( IC_S,IO_S\right)\), but the same process is repeated for the other relative detection rates. Valid healpixels in this section will refer to healpixels with valid values for each survey property which also had an injected Balrog delta star that passed quality cuts. Compared to the classification rates of the previous subsection, for relative detection rates we are unable to obtain an average detection rate for our objects, because undetected objects do not have measured magnitudes on which to bin. We are therefore restricted to calculating relative detection rates instead of actual detection probabilities.

The calculation of these rates is performed in a similar manner as in Section 5.3.1 of [60]. Within a bin from some survey property, we look at the average number of detections of interest (Balrog delta stars classified as stars within our magnitude bin) per healpixel in that bin to find a characteristic detection density. This is then compared to the average number of detections of interest per healpixel across all of the valid healpixels. If the distribution of objects that could cause a detection of interest is uniform on large scales, the ratio of these two densities will then represent the rate at which detections of interest occur in the survey property bin relative to the rate at which detections of interest occur overall. Since Balrog objects are injected on a uniformly spaced hex grid and the distribution of objects on this grid is random, this approach is valid for our calculations.

With this framework in mind, we turn now to the details of the actual calculation. Like in classification calculations, for each survey property we bin the valid healpixels according to the survey property and calculate the relative detection rate as the following:

\[\label{eq:rel95det} \langle D_R\left(IC_S, IO_S\right)\rangle_{\text{Bin}} = \frac{\left(\sum_{\text{Bin}} IC_S \cap IO_S\right) / \left(\sum_{\text{Bin}} 1\right)}{\left(\sum IC_S \cap IO_S\right) / \left(\sum 1\right)}\tag{16}\]

Note that the numerator is exactly the average number of detections of interest per healpixel in the bin while the denominator is the average number of detections of interest per healpixel across all valid healpixels.
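
Eq. (16) reduces to a ratio of per-bin to overall mean counts (a sketch; `bin_index` assigns each valid healpixel to a survey-property bin):

```python
import numpy as np

def relative_detection_rate(detections, bin_index, n_bins):
    """Eq. (16): mean detections of interest per healpixel in each
    survey-property bin, divided by the mean over all valid healpixels."""
    d = np.asarray(detections, float)
    return np.array([d[bin_index == b].mean()
                     for b in range(n_bins)]) / d.mean()
```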

Treating the value calculated in Eq. (16) like the relative classification rates of the previous subsection, training is then performed. As mentioned briefly in Section 2.2, training is stopped after 300 cycles even if convergence has not been reached.

Once training is complete, we must calculate \(D_R\left( IC_S,IO_S\right)\) on all healpixels with valid survey properties. This is also performed in a nearly identical way to the calculations in the previous subsection, with the only change being that we start with the assignment:

\[\label{eq:init95det} D_{R,0}\left( IC_S,IO_S\right) = 1\tag{17}\]

This is done since a relative detection rate should by definition be centered on 1, rather than on an average probability as for the classification probability. With this assignment made, the recursive relation used to build up to \(D_R\left( IC_S,IO_S\right)\) is defined identically to Eq. (14). Since a relative rate can exceed \(1\), the crop applied in the previous section is unnecessary here.

9 Testing the Separability Assumption↩︎

Figure 12: (Left) Example of 2D distributions of objects in inseparable and separable cases before and after corrections. (Right) A distribution of the metric improvement values, weighted by the pre-correction metric. Any value over zero indicates improvement, and a value of one would indicate perfect removal of any dependence. Every metric improvement value was greater than 0.65, with the vast majority above 0.95.
Figure 13: A sample 2D relation using the exposure-time sum in the \(r\) band and the sky-variance uncertainty in the \(g\) band. For the 20% testing sample of Balrog objects, we first plot the healpixel distribution in these two dimensions (left). With this healpixel distribution, we use the original and corrected detection counts to examine detection rate variations before corrections (middle) and after corrections (right).

When calculating relative detection rate and classification probability variations, our iterative approach means that the selection function we learn is separable. If the true selection function is not separable, training can still converge without properly learning it, because convergence is defined by the projections of the selection function onto each survey property individually. Higher-dimensional dependencies can be hidden in these projections, leading to convergence on the wrong selection function. To demonstrate this, we ran simulations in which objects were injected uniformly over two survey properties, \(x\) and \(y\). In one simulation, detection rates were given by a separable function of \(x\) and \(y\); in the other, the detection rate function was inseparable. Results before and after corrections are shown in the left panel of Figure 12, which highlights the higher-dimensional structure that remains after corrections in the inseparable case.
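This failure mode can be reproduced with a toy model: apply a correction built only from 1D projections to a separable and an inseparable detection-rate map. The functional forms below are illustrative choices, not the ones used in our simulations:

```python
import numpy as np

nx = ny = 20
x = np.linspace(0.0, 1.0, nx)[:, None]
y = np.linspace(0.0, 1.0, ny)[None, :]

# Illustrative truths: one separable detection-rate map, one that mixes x and y
rate_sep = (0.5 + 0.4 * x) * (0.5 + 0.4 * y)     # f(x) * g(y)
rate_insep = 0.5 + 0.3 * np.sin(4.0 * (x + y))   # not a product of 1D functions

def marginal_correction(rate_map):
    """Divide out a model built only from 1D projections, mimicking a
    correction that assumes separability."""
    fx = rate_map.mean(axis=1, keepdims=True)    # projection onto x
    gy = rate_map.mean(axis=0, keepdims=True)    # projection onto y
    model = fx * gy / rate_map.mean()
    return rate_map / model

resid_sep = marginal_correction(rate_sep)        # flat: residual std ~ 0
resid_insep = marginal_correction(rate_insep)    # 2D structure survives
```

For the separable map the product of projections recovers the truth exactly, so the residual is uniform; for the inseparable map, structure invisible to either 1D projection remains after the correction.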

To test whether this occurred in our pipeline, we used the largest set of Balrog objects possible: all Balrog galaxy injections. To keep the statistical significance as high as possible, we trained and corrected for overall relative detection rates (being classified as either a star or a galaxy). Training was done on an 80% subset of objects, with testing being done on the remaining 20%. For each combination of two survey properties we looked at the distribution of valid healpixels over the two survey properties, and the relative detection rates before and after corrections over the two survey properties. An example of this is shown in Figure 13. Ideal results would be uniform distributions in the corrected plot, especially in regions with high healpixel counts as these are the most common throughout the footprint.

With this example in hand, we designed a metric to concisely summarize results over every combination of survey properties. Both the original and corrected two-dimensional histograms should have values centered about one. Furthermore, the bins that matter most are those with high healpixel densities, as these affect a greater portion of the DES footprint. To account for both factors, we use the variance about one weighted by healpixel count. If \(i\) indexes the two-dimensional bins in the histograms of Figure 13, \(PC_i\) is the healpixel count in bin \(i\), and \(D_{R,i}\) is the detection rate variation in bin \(i\) (the histogram values of the two rightmost panels of Figure 13), our metric is:

\[\label{eq:2d95metric} \epsilon = \frac{\sum_i PC_i\left(1 - D_{R,i}\right)^2}{\sum_i PC_i}\tag{18}\]

This metric is non-negative by construction, with better results giving values closer to zero. Using these features we then define an improvement metric. If, for a given survey property combination, \(\epsilon_{\text{orig}}\) is the metric value on the original detection variations and \(\epsilon_{\text{corr}}\) is the metric value on the corrected detection variations, we define our improvement metric as:

\[\label{eq:met95impr} \epsilon_{\text{impr}} = 1 - \frac{\epsilon_{\text{corr}}}{\epsilon_{\text{orig}}}\tag{19}\]

Any positive value of \(\epsilon_{\text{impr}}\) indicates an improvement, with a value of one being perfect. We calculated \(\epsilon_{\text{impr}}\) for every combination of survey properties using the 20% testing data from earlier. We do not care about each \(\epsilon_{\text{impr}}\) value equally, so a weight of \(\epsilon_{\text{orig}}\) is also stored for each survey property combination: if two survey properties hardly had any detection rate dependence in their 2D relation to begin with, their improvement matters far less than for two survey properties that initially had strong dependencies. With these weights, a weighted histogram of \(\epsilon_{\text{impr}}\) was generated, shown in the right plot of Figure 12. The overwhelming trend towards 1 shows that our separability assumption is reasonable.
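Eqs. (18) and (19) translate directly into a few lines of numpy; the array names here are hypothetical:

```python
import numpy as np

def weighted_variance_metric(pix_counts, det_rates):
    """Eq. (18): variance of the 2D detection-rate histogram about 1,
    weighted by the healpixel count in each 2D bin."""
    return np.sum(pix_counts * (1.0 - det_rates) ** 2) / np.sum(pix_counts)

def improvement(pix_counts, rates_orig, rates_corr):
    """Eq. (19): 1 - eps_corr / eps_orig. Positive values mean the
    correction flattened the 2D dependence; 1 is perfect."""
    eps_orig = weighted_variance_metric(pix_counts, rates_orig)
    eps_corr = weighted_variance_metric(pix_counts, rates_corr)
    return 1.0 - eps_corr / eps_orig
```

For example, a uniform healpixel map with original rates of 1.2 everywhere (\(\epsilon_{\text{orig}}=0.04\)) corrected to 1.1 everywhere (\(\epsilon_{\text{corr}}=0.01\)) gives \(\epsilon_{\text{impr}}=0.75\).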

10 Relative Detection Rate Correlations↩︎

In Figure 14 we show the distributions of two different relative detection rate combinations across the DES footprint. Both plots use relative detection rates in our faintest magnitude bin. The left plot shows the relative detection rates of stars detected as stars vs. galaxies detected as stars. The relative detection rate of galaxies classified as stars has a higher variance, and there is a clear positive correlation. This correlation likely arises because galaxies classified as stars appear to be point sources, so survey properties that affect the detection rates of stars affect these galaxies as well. The right plot shows the relative detection rates of stars detected as stars vs. stars detected as galaxies. This plot does not show a strong correlation between the two rates, implying that the detection rate of stars that are then classified as galaxies does not respond to survey properties in the same manner. One possible explanation is that unrecognized blends cause brighter stars to be classified as galaxies, and the detection rates of these brighter objects are less sensitive to changes in survey properties.

Overall, the scatter in these relations rules out certain simplifications of our correction algorithm. Referring back to Section 3, our current correction algorithm, originally given in Eq. (5), is:

\[\label{eq:app95FCS} FC_S = \frac{C_S\cap RO_S}{D_R\left( C_S,O_S\right)} + \frac{C_S\cap RO_G}{D_R\left( C_S,O_G\right)}\tag{20}\]

If the left plot of Figure 14 showed that the two relative detection rates used above were equal, we could simplify that correction to be:

\[\label{eq:app95FCS95simplify} FC_S = \frac{C_S}{D_R\left( C_S,O_S\right)} = \frac{C_S}{D_R\left( C_S,O_G\right)}\tag{21}\]

If this were the case, there would be no need to calculate classification probabilities since we would not need \(RO_S\) or \(RO_G\). As well as this, we would only need to calculate one of \(D_R\left( C_S,O_S\right)\) or \(D_R\left( C_S,O_G\right)\).
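As a concrete sketch of Eq. (20), with hypothetical variable names, each classified-star population is divided by its own relative detection rate before summing:

```python
import numpy as np

def corrected_star_counts(cs_ros, cs_rog, dr_cs_os, dr_cs_og):
    """Eq. (20): recovered classified-star counts per healpixel.

    cs_ros, cs_rog     : counts classified as stars that are recovered as
                         true stars / true galaxies (C_S ∩ RO_S, C_S ∩ RO_G)
    dr_cs_os, dr_cs_og : relative detection rates for each population
    """
    return cs_ros / dr_cs_os + cs_rog / dr_cs_og
```

For example, 8 star-classified true stars observed at a relative rate of 0.8 plus 2 star-classified true galaxies at a rate of 0.5 would be corrected to \(8/0.8 + 2/0.5 = 14\) counts; the scatter in Figure 14 is what prevents collapsing the two terms into one.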

The correction method that we did not adopt, originally given in Eq. (6), is:

\[\label{eq:app95FCS2} FC_S = \frac{C_S\cap RO_S}{D_R\left( C_S,O_S\right)} + \frac{C_G\cap RO_S}{D_R\left( C_G,O_S\right)}\tag{22}\]

If the right plot of Figure 14 showed that the two relative detection rates used above were equal, we could simplify this correction in a similar fashion to Eq. (21). The scatter in both plots shows that neither correction algorithm can be simplified, so all relative detection rates must be calculated.

Figure 14: Each plot is using relative detection rates for our faintest magnitude bin. The dotted black line shows where the two rates are equal. (Left) Relative detection rates of stars detected as stars vs galaxies detected as stars. (Right) Relative detection rates of stars detected as stars vs stars detected as galaxies.

11 Algorithm Design Tests↩︎

In Section 3, we presented two different correction algorithms: one that estimates what the number of objects classified as stars would have been with uniform survey properties (the recovered classified algorithm), and one that estimates what the true number of observed stars would have been with uniform survey properties (the recovered observed algorithm). The recovered classified algorithm was chosen for use throughout this work. This section discusses the tests of the two algorithms that informed that decision.

Each test uses the same modeling sandbox to generate the counts we analyze. For the test area, we initially chose a \(3\times3\) square of resolution-32 healpixels centered at RA \(=30^{\circ}\), Dec. \(=-35^{\circ}\). This area was chosen because it contains the depth feature at RA \(\approx30^{\circ}\). The resolution is then increased to 512 (giving 2304 healpixels in total), since this is the resolution at which we apply our corrections to real data. A test-dependent subset of these healpixels is chosen to host the simulated stellar stream.

Applying our correction pipeline to this area of sky with real data gives estimates of the counts of objects of type \(O_S\cap C_S\), \(O_G\cap C_S\), \(O_S\cap C_G\), and \(O_G\cap C_G\) in each magnitude bin we use. Injections of each of these four object types occur on random healpixels, with detection probabilities proportional to the relative detection rate for the type of object being injected. This continues until we reach our calculated counts for each type of object. For the simulated stream, an excess percentage of \(O_S\cap C_S\) and \(O_S\cap C_G\) objects is then injected, restricted to the stellar stream healpixels. After sampling, the \(O_S\cap C_S\) and \(O_G\cap C_S\) counts are summed to give uncorrected star counts, while the \(O_S\cap C_G\) and \(O_G\cap C_G\) counts are summed to give uncorrected galaxy counts.
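The injection step can be sketched as follows. The smooth detection-rate map and the target count are illustrative stand-ins for the rates and counts estimated from real data:

```python
import numpy as np

rng = np.random.default_rng(42)
n_pix = 2304  # a 3x3 block of resolution-32 healpixels at resolution 512

def inject_population(target_count, rel_det_rate, rng):
    """Inject objects on random healpixels, accepting each with probability
    proportional to the pixel's relative detection rate, until the target
    detected count is reached."""
    counts = np.zeros(n_pix, dtype=int)
    p_detect = rel_det_rate / rel_det_rate.max()  # scale to valid probabilities
    detected = 0
    while detected < target_count:
        pix = rng.integers(n_pix)
        if rng.random() < p_detect[pix]:
            counts[pix] += 1
            detected += 1
    return counts

# hypothetical smooth detection-rate variation across the region
rel_rate = 1.0 + 0.2 * np.sin(np.linspace(0.0, np.pi, n_pix))
stars = inject_population(10_000, rel_rate, rng)   # one of the four populations
```

Run once per object type (and per magnitude bin), this reproduces the survey-property-driven density variations that the correction algorithms are then asked to remove.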

For our first test, the different correction algorithms were tested on their ability to recover the power spectrum of a stellar stream. For this test the sandbox was enlarged to a \(3\times9\) rectangle of resolution-32 healpixels before the resolution was increased to 512. A total of 432 healpixels (\(3\times144\)) were used for the stream, making it approximately \(17^{\circ}\) long, slightly longer than Phoenix. An excess stream percentage of 25% over the background was used for each of the 5000 realizations. As a baseline for comparison, 5000 realizations were also run with uniform detection rates and no misclassification. For each realization, a density power spectrum was calculated. The median of these power spectra for each algorithm is shown in the left plot of Figure 15: the underlying baseline result in red, uncorrected in blue, recovered classified in orange, and recovered observed in green. Both correction algorithms remove most of the spurious structure from the power spectra, but the recovered classified algorithm yields a median power spectrum with lower noise than the recovered observed algorithm.

The next test examined dependencies of stellar stream density on the different relative detection rate variations. For this test, a random 230 of the 2304 healpixels are chosen to host the stream (the structure of the simulated stream is unimportant here), with an excess stream percentage of 25%. After both correction algorithms are applied, we bin the stellar stream healpixels into 10 bins in each of the 12 relative detection rates used (4 types of relative detection rate in 3 magnitude bins). We then compare the average star count in each bin to the average number of stars in the background. An ideal result would be a flat curve (showing no detection rate dependency) at a value of 1.25 (since 25% extra stars were injected). Results from this test are displayed in the right plot of Figure 15, which shows the stream excess over background as a function of the relative detection rate of correctly classified stars in our faintest magnitude bin. Uncorrected results are shown in blue, recovered classified in orange, and recovered observed in green. Relative detection rates were lower than average for our selected stream healpixels, which is why the \(x\) axis is centered below 1. The recovered classified curve has lower variance, while the recovered observed curve is closer to 1.25.

Both of these tests show the increased noise of the recovered observed algorithm. For our particular application of measuring stellar stream density fluctuations, the power spectrum test is the most relevant, since power spectra can be used for dark matter constraints. In that test, the recovered classified algorithm matched the baseline stream more closely than the recovered observed algorithm, which is why we adopted the recovered classified algorithm. It should be noted that for structured streams, if the power of the structure exceeds the noise floor of the recovered classified algorithm, the recovered observed algorithm gives a more accurate value for the power. This could motivate using a combination of the two algorithms in certain applications.

Figure 15: (Left) Median power spectra over 5000 realizations for a baseline stream (red), uncorrected data (blue), and corrected data with each correction algorithm (orange and green). The recovered classified algorithm shows lower noise levels than the recovered observed algorithm. (Right) One-dimensional dependence of stream density on the relative detection rate of correctly classified stars in our faintest magnitude bin (\(23.9<r\leq 24.5\)). 5000 realizations are performed, and the averages are plotted with standard deviation error bars. Both corrected lines show decreased variance.

12 Determining Magnitude Bins↩︎

Figure 16: (Left) Average maximum deviations of relative detection rate variations for Balrog galaxies. Higher object counts allowed probing at lower percentages of the Balrog sample. (Right) Average maximum deviations of relative detection rate variations for Balrog stars classified as stars in the faintest magnitude bin used.

Object magnitude plays a role in detection rate and classification probability variations, so we binned our objects by magnitude. As mentioned in Section 4.1, a faint-end limit on the \(r\)-band magnitude was placed at 24.5. Figure 9 of [39] shows that detection rates at this magnitude are around 20%, so we decided not to push fainter due to shot-noise concerns at such low completeness.

When binning by magnitude, it makes sense to group together regions of magnitude space with similar properties. For example, bright objects have nearly perfect detection and classification rates, so that entire region of magnitude space requires very similar corrections. Once detection rates start to fall, this argument no longer applies and finer binning is necessary. However, with a limited number of Balrog objects to split among the magnitude bins, more bins means less statistical significance for the calculated rates. We settled on three bins: one for bright objects, one for faint objects where detection rates are significantly lower, and a transition bin between the two extremes.

To determine the magnitude bins used, we focused on how effective they would be with Balrog Y6, where we assumed we would have access to 5 times as many objects. Since relative detection rates have larger variations than relative classification probabilities, we used them to construct a metric for evaluating our magnitude bins. Once a magnitude bin was chosen and the appropriate Balrog objects selected, various percentage crops were applied to simulate different injection counts. We split this smaller sample into an 80% training sample and a 20% testing sample. Training for relative detection rates was performed, and we examined the dependence of the relative detection rates on each survey property within the 20% testing sample. For each survey property, the relative detection rate in one of its ten bins deviates furthest from 1; we store this deviation as a measure of the maximum impact that survey property has on relative detection rates. We then average this value over all survey properties to obtain a general estimate of the deviations one would expect survey properties to cause. We refer to this value as the average maximum deviation.

We found that this average maximum deviation as a function of the percentage of Balrog objects used is well modeled by a power law, which we extrapolated out to 500% to estimate the accuracy expected from Y6. For relative detection rates, when calculating stellar counts we are only interested in the rates at which stars are classified as stars and galaxies are classified as stars. Galaxies classified as stars are uncommon at bright magnitudes, so the correction for that relative detection rate is not as consistently important across magnitude bins as the correction for stars classified as stars. We therefore focused on the relative detection rate of stars classified as stars. Magnitude bins were tuned so that this rate's average maximum deviation was similar across all three bins: with bins of \(r \leq 22.9\), \(22.9<r\leq 23.9\), and \(23.9<r\leq 24.5\), the extrapolated Y6 average maximum deviations are 2.2%, 2.4%, and 2.5% respectively. An example of the average maximum deviation as a function of the percentage of Balrog objects used is shown in Figure 16. Also shown is a plot generated from detection rates of Balrog galaxies classified as either stars or galaxies; this is the largest sample of objects available for such a test, which made it useful for determining the form of the fitting function since we could probe lower object percentages.
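The extrapolation can be sketched as a linear fit in log space. The deviation values below are an idealized noiseless power law, not our measured ones:

```python
import numpy as np

# hypothetical average maximum deviations at different sample fractions
fracs = np.array([0.05, 0.1, 0.2, 0.5, 1.0])
devs = 0.05 * fracs ** -0.5         # shot noise suggests roughly frac**-0.5

# fit log(dev) = log(a) + b * log(frac), i.e. a power law dev = a * frac**b
b, log_a = np.polyfit(np.log(fracs), np.log(devs), 1)
a = np.exp(log_a)

y6_dev = a * 5.0 ** b               # extrapolate to 500% (Y6-like counts)
```

With real measurements the fitted exponent would differ from the idealized \(-1/2\), which is why the fitting-function form was validated on the larger Balrog galaxy sample first.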

References↩︎

[1]
Bullock, J. S. & Boylan-Kolchin, M. 2017, http://dx.doi.org/10.1146/annurev-astro-091916-055313, arXiv:1707.04256.
[2]
Buckley, M. R. & Peter, A. H. G. 2018, http://dx.doi.org/10.1016/j.physrep.2018.07.003, arXiv:1712.06615.
[3]
Chabanier, S., Millea, M., & Palanque-Delabrouille, N. 2019, http://dx.doi.org/10.1093/mnras/stz2310, arXiv:1905.08103.
[4]
Bechtol, K., Birrer, S., Cyr-Racine, F.-Y., et al. 2022, http://dx.doi.org/10.48550/arXiv.2203.07354, arXiv:2203.07354.
[5]
Jethwa, P., Erkal, D., & Belokurov, V. 2018, http://dx.doi.org/10.1093/mnras/stx2330, arXiv:1612.07834.
[6]
Nadler, E. O., Wechsler, R. H., Bechtol, K., et al. 2020, http://dx.doi.org/10.3847/1538-4357/ab846a, arXiv:1912.03303.
[7]
Newton, O., Leo, M., Cautun, M., et al. 2021, http://dx.doi.org/10.1088/1475-7516/2021/08/062, arXiv:2011.08865.
[8]
Dekker, A., Ando, S., Correa, C. A., & Ng, K. C. Y. 2022, http://dx.doi.org/10.1103/PhysRevD.106.123026, arXiv:2111.13137.
[9]
Treu, T. 2010, http://dx.doi.org/10.1146/annurev-astro-081309-130924, arXiv:1003.5567.
[10]
Vegetti, S., Birrer, S., Despali, G., et al. 2024, http://dx.doi.org/10.1007/s11214-024-01087-w, arXiv:2306.11781.
[11]
Newberg, H. J. & Carlin, J. L. 2016, Tidal Streams in the Local Group and Beyond, Vol. 420 (Springer).
[12]
Odenkirchen, M., Grebel, E. K., Rockosi, C. M., et al. 2001, http://dx.doi.org/10.1086/319095, arXiv:astro-ph/0012311.
[13]
Bernard, E. J., Ferguson, A. M. N., Schlafly, E. F., et al. 2014, http://dx.doi.org/10.1093/mnrasl/slu089, arXiv:1405.6645.
[14]
Grillmair, C. J. 2017, http://dx.doi.org/10.3847/1538-4357/aa8872, arXiv:1708.09029.
[15]
Shipp, N., Drlica-Wagner, A., Balbinot, E., et al. 2018, http://dx.doi.org/10.3847/1538-4357/aacdab, arXiv:1801.03097.
[16]
Malhan, K., Ibata, R. A., & Martin, N. F. 2018, http://dx.doi.org/10.1093/mnras/sty2474, arXiv:1804.11339.
[17]
Li, T. S., Koposov, S. E., Zucker, D. B., et al. 2019, http://dx.doi.org/10.1093/mnras/stz2731, arXiv:1907.09481.
[18]
Mateu, C. 2023, http://dx.doi.org/10.1093/mnras/stad321, arXiv:2204.10326.
[19]
Bonaca, A. &Price-Whelan, A. M. 2025, http://dx.doi.org/10.1016/j.newar.2024.101713, arXiv:2405.19410.
[20]
Bovy, J., Erkal, D., & Sanders, J. L. 2017, http://dx.doi.org/10.1093/mnras/stw3067, arXiv:1606.03470.
[21]
Küpper, A. H. W., Kroupa, P., Baumgardt, H., & Heggie, D. C. 2010, http://dx.doi.org/10.1111/j.1365-2966.2009.15690.x, arXiv:0909.2619.
[22]
Erkal, D., Belokurov, V., Bovy, J., & Sanders, J. L. 2016, http://dx.doi.org/10.1093/mnras/stw1957, arXiv:1606.04946.
[23]
Bonaca, A., Hogg, D. W., Price-Whelan, A. M., & Conroy, C. 2019, http://dx.doi.org/10.3847/1538-4357/ab2873, arXiv:1811.03631.
[24]
Banik, N., Bovy, J., Bertone, G., Erkal, D., & de Boer, T. J. L. 2021, http://dx.doi.org/10.1093/mnras/stab210, arXiv:1911.02662.
[25]
Banik, N., Bovy, J., Bertone, G., Erkal, D., & de Boer, T. J. L. 2021, http://dx.doi.org/10.1088/1475-7516/2021/10/043, arXiv:1911.02663.
[26]
Delos, M. S. & Schmidt, F. 2022, http://dx.doi.org/10.1093/mnras/stac1022, arXiv:2108.13420.
[27]
Lu, J., Lin, T., Sholapurkar, M., & Bonaca, A. 2025, http://dx.doi.org/10.48550/arXiv.2502.07781, arXiv:2502.07781.
[28]
Balbinot, E., Yanny, B., Li, T. S., et al. 2016, http://dx.doi.org/10.3847/0004-637x/820/1/58.
[29]
Wan, Z., Lewis, G. F., Li, T. S., et al. 2020, http://dx.doi.org/10.1038/s41586-020-2483-6, arXiv:2007.14577.
[30]
Erkal, D., Koposov, S. E., & Belokurov, V. 2017, http://dx.doi.org/10.1093/mnras/stx1208, arXiv:1609.01282.
[31]
Koposov, S. E., Belokurov, V., Li, T. S., et al. 2019, http://dx.doi.org/10.1093/mnras/stz457, arXiv:1812.08172.
[32]
Bonaca, A., Pearson, S., Price-Whelan, A. M., et al. 2020, http://dx.doi.org/10.3847/1538-4357/ab5afe, arXiv:1910.00592.
[33]
Li, T. S., Koposov, S. E., Erkal, D., et al. 2021, http://dx.doi.org/10.3847/1538-4357/abeb18, arXiv:2006.10763.
[34]
Tavangar, K., Ferguson, P., Shipp, N., et al. 2022, http://dx.doi.org/10.3847/1538-4357/ac399b.
[35]
Patrick, J. M., Koposov, S. E., & Walker, M. G. 2022, http://dx.doi.org/10.1093/mnras/stac1478, arXiv:2206.04241.
[36]
Shipp, N., Li, T. S., Pace, A. B., et al. 2019, http://dx.doi.org/10.3847/1538-4357/ab44bf, arXiv:1907.09488.
[37]
Bonaca, A., Naidu, R. P., Conroy, C., et al. 2021, http://dx.doi.org/10.3847/2041-8213/abeaa9, arXiv:2012.09171.
[38]
Li, T. S., Ji, A. P., Pace, A. B., et al. 2022, http://dx.doi.org/10.3847/1538-4357/ac46d3, arXiv:2110.06950.
[39]
Everett, S., Yanny, B., Kuropatkin, N., et al. 2022, http://dx.doi.org/10.3847/1538-4365/ac26c1.
[40]
Ivezić, Z., Kahn, S. M., Tyson, J. A., et al. 2019, http://dx.doi.org/10.3847/1538-4357/ab042c.
[41]
Tsiane, K., Mau, S., Drlica-Wagner, A., et al. 2025, http://dx.doi.org/10.33232/001c.142072.
[42]
Rodrı́guez-Monroy, M., Weaverdyck, N., Elvin-Poole, J., et al. 2022, http://dx.doi.org/10.1093/mnras/stac104, arXiv:2105.13540.
[43]
Abbott, T. M. C., Abdalla, F. B., Allam, S., et al. 2018, http://dx.doi.org/10.3847/1538-4365/aae9f0.
[44]
Flaugher, B., Diehl, H. T., Honscheid, K., et al. 2015, http://dx.doi.org/10.1088/0004-6256/150/5/150.
[45]
Diehl, H. T., Neilsen, E., Gruendl, R., et al. 2016, http://dx.doi.org/10.1117/12.2233157, 99101D.
[46]
Sevilla-Noarbe, I., Bechtol, K., Carrasco Kind, M., et al. 2021, http://dx.doi.org/10.3847/1538-4365/abeb66.
[47]
Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, http://dx.doi.org/10.1086/427976.
[48]
Abazajian, K., Adelman-McCarthy, J. K., Agüeros, M. A., et al. 2004, http://dx.doi.org/10.1086/421365.
[49]
Suchyta, E., Huff, E. M., Aleksić, J., et al. 2016, http://dx.doi.org/10.1093/mnras/stv2953.
[50]
Hartley, W. G., Choi, A., Amon, A., et al. 2021, http://dx.doi.org/10.1093/mnras/stab3055.
[51]
Shipp, N., Drlica-Wagner, A., Balbinot, E., et al. 2018, http://dx.doi.org/10.3847/1538-4357/aacdab.
[52]
Bressan, A., Marigo, P., Girardi, L., et al. 2012, http://dx.doi.org/10.1111/j.1365-2966.2012.21948.x.
[53]
Bechtol, K., Drlica-Wagner, A., Balbinot, E., et al. 2015, http://dx.doi.org/10.1088/0004-637x/807/1/50.
[54]
Drlica-Wagner, A., Bechtol, K., Mau, S., et al. 2020, http://dx.doi.org/10.3847/1538-4357/ab7eb9.
[55]
Anbajagane, D., Tabbutt, M., Beas-Gonzalez, J., et al. 2025, http://dx.doi.org/10.33232/001c.138627, arXiv:2501.05683.
[56]
Tabbutt, M. 2023, https://asset.library.wisc.edu/1711.dl/MO5YXZ3SVLRZW8G/R/file-fdda5.pdf.
[57]
Banik, N., Bovy, J., Bertone, G., Erkal, D., & de Boer, T. J. L. 2021, http://dx.doi.org/10.1093/mnras/stab210.
[58]
Drlica-Wagner, A., Carlin, J. L., Nidever, D. L., et al. 2021, http://dx.doi.org/10.3847/1538-4365/ac079d, arXiv:2103.07476.
[59]
Anbajagane, D., Chang, C., Zhang, Z., et al. 2025, http://dx.doi.org/10.48550/arXiv.2502.17674, arXiv:2502.17674.
[60]
Rodríguez-Monroy, M., Weaverdyck, N., Elvin-Poole, J., et al. 2022, http://dx.doi.org/10.1093/mnras/stac104.

  1. https://github.com/Kyle-Boone/ssi_corrections_des_y3_balrog↩︎

  2. http://healpix.sourceforge.net↩︎