Abstract

JWST’s exquisite data have opened the doors to new possibilities in detecting broad classes of astronomical objects, but also to new challenges in classifying those objects. In this work, we introduce SESHAT, the Stellar Evolutionary Stage Heuristic Assessment Tool for the identification of Young Stellar Objects, field stars (main sequence through asymptotic giant branch), brown dwarfs, white dwarfs, and galaxies, from any JWST observation. This identification is done using the machine learning method XGBoost to analyze thousands of rows of synthetic photometry, modified at run-time to match the filters available in the data to be classified. We validate this tool on real data of both star-forming regions and cosmological fields, and find we are able to reproduce the observed classes of objects to a minimum of 80% recall across every class, without additional information on the ellipticity or spatial distribution of the objects. Furthermore, this tool can be used to test the filter choices for JWST proposals, to verify whether the chosen filters are sufficient to identify the desired class of objects. SESHAT is released as a Python package to the community for general use.

1 Introduction

The James Webb Space Telescope (JWST) has provided exquisite data across hundreds of observations in its first four and a half years [1], [2]. With 38 different photometric filters, there are millions of combinations possible to tune observations for each science case. Thus, every JWST observation has the potential to use an entirely unique set of filters. It is important to be able to accurately identify the population of interest within these data, be these brown dwarfs in a cosmological field, or Young Stellar Objects (YSOs) in a star-forming region. In this work, we present the Stellar Evolutionary Stage Heuristic Assessment Tool (SESHAT)¹: a tool for the identification of YSOs, brown dwarfs, field stars, white dwarfs, and galaxies across any JWST observation.

Traditional methods of object classification require the use of discriminating cuts in color and magnitude space to distinguish between different populations [3], [4]. This methodology is excellent for producing homogeneous samples across observations, when every observation uses the exact same suite of filters. In the context of JWST, when every observation has the potential to use its own unique suite of filters, such consistent data selection no longer applies, and new color cuts would need to be defined for every observation. Furthermore, traditional methods are limited to two dimensions for the definition of various cuts, and this limitation can lead to significant impurity of the assigned classification. In the era of big data where thousands of sources are being discovered in every single image, a robust method for the identification of objects is important.

Machine learning can overcome these multidimensional challenges. For instance, it has been used quite successfully in recent years to expedite the search for and recovery of YSOs in spatially massive catalogs [5], [6], including to identify multiple evolutionary stages of stars within these catalogs [7]. Machine learning can appraise every dimension possible of the input data, and determine the best cuts to separate out each type of object, in a fraction of the time that a human could do the same, and still obtain a higher degree of purity in its classifications.

This paper is presented in self-contained sections, such that the reader can focus only on that which is necessary to them. We use synthetic models in order to have data in every filter, and the explanation of how these data are procured and made realistic is presented in Section 2. To analyze these data in any suite of filters, we use an XGBoost machine learning method, and the details of how this algorithm is trained and examples of verification against real Spitzer and JWST data (with pre-defined classifications from the literature) are presented in Section 3. Section 4 describes in detail the intended use of the tool built upon this algorithm, which we call SESHAT, including its inputs, outputs, and availability. We end with our conclusions in Section 5.

2 Data

One of the reasons that JWST data can be so rich is the sheer variety of filters onboard. With 29 NIRCam filters and 9 MIRI filters, JWST can sample snippets of the spectra of stars never before imaged. What this means, however, is that old data cannot be used to leverage classifications for all the options available. For instance, Spitzer had four near-infrared (IR) filters and one mid-IR, and though there are filters on JWST that have similar bandpasses to these (e.g., IRAC1 and F356W, IRAC2 and F444W, IRAC3 and F560W, IRAC4 and F770W), there are many more besides. To take full advantage of JWST, it is important to be able to rely on classifications that go beyond the Spitzer-analog bands. SESHAT is built to have flexibility in what filters are used and, consequently, we can apply the same method to any well-characterized near-IR facility in the past, present, or future.

For this work, we utilize synthetic models to inform the classifications of objects in data from JWST and Spitzer/2MASS. We use the latter to compare the performance of SESHAT to previously classified datasets. In the following sections, we outline each model grid used, and, when available, the observations these were checked against. We use only colors of objects to eliminate the need for accurate distance estimations to each source. We obtain Vega magnitude information for each source either from the model packages themselves, or, in cases where only spectra are supplied, by convolving the spectra with the filter’s spectral response function, for each filter on JWST, Spitzer, and 2MASS using the SEDFitter tool [8].
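The convolution step can be illustrated with a minimal sketch. It assumes a simple response-weighted mean flux density and made-up zero-point fluxes, and it is not the SEDFitter implementation; the function names and the toy top-hat filters are hypothetical. It also illustrates why colors remove the distance dependence: dimming every band by the same factor leaves the magnitude difference unchanged.

```python
import math

def trapezoid(x, y):
    """Integrate y(x) with the trapezoidal rule."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def band_flux(wavelength, flux, response):
    """Response-weighted mean flux density through one filter (assumed form)."""
    weighted = [f * r for f, r in zip(flux, response)]
    return trapezoid(wavelength, weighted) / trapezoid(wavelength, response)

def vega_magnitude(wavelength, flux, response, zero_point_flux):
    """Vega magnitude: a source at the zero-point flux has magnitude 0."""
    ratio = band_flux(wavelength, flux, response) / zero_point_flux
    return -2.5 * math.log10(ratio)

# Toy inputs: wavelengths in microns, two hypothetical top-hat filters, and
# made-up zero-point fluxes in the same arbitrary units as the spectrum.
wavelength = [3.0, 3.5, 4.0, 4.5, 5.0]
spectrum = [8.0, 6.0, 4.0, 3.0, 2.0]
blue_filter = [0.0, 1.0, 1.0, 0.0, 0.0]
red_filter = [0.0, 0.0, 1.0, 1.0, 0.0]
zp_blue, zp_red = 300.0, 200.0

def color(flux):
    """Blue-minus-red magnitude for a given spectrum."""
    return (vega_magnitude(wavelength, flux, blue_filter, zp_blue)
            - vega_magnitude(wavelength, flux, red_filter, zp_red))

near = color(spectrum)
# Moving the source ten times farther away dims every band by a factor of 100.
far = color([f / 100.0 for f in spectrum])
```

In this sketch `near` and `far` agree to floating-point precision, which is the property that lets a color-based classifier ignore distance.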

2.1 YSO models

Figure 1: *Top left*: the SED of a Class 0 YSO is shown outlined in black, on top of a sampling of other possible YSO SEDs of varying evolutionary stage in light purple, from the [9] model grid. *Bottom left*: filter response functions for NIRCam (left of dashed line) and MIRI (right of dashed line), as indicated by light-gray filled curves. Wavelengths where the filters overlap thus have darker shading. *Right*: a color-color diagram of the YSOs in our training sample (without additional extinction applied), to show their distribution in color space. The x and y limits of the plot are chosen to be the same for this panel across the following figures, for ease of comparison.

YSOs have a wide variety of spectral energy distribution shapes influenced by many different factors (see Figure 1 for an example). For this work, we use the Hyperion models of [9] to form the basis of our YSO set. These models vary over parameters such as age, metallicity, rotation, envelope size, envelope mass, viewing angle, disk radius, cavity angle, and more. These models follow on the radiative transfer work of Robitaille and colleagues [8], [10], [11], updating those earlier models to include more parameters. Furthermore, they also provide a range of apertures to help with spectral fitting, determining how much of the light around the source (and surrounding disk/envelope) would actually be observed.

To ensure that these models match realistic distributions, we use the Spitzer Extended Solar Neighborhood Archive (SESNA; R. A. Gutermuth et al. in prep) catalogs of the three nearest clouds [12] as well as two more distant and active regions [12]. We fit models to these SESNA YSOs, and use JWST filter convolutions of the model SEDs as the basis of our training set.

Figure 1 shows the wide variations in SEDs that a YSO may have. In this case, the shift in evolutionary status is what leads to this variety, as the YSO accretes or disperses its envelope/disk. This wide shift from being far-/mid-IR dominated to near-IR dominated leads to the broad range seen in the color-color panel of Figure 1.

2.2 Field star models

Figure 2: *Left*: A model SED of the stellar atmosphere of a MS star from [13] is shown in yellow, outlined in black for ease in visualization, and the colored lines show “connect-the-dots” SEDs from PARSEC [14] CMD photometry points of JWST data from across stellar evolutionary stages (colors denoted by colorbar on far right). *Right*: Color-color diagram showing the field star population, with color scale indicating the evolution from MS through to AGB star.

Stars undergo a complex and dynamical evolution as mediated by their shifting nucleosynthesis. We use the Padova PARSEC models [14] of stellar evolution to sample ages (from \(\log{(t_*)} = 6.0-10.0\), in steps of \(\log{(t_*)} = 0.25\)) and metallicities (\([M/H]\) from \(-2\) to \(0.3\) in steps of \([M/H] = 0.1\)) to define multiple stellar populations. The Kroupa IMF [15], corrected for unresolved binaries, is used to fill out the mass function. We use the PARSEC CMD tool to extract magnitudes in the medium, wide, and very-wide filters for NIRCam and MIRI.
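For concreteness, the age and metallicity sampling described above can be enumerated as follows. This is a sketch of the grid only, not of the PARSEC CMD query itself, and the variable names are hypothetical.

```python
from itertools import product

# log10(age) from 6.0 to 10.0 in steps of 0.25
log_ages = [6.0 + 0.25 * i for i in range(17)]

# [M/H] from -2.0 to 0.3 in steps of 0.1 (rounded to avoid float drift)
metallicities = [round(-2.0 + 0.1 * i, 1) for i in range(24)]

# One stellar population per (age, metallicity) pair
grid = list(product(log_ages, metallicities))
n_populations = len(grid)
```

Under these step sizes the grid holds 17 ages and 24 metallicities, so `n_populations` is 408.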

These models provide the populations of objects in their relative abundance at each age. Thus, when building our dataset for this work, we purposefully oversample later evolutionary stages to build representative samples that fully cover all stages. Figure 2 shows the variation of the SEDs across these stages, along with their positions in color-color space. Many field stars are distinguished by a peak at visible wavelengths and, in the majority of cases, a very flat tail, behaving as near-perfect black bodies. They are thus relatively easy to separate in color-color space, as also seen in Figure 2. The dusty circumstellar envelopes associated with more advanced evolutionary stages (such as AGB stars) lead to the departure from this otherwise condensed relation.

2.3 Brown dwarf models

Figure 3: Similar to Figs. 1 and 2, now for brown dwarfs from the models of the ATMO collaboration [16].

Brown dwarfs never reach the main sequence, and straddle the line between planets and stars [17]. In JWST data, several searches have been performed to isolate brown dwarfs serendipitously detected in cosmological datasets, which can include low-metallicity brown dwarfs in the halo of our Galaxy [18].

For these objects, we utilize the brown dwarf evolutionary models of the ATMO team [16]. These models come from time evolution of 1D MHD simulations of brown dwarfs. We use all available cases: assuming both chemical equilibrium and non-equilibrium, the latter with both strong and weak vertical mixing. This results in a sample of brown dwarfs with masses ranging from 0.001 \(M_\odot\) to 0.075 \(M_\odot\), with temperatures \(T_{eff}=200-3000~K\) and surface gravities \(\log{(g)}=2.5-5.5\) (\(g\) in cgs units). The strong absorption features present in the atmospheres of brown dwarfs lead to distinctive tails in color-color space, as seen in Figure 3.

Finally, although these models have pre-determined Vega magnitudes available for all medium and wide band filters across NIRCam and MIRI, we convolve our own, and use the match between our convolved magnitudes and theirs to verify that our method is sound. Because Spitzer and 2MASS magnitudes are not available with the ATMO release, we also convolve the models with the Spitzer and 2MASS filters, for use in our validation step.

2.4 White dwarf models

Figure 4: Similar to Figs. 1, 2, and 3, now for white dwarfs, using the models of [19].

White dwarfs make up 7-10% of the local stellar population, as observed with Gaia [20]. These objects have SEDs that peak in the optical, and decrease throughout the IR, but their signatures can still be observed. With JWST, these stars will only be observed as foreground stars when looking at Galactic star-forming regions. For instance, assuming a limiting magnitude of 21 mags at 4.5 microns [21], a typical white dwarf [22] would be detectable up to a distance of 550 pc with no extinction [12].

Most white dwarf models do not extend into the near- to mid-IR due to their sharply decreasing SEDs [23]. [19] created a set of models that continue into the MIR, and we use an extension of these in this work. Specifically, we use models that have been fit to the nearest 400 white dwarfs to ensure broad coverage of all spectral variations (Simon Blouin, private communication). These models include SEDs with absorption features for hydrogen, helium, and carbon, as well as for the variations of metals due to the accretion of exoplanetary material [24], [25]. An example model and the variations seen in white dwarf SEDs are presented in Figure 4, along with the (lack of) distribution of these models in color-color space. Most of the variations are within the optical range of the SED, and so this population has little variation in the IR, and thus little variation in the color-color distribution.

2.5 Galaxy models

Figure 5: Similar to Figs. 1, 2, 3, and 4, now for galaxies using CIGALE [26]–[28]. These galaxies are at various redshifts, with differing star formation histories, presence or absence of AGN, metallicities, and so forth, based on the parameters in [29] and [30].

JWST has surpassed expectations in its ability to image high-redshift galaxies. Galaxies, by nature, are ubiquitous throughout every field of view. They also have extreme variations across their evolution, from the star-forming main sequence, to star-bursting, those with strong AGN components, to red and dead galaxies. These variations leave distinct fingerprints in their SEDs. In our work, we have clumped these all together under the title of “galaxy” in order to keep our focus on the identification of different stellar evolutionary stages (YSO through white dwarf). Still, it is important to characterize these variations, as they cause galaxies to span a large range in color-color space, as shown in Fig. 5.

To account for the myriad variations of galaxy SEDs, we utilize the models of the CIGALE group [26]–[28], which account for stellar and nebular emission, dust absorption, IR re-emission by dust, star formation histories, metallicities, length of starburst, redshift, as well as AGN parameters such as optical depth, radial gradient of dust density, and polar angle effects, to name a few. We use the parameters from Table 2 of [30] for the low redshift (z=0-2) galaxies, and the parameters from “Phase 4” in Table 2 of [29] for high redshift (z=2-10). To ensure a realistic sample of galaxies, we match the CIGALE-created galaxy models to a cross-matched catalog (Chris Willott, private communication) from two JWST surveys: the Systematic Mid-infrared Instrument Legacy Extragalactic Survey (SMILES; [31], [32]), and the JWST Advanced Deep Extragalactic Survey (JADES; Chris Willott, private communication). We then compare color-color diagrams to ensure that the population of JADES/SMILES data is appropriately recreated. In total, we create 10 000 models with this method.

Figure 5 shows an example of a galaxy model SED, along with the variations one may see in such SEDs due to different processes, and the distribution of our training set of galaxies in color-color space. In the case of galaxies, the variations in possible environment are directly reflected in those seen across color-color space. The presence or lack of AGN, of star forming signatures, the 4000 Å break, PAH signatures, etc. all contribute to a vast diversity in possible SEDs and thus wide extent across color-color space.

2.6 Realistic Effects

Up until this point, the data we have been discussing are entirely synthetic, without realistic effects. When taking into account realistic effects, one must consider the scenario at hand. For instance, when observing a star-forming region, it is important to take into account the presence of dust grains within molecular clouds, causing the dimming and reddening of light (extinction) from background sources. As well, Polycyclic Aromatic Hydrocarbon (PAH) molecules are present and excited in star-forming regions, and can also imprint their signatures on the measured starlight from background sources. Conversely, when considering a cosmological field, neither extinction nor PAH emission may be relevant, and the inclusion of those effects could be detrimental to the accurate identification of objects in such fields that do not have those signatures.

In the following sections, we explain how each realistic effect is added, with the caveat that their presence depends on the specific use case.

2.6.1 Extinction Application

Extinction, caused by the absorption and scattering of light by dust grains, is an inherently wavelength-dependent process; blue light is preferentially absorbed or scattered, red light less so. Certain molecules also lead to specific absorption features. The level of extinction within a cloud is thus commonly defined based on a single bandpass. In this work, we consider \(A_V\), or visual-band extinction.

As we will be considering the identification of objects embedded within dusty molecular clouds, it is critical to account for extinction. We thus apply extinction to each object from the above models, where the degree of extinction applied is the product of two probability distributions. After determining the extinction of a given object, we determine its effect on each wavelength/frequency band of the object’s SED.

Firstly, we model the extinction of a molecular cloud as a log-normal distribution with an extended wing towards high density [33]. In particular, we use a log-normal function with scale \(m = 0\) and dispersion \(\sigma =0.59\) for extinctions up to 1 mag, after which we switch to a power-law extension. The power law is modeled as \(10^{-\mathrm{ln}(A_V/\bar{A_V})}\), as determined qualitatively from the maps of [34]. When applying this distribution to our dataset, we assume an average visual extinction of \(\bar{A_V} = 1.2\) mag [34].
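The power-law extension can be rewritten as \((A_V/\bar{A_V})^{-\ln 10}\), i.e., a power law with index of about \(-2.3\). The sketch below evaluates that form and draws samples from it by inverse-transform sampling. The break point `x_break` (in units of \(A_V/\bar{A_V}\)) and the function names are assumptions for illustration, and the sketch does not reproduce the matching of the tail to the log-normal part.

```python
import math
import random

ALPHA = math.log(10.0)  # since 10**(-ln x) == x**(-ln 10)

def tail_density(av, av_mean):
    """Unnormalized power-law tail, 10**(-ln(A_V / mean A_V))."""
    return 10.0 ** (-math.log(av / av_mean))

def sample_tail(x_break, rng):
    """Draw x = A_V / mean A_V from p(x) ~ x**(-ALPHA) for x >= x_break.

    Inverse-transform sampling: CDF(x) = 1 - (x / x_break)**(-(ALPHA - 1)).
    """
    u = rng.random()
    return x_break * (1.0 - u) ** (-1.0 / (ALPHA - 1.0))

rng = random.Random(42)
av_mean = 1.2  # mean visual extinction in mag, as assumed in the text
samples = [av_mean * sample_tail(1.0, rng) for _ in range(1000)]
```

As a quick check of the functional form, `tail_density` drops by a factor of ten each time \(A_V/\bar{A_V}\) increases by a factor of \(e\).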

Secondly, each type of object receives different extinction depending on its likelihood to be in front of or behind the cloud. Galaxies are assumed to be distributed uniformly behind the cloud. Field stars are assumed to have a bimodal distribution, being either in front of (40%) or behind (60%) the cloud. Since the brown dwarf and white dwarf model SEDs are faint, we anticipate that they will only be visible in front of the cloud, hence we do not apply any extinction to them. The YSOs are assumed to be within the cloud (with 10-90% of the extinction at their location being applied), and to be found only in parts of the cloud with \(A_V>3\) mag, i.e., they are embedded within some measure of cloud. We recognize that this will place some more-evolved YSOs in higher column density regions than one might anticipate if they have migrated from their birth sites, or if the part of the cloud from which they were born has begun to disperse [35]. Similarly, this assumption will place some less-evolved YSOs in lower column density regions than would be expected, since they are typically found above \(A_V=5-7\) [36]. The extinction model used here is chosen to optimize model performance while remaining realistic. Doubling or halving the minimum \(A_V\) leads to slightly worse performance in classifying these objects.
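The class-dependent placement described above can be sketched as a small function. The class labels, the function name, and the use of a uniform draw for the 10-90% YSO fraction are assumptions made for illustration; the text does not specify the shape of that distribution.

```python
import random

def applied_extinction(obj_class, cloud_av, rng):
    """A_V (mag) applied to one synthetic object on a sight-line with total
    cloud extinction cloud_av; None means the object is not placed there."""
    if obj_class in ("brown dwarf", "white dwarf"):
        return 0.0                      # assumed foreground, no extinction
    if obj_class == "galaxy":
        return cloud_av                 # always behind the cloud
    if obj_class == "field star":
        # 40% in front of the cloud, 60% behind it
        return 0.0 if rng.random() < 0.4 else cloud_av
    if obj_class == "yso":
        if cloud_av <= 3.0:
            return None                 # YSOs only where A_V > 3 mag
        # embedded: 10-90% of the local extinction (uniform draw assumed)
        return cloud_av * rng.uniform(0.1, 0.9)
    raise ValueError(f"unknown class: {obj_class}")

rng = random.Random(0)
field_draws = [applied_extinction("field star", 5.0, rng) for _ in range(10000)]
foreground_fraction = field_draws.count(0.0) / len(field_draws)
```

With many draws, `foreground_fraction` settles near 0.4, matching the assumed split between foreground and background field stars.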

Finally, we use the extinction values from Table 5 of [37] to compute the loss of flux expected in each JWST NIRCam, 2MASS, and Spitzer IRAC filter for the given \(A_V\) value. This reference does not provide similar conversions for longer wavelength IR bands, thus requiring us to take a different approach. For JWST MIRI and Spitzer MIPS filters, we use the OH5 model from [38] to approximate the extinction, interpolating to account for gaps in the table. We use the standard assumption that \(R_V = 3.1\) and hence \(A_V/A_K = 8.8\) [39] to convert from absorption coefficients to the appropriate extinction in each filter.
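A minimal sketch of the final conversion, assuming the per-filter extinction is tabulated relative to the K band (\(A_\lambda/A_K\)); the function names are hypothetical and no values from [37] or [38] are reproduced here.

```python
AV_OVER_AK = 8.8  # from R_V = 3.1, as adopted in the text

def band_extinction(a_v, a_band_over_a_k):
    """Extinction (mag) in one filter for a given visual extinction A_V."""
    a_k = a_v / AV_OVER_AK
    return a_k * a_band_over_a_k

def redden(mag, a_v, a_band_over_a_k):
    """Observed magnitude after extinction: fainter by A_band magnitudes."""
    return mag + band_extinction(a_v, a_band_over_a_k)

# By definition the K band itself has A_band / A_K = 1, so A_V = 8.8 mag
# dims a 15.0 mag source in K by exactly one magnitude.
k_mag = redden(15.0, 8.8, 1.0)
```

Any other filter would use its own tabulated ratio in place of the value 1 used for K.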

2.6.2 PAH models

Polycyclic Aromatic Hydrocarbon (PAH) emission is ubiquitous throughout the universe and particularly in molecular clouds where it is associated strongly with massive YSOs that illuminate the surrounding gas. For our purposes, we are most concerned with the ways in which PAH signatures can contaminate other sources to provide excess emission. Indeed, aperture contamination by PAHs is a well-known phenomenon, and is taken into account in traditional methods of color cut separation [4].

Regardless of the many unknowns associated with PAH emission, the progenitors of this emission have been broken down into four main species: PAH\(^+\)s (cationic PAHs), PAH\(^0\)s (neutral PAHs), PAH\(^x\)s (large – \(\sim 100\) C atoms – ionized PAHs), and eVSGs (evaporating very small grains). Observed PAH emission is further categorized by different “classes,” with each class having different levels of contribution from the four basic species. Class A and B are the most common, with Class A emission indicative of interstellar material illuminated by a star, be these HII regions, reflection nebulae, or the general ISM, while Class B is indicative of circumstellar material around Herbig Ae and Be stars, post-AGB stars, and planetary nebulae [40]. A third class, Class C, is much rarer and has been detected only in association with a few extreme post-AGB stars [40], [41].

PAH emission is most clearly measured around the wavelengths of 3.30, 6.20, 7.70, 8.60, 11.30, and 12.70 \(\mu m\) [41]. The JWST bands thus most affected by PAH emission are: F335M, F770W, F1130W and F1280W. To account for this emission, we utilize the template models from [42], along with the associated 3.3 \(\mu m\) emission inferred from [43]. These models include the four different species of PAHs, from which we compose Class A PAH emission spectra, since this class is the most spatially wide-spread within star forming regions, and thus is most likely to occur between a background source and the viewer. Class A PAH emission is composed primarily of the PAH\(^0\) and PAH\(^+\) species [40], with minor contributions from the PAH\(^x\) and eVSG species. As a basis for the relative flux contributions of each species, we refer to the distributions seen across real objects, as measured in Table 6 of [43]. Similarly, to determine what an appropriate level of flux would be for a given source’s SED, we use the ratios in Table 6 of [43] as a guide.

Not every sight-line in a molecular cloud has PAH emission, and it is important to have representative samples of both cases with and without PAH emission. As such, PAH emission is added to 50% of all objects experiencing an extinction greater than \(A_V=0.5\), to get a solid sample of all types of objects that may experience such contamination within our training set. Since the PAH emission originates within the molecular cloud, it may also be subject to some of the dust extinction present along the same line of sight. To address this effect, we apply the extinction and PAH emission as three steps. We take the extinction assigned to the source (see previous section), and choose some random value between 0 and 1. This fraction is then multiplied by the extinction to determine how much material is between the source and the PAH emission source. Then, the PAH emission is added. Finally, the remaining extinction is added to the combined signal of source object and PAH emission.
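The three steps above can be sketched for a single filter as follows, working in linear flux units. The function names are hypothetical, and `a_band` stands for the extinction in that filter corresponding to the object's assigned \(A_V\).

```python
import random

def attenuate(flux, a_band):
    """Dim a flux by a_band magnitudes of extinction."""
    return flux * 10.0 ** (-0.4 * a_band)

def gets_pah(a_v, rng):
    """PAH emission is added to 50% of objects with A_V > 0.5 mag."""
    return a_v > 0.5 and rng.random() < 0.5

def apply_pah_and_extinction(source_flux, pah_flux, a_band, rng):
    """Split the extinction around the PAH-emitting layer."""
    fraction = rng.random()                              # random value in [0, 1)
    behind = attenuate(source_flux, fraction * a_band)   # step 1: source only
    combined = behind + pah_flux                         # step 2: add PAH flux
    return attenuate(combined, (1.0 - fraction) * a_band)  # step 3: the rest

rng = random.Random(3)
observed = apply_pah_and_extinction(100.0, 10.0, 2.0, rng)
```

A consequence of this ordering is that the source always experiences the full extinction, while the PAH emission experiences only the part of the cloud in front of it.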

2.6.3 Noise

Along with the above effects, we add noise to the flux information in each band. Noise is defined at runtime based on the input real dataset being classified, where the noise is sampled from the distribution of errors of these data. For each filter, these errors are used to define the standard deviation of a Gaussian with mean 0, from which a value is randomly pulled and then added to the synthetic data to generate a noisier set of magnitudes and thus colors. In the specific case where no real data have been supplied and the filter choice is being evaluated (see Section 4), we leave it to the user to decide the best method of defining errors and thus noise. For example, one simple approach would be to use Gaussian errors with a mean of 0.1 mag and standard deviation of 0.02 mag.
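A minimal sketch of this noise step is given below. It assumes that one uncertainty is drawn at random from the real catalog's errors in each filter; the function names and the dictionary layout are hypothetical, and taking the absolute value in the fallback is our own safeguard against negative errors.

```python
import random

def add_noise(mags, catalog_errors, rng):
    """Perturb synthetic magnitudes with errors drawn from a real catalog."""
    noisy = {}
    for band, mag in mags.items():
        sigma = rng.choice(catalog_errors[band])   # one observed uncertainty
        noisy[band] = mag + rng.gauss(0.0, sigma)  # zero-mean Gaussian draw
    return noisy

def fallback_errors(n, rng, mean=0.1, std=0.02):
    """Errors for the no-data case: Gaussian, mean 0.1 mag, std 0.02 mag."""
    return [abs(rng.gauss(mean, std)) for _ in range(n)]

rng = random.Random(7)
errors = {"F200W": fallback_errors(500, rng), "F444W": fallback_errors(500, rng)}
noisy = add_noise({"F200W": 20.0, "F444W": 19.2}, errors, rng)
noisy_color = noisy["F200W"] - noisy["F444W"]
```

Because the noise is added per filter, the scatter in a color combines the scatter of both contributing magnitudes.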

3 Methods

3.1 XGBoost

SESHAT is built on top of a machine learning framework. We use machine learning to be able to classify any set of JWST data, regardless of the number of bands present. In the previous section, we defined the various SED models, which by their synthetic nature allow us to use the same set of data for all possible filter combinations, and thus to create a model with a consistent basis for the determination of class.

In this work, we use XGBoost [44], a decision tree-based machine learning method for multi-class classification. This method is chosen in part for its similarity to color cuts (as it makes splits at each node of the tree). XGBoost was also chosen for its ability to avoid overfitting through the use of regularization parameters and early stopping, i.e., halting training once the model begins to perform worse on a dataset used to validate it. Finally, XGBoost natively handles missing values without the need for imputation.

Missing values are common in photometric datasets, either due to incomplete coverage, or saturation / obscuration of the source. The methods of imputation thus far available cannot take into account the possible variations, unless one has flags on the data to indicate what appropriate values may be. XGBoost dynamically determines optimal branch directions for missing values during training. In the case where there are no null values in the training data, XGBoost always chooses the right-side branch when it must make a decision based on missing data. It is thus important to include missing values in the training so the model can learn the optimal branch direction.

We add nulls to the dataset in three separate ways. First, we account for sources where some data are below the limiting magnitude of the observation by taking a subset of the training set, setting the brightest filter to be 2 mags brighter than the limiting magnitude of that filter, and scaling the rest of the data appropriately to match. Then, for each filter we determine whether it is above or below the limiting magnitude of that filter, and set those fainter than the limit to null. Similarly, we take another subset of data and set the dimmest magnitude to be 2 mags dimmer than the saturation limit, rescale, and set all filters with magnitude brighter than the saturation limits to null. Finally, we set random filters to null throughout the dataset to account for when data are not imaged or are otherwise lost to artifacts in the data.
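The three ways of introducing nulls can be sketched as follows. The function names, the example limits, and the per-filter drop probability are assumptions for illustration; `None` stands in for a missing measurement.

```python
import random

def mask_faint(mags, limits):
    """Shift so the brightest band is 2 mag brighter than its limit, then
    null every band that ends up fainter than its own limiting magnitude."""
    brightest = min(mags, key=mags.get)
    shift = (limits[brightest] - 2.0) - mags[brightest]
    return {b: (m + shift if m + shift <= limits[b] else None)
            for b, m in mags.items()}

def mask_saturated(mags, saturation):
    """Shift so the dimmest band is 2 mag dimmer than its saturation limit,
    then null every band brighter than its own saturation limit."""
    dimmest = max(mags, key=mags.get)
    shift = (saturation[dimmest] + 2.0) - mags[dimmest]
    return {b: (m + shift if m + shift >= saturation[b] else None)
            for b, m in mags.items()}

def mask_random(mags, drop_probability, rng):
    """Null each band independently, mimicking missing coverage or artifacts."""
    return {b: (None if rng.random() < drop_probability else m)
            for b, m in mags.items()}

# Hypothetical two-filter source with made-up limits (all in magnitudes).
source = {"F200W": 18.0, "F444W": 21.5}
faint = mask_faint(source, {"F200W": 25.0, "F444W": 24.0})
bright = mask_saturated(source, {"F200W": 14.0, "F444W": 12.0})
```

In this toy case the faint-limit shift leaves F200W at 23.0 mag and nulls F444W, while the saturation shift leaves F444W at 14.0 mag and nulls F200W. Shifting all bands by the same amount preserves the colors of the bands that survive.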

3.1.1 Training, Validation, and Test Sets

With machine learning methods, there are three different types of datasets: training, validation, and test sets. For our training set, we use 75% of the synthetic models previously derived. We then over-sample (resample the same dataset) to achieve an even 10 000 objects in each class (YSO, field star, brown dwarf, white dwarf, and galaxy), with varying levels of noise. These are the data upon which the model is trained. The validation set is used to determine when to stop the training, and is necessary to prevent over-fitting. When possible, it is best to use a real dataset for this step. Since we are creating a method for application to any suite of filters and thus real classifications are not readily available, we validate using 50% of the remaining synthetic data not used in the training set, i.e., 12.5% of the total synthetic data. Finally, the test set is a set of data with no influence on the machine learning method, and is a check to see how well the model generalizes to new data. We use the remaining 12.5% of the synthetic data as our test set. In the following subsections we verify these methods on real data as an additional check.
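The 75% / 12.5% / 12.5% partition and the resampling to a fixed number of objects per class can be sketched as below. Whether the split is made per class or over the whole sample is not stated above, so the per-class split here is an assumption, as are the function and variable names.

```python
import random

def split_and_oversample(models_by_class, n_per_class, rng):
    """Return (train, validation, test) lists of (model, label) pairs."""
    train, validation, test = [], [], []
    for label, models in models_by_class.items():
        shuffled = list(models)
        rng.shuffle(shuffled)
        n_train = int(0.75 * len(shuffled))
        n_val = (len(shuffled) - n_train) // 2      # half of the remainder
        base = shuffled[:n_train]
        # Resample with replacement up to a fixed size for every class.
        train += [(rng.choice(base), label) for _ in range(n_per_class)]
        validation += [(m, label) for m in shuffled[n_train:n_train + n_val]]
        test += [(m, label) for m in shuffled[n_train + n_val:]]
    return train, validation, test

# Toy stand-ins for the synthetic models: integer identifiers per class.
toy = {"YSO": list(range(80)), "Galaxy": list(range(100, 180))}
train, validation, test = split_and_oversample(toy, 100, random.Random(0))
```

Noise would then be applied independently to each resampled copy, so that repeated models in the training set do not produce identical rows.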

The input parameters for our model are listed in Table [tab:xgb-pars]. These parameters are chosen to minimize over-fitting to synthetic data, based on a comparison to the SESNA catalogs that were used to constrain the YSO synthetic population. We note, however, that these parameters are set before run-time, and thus are applied to all datasets.

| Parameter | Value |
| --- | --- |
| \(\eta\) | 0.01 |
| \(\gamma\) | 15 |
| subsample | 0.3 |
| max_depth | 1 |
| num_class | 5 |
| objective | multi:softprob |
| eval_metric | mlogloss |
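As a hedged illustration of how these parameters map onto the XGBoost Python interface, the sketch below passes them to `xgboost.train` with early stopping on a validation set. The number of boosting rounds and the early-stopping patience are not given in the text and are placeholders, and the function name is hypothetical; this is not the SESHAT source code.

```python
PARAMS = {
    "eta": 0.01,                    # learning rate
    "gamma": 15,                    # minimum loss reduction to split a node
    "subsample": 0.3,               # fraction of rows sampled per tree
    "max_depth": 1,
    "num_class": 5,                 # YSO, field star, brown dwarf, white dwarf, galaxy
    "objective": "multi:softprob",  # per-class probabilities
    "eval_metric": "mlogloss",
}

def train_classifier(x_train, y_train, x_val, y_val,
                     num_rounds=5000, patience=50):
    """Train with early stopping; num_rounds and patience are assumptions."""
    import numpy as np
    import xgboost as xgb

    # Missing photometry is encoded as NaN and handled natively by XGBoost.
    dtrain = xgb.DMatrix(np.asarray(x_train, dtype=float),
                         label=y_train, missing=np.nan)
    dval = xgb.DMatrix(np.asarray(x_val, dtype=float),
                       label=y_val, missing=np.nan)
    return xgb.train(PARAMS, dtrain, num_boost_round=num_rounds,
                     evals=[(dval, "validation")],
                     early_stopping_rounds=patience)
```

With the `multi:softprob` objective, calling `predict` on the returned booster yields one probability per class for each object, from which the most likely class can be taken.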

3.2 Generic Synthetic Test

The success of classification with the synthetic test set is shown in Figure 6. Overall, we perform to a minimum of 85% recall across all objects, when using the Spitzer IRAC and MIPS 1 bands, along with the 2MASS J, H, and Ks bands, where recall is the fraction of objects of a given class that are correctly classified. Ideally, one would want to see values approaching 1 along the diagonal of the confusion matrix, indicating accurate classifications. The values in the off-diagonal boxes describe what fraction of the objects of that true label have been mislabeled as the predicted label. The exact performance will vary depending on the input filters, and so Fig. 6 acts as an example of possible performance. For any application, the test set acts as an approximate bound on the expected contamination rates. For example, in this case we can expect that 10% (\(752/7176\) from the number of objects in the test set) of YSOs will be mislabeled as galaxies, and approximately 5% of the galaxies will be labeled as YSOs.
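To make the recall definition concrete, the sketch below computes per-class recall from a confusion matrix whose rows are true labels and whose columns are predicted labels. The three-class matrix is a toy example with made-up counts, not the values of Figure 6; only the \(752/7176\) ratio is taken from the text.

```python
def recall_per_class(confusion, labels):
    """Recall = correctly classified objects / all objects of that true class."""
    return {label: row[i] / sum(row)
            for i, (label, row) in enumerate(zip(labels, confusion))}

# Toy confusion matrix (rows: true class, columns: predicted class).
labels = ["YSO", "Galaxy", "Field star"]
toy_confusion = [
    [90, 8, 2],
    [5, 94, 1],
    [0, 3, 97],
]
toy_recall = recall_per_class(toy_confusion, labels)

# Off-diagonal fraction quoted in the text: YSOs mislabeled as galaxies.
yso_as_galaxy = 752 / 7176
```

Here `yso_as_galaxy` evaluates to roughly 0.105, i.e., the approximately 10% quoted above.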

Figure 6: Confusion matrix for the test set of synthetic data, using Spitzer and 2MASS filters. In each case, we include the fraction of objects of the true label in each bin, as well as the total number. Grayscale is given by the fraction.

3.3 Performance with Major Contaminators

Color cuts for YSOs are most often plagued by contamination from AGB stars [45], and unresolved star-forming galaxies [3], [4]. AGB stars can be easily confused for YSOs, as they often have a double-peaked SED due to being surrounded by shells of circumstellar material, while AGN and unresolved star-forming galaxies also can be obscured by dust. We hence create a new synthetic catalog of only YSOs, AGB stars, and galaxies, so we may have a statistically large sample to test SESHAT on these most troublesome classes. Similar to before, we use the 2MASS and Spitzer channels for this test. The results of this test are shown in Fig. 7, where all classes are recovered to greater than 85%.

Figure 7: Confusion matrix of the test set where we sought only to identify YSOs, AGB stars, and galaxies, using Spitzer and 2MASS filters.

3.4 Performance in Star-Forming Regions Observed with Spitzer

Star-forming regions are some of the most difficult regions in which to classify objects, due to extinction, PAH emission, foreground and background stars, background galaxies, and, of course, YSOs. YSOs have a wide variety of spectral types, with the peak wavelength of their distributions varying rapidly with age. A YSO is generally characterized as a point source with a circumstellar dust component, be this an envelope or disk. The signatures of YSOs can be easily confused with other objects, however, as described with the previous test. In this section, we test SESHAT on Spitzer data from the Spitzer Extended Solar Neighborhood Archive (SESNA; R.A. Gutermuth et al., in prep) of star-forming regions.

The SESNA catalogs are a rigorously tested and well-defined set of catalogs of several star-forming regions, varying in both distance from the Sun and environment (e.g., those highly irradiated by massive stars or quiescently collapsing). These data have been homogeneously analyzed to identify the following types of objects: YSOs (in their various classes of evolution), galaxies (both unresolved star-forming and AGN), field stars, and knots of shocked emission. SESNA thus allows us to test SESHAT on several different regions with homogeneously classified catalogs. In the following demonstration, we test on those objects with data in all bands.

We used fits to the three clouds nearest to the Sun (Ophiuchus, Taurus, and Corona Australis), plus the Orion A+B clouds, in the SESNA catalog to aid in the building of our synthetic sample of YSOs (see Section 2.1). As such, we exclude objects in those regions from our test set. As well, the Pipe Nebula catalog contains several artifacts leading to less dependable categorization of YSOs, and suffers from high contamination rates due to the cloud's location within the Galactic plane. Thus, we also exclude its objects from our test set. Similarly, the data for Cygnus X were processed differently in comparison to the rest of the sample. Cygnus X is located at a much greater distance than the rest of the clouds, and has a very massive spatial extent. The distance measurements of this region may then be uncertain. Furthermore, Cygnus X has a higher rate of field stars (due to its distance and spatial extent), a higher rate of patches of bright nebulosity, and is shallower in the IRAC bands. UK Infrared Deep Sky Survey Galactic Plane Survey [46] data, rather than 2MASS data, were used for Cygnus X as well. Due to the variety of these factors, we elected to also exclude Cygnus X from our test set.

The test set is thus composed of objects from 23 different star-forming regions (see Table [tab:app-sesna-deets] for a full list of regions). The results of our test set are shown in Figure 8. Overall, we retrieve greater than 85% recall across YSOs, galaxies, and field stars, with inconsistent classifications discussed in detail in the next section. In short, some fraction of the SESHAT “misclassifications” are actually correctly identifying expected contaminants in the original catalog. Similarly, there are objects that we misclassify that the color cuts from SESNA appropriately classify. Appendix 6 shows the confusion matrices for each individual region, including the regions excluded from our main analysis.

Figure 8: The confusion matrix for the test set portion of the SESNA catalog.
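For reference, the per-class recall quoted for a confusion matrix is the fraction of objects carrying a given catalog label that the classifier assigns to that same label. The following minimal sketch computes both quantities with NumPy. The function name, the class labels, and the toy label arrays are illustrative assumptions and are not taken from SESHAT or SESNA.

```python
import numpy as np

def confusion_and_recall(true_labels, pred_labels, classes):
    """Return the confusion matrix (rows: true, columns: predicted) and per-class recall."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for i, t in enumerate(classes):
        for j, p in enumerate(classes):
            matrix[i, j] = np.sum((true_labels == t) & (pred_labels == p))
    row_totals = matrix.sum(axis=1)
    # Recall is undefined (NaN) for a class with no true members.
    recall = np.where(row_totals > 0,
                      np.diag(matrix) / np.maximum(row_totals, 1),
                      np.nan)
    return matrix, recall

# Toy example with made-up labels.
classes = ["YSO", "FS", "Gal"]
true = ["YSO", "YSO", "FS", "Gal", "Gal", "Gal"]
pred = ["YSO", "Gal", "FS", "Gal", "Gal", "YSO"]
matrix, recall = confusion_and_recall(true, pred, classes)
```

In this toy case one of the two true YSOs is recovered, so the YSO recall is 0.5, while the single field star is recovered and two of the three galaxies are.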

We note that neither the SESNA classification method nor SESHAT uses spatial information. Exploring the spatial distribution of the differently classified objects could act as an independent check on the validity of the classifications. We recommend that the reader perform similar sanity checks of any obviously strange classifications when applying SESHAT to their own data.

3.5 Performance On Cosmological Fields

3.5.1 Analyzing YSO Contamination in Spitzer Boötes Data

Along with the aforementioned catalogs of star-forming clouds, SESNA (R.A. Gutermuth et al., in prep) includes a 16 square degree map of the Boötes cosmological field. SESNA cataloged these data in the same way as the star-forming regions, with classes of Field Star, Galaxy, or YSO, in order to determine the contamination rate of its YSO selection. This rate came out to be \(9\pm1\) per square degree [3], [4], [47]. We re-analyze the same data with SESHAT to determine the contamination rate of our own results described in the previous section. We note, again, that the performance will vary depending on both the filters present and the classes chosen. This test is thus one example of SESHAT’s performance, here using J, H, and Ks from 2MASS, and IRAC 1-4 and MIPS 1 from Spitzer. As before, Figure 9 shows the results for sources with data in all filters.

Figure 9: Confusion matrix for the Boötes field from SESNA.

With these data, we are able to recover the field star and galaxy classes to the same degree as SESNA. We find that there is a trade-off in the objects classified as YSOs between the two methods. In either case, there should be no YSOs, and yet some misclassification persists, at levels similar to what can be achieved through color cuts.

3.5.2 Searching for Brown Dwarfs in JWST COSMOS Data

Along with its application to Galactic regions, SESHAT can also be used for the identification of brown dwarfs in cosmological fields. We review the data of the JWST COSMOS field [48], [49], which have been previously searched for brown dwarfs using color cut criteria [18]. We compare our results to those of [18] as further validation of these methods (Figure 10).

Figure 10: The confusion matrix for the COSMOSWeb field, based on the classifications of brown dwarfs in [18].

The COSMOSWeb data were taken in only five JWST bands: F115W, F150W, F277W, F444W, and F770W, with the MIRI F770W band having significantly less coverage than the NIRCam bands. The COSMOSWeb catalog [49] includes flags for the quality of the photometric detection. We use all sources that are labeled as secure, as well as those labeled inconsistent between ground and space observatories, according to their scheme. We also set to NaN all measurements whose error on their flux exceeds 10% of the flux itself. These choices were made to eliminate all sources that were likely to be hot pixels, artifacts, or only detected in one band. We further removed all sources where the major axis was more than 10% larger than the minor axis, to remove objects that are unambiguously galaxies, by shape alone.
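The two numerical cuts described above (masking measurements with flux errors above 10% of the flux, and removing sources whose major axis exceeds the minor axis by more than 10%) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, argument names, array layout, and toy values are hypothetical, and the catalog flag selection is not reproduced here.

```python
import numpy as np

def apply_quality_cuts(flux, flux_err, major_axis, minor_axis,
                       max_frac_err=0.10, max_axis_excess=0.10):
    """Mask noisy measurements and drop elongated sources.

    flux, flux_err : arrays of shape (n_sources, n_bands)
    major_axis, minor_axis : arrays of shape (n_sources,)
    """
    flux = np.array(flux, dtype=float)  # copy, so the caller's array is untouched
    flux_err = np.asarray(flux_err, dtype=float)
    major_axis = np.asarray(major_axis, dtype=float)
    minor_axis = np.asarray(minor_axis, dtype=float)

    # Set to NaN every measurement whose error exceeds 10% of its flux.
    flux[flux_err > max_frac_err * np.abs(flux)] = np.nan

    # Keep only sources whose major axis is at most 10% larger than the minor axis.
    keep = major_axis <= (1.0 + max_axis_excess) * minor_axis
    return flux[keep], keep

# Toy catalog: three sources measured in two bands (values are made up).
flux = [[10.0, 20.0], [5.0, 8.0], [3.0, 4.0]]
flux_err = [[0.5, 3.0], [0.4, 0.9], [0.1, 0.1]]
cleaned, keep = apply_quality_cuts(flux, flux_err,
                                   major_axis=[1.05, 1.0, 1.5],
                                   minor_axis=[1.0, 1.0, 1.0])
```

Masking individual measurements rather than dropping the whole source keeps objects that are well measured in only a subset of bands, while the shape cut removes the third toy source entirely.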

[18] used several cuts in color space, as well as the ellipticity of each point source, to determine their brown dwarf candidates. We run SESHAT on only the colors and classify each source as a brown dwarf, white dwarf, field star, or galaxy. Doing this, we retrieve 100% of the previously identified brown dwarf candidates. The remaining data (previously unlabeled) are split between field stars (22 038 or 24%), white dwarfs (1 or \(\sim0\%\)), galaxies (69 885 or 76%), and many additional brown dwarfs (1663 or 2%). To further narrow the selection of brown dwarfs, it is useful to include size and shape information. SESHAT, however, provides a first pass that can identify a much smaller population for further follow-up as brown dwarfs.

4 SESHAT Implementation and Use

We have produced SESHAT, the Stellar Evolutionary Stage Heuristic Assessment Tool: a tool for the classification of JWST, Spitzer, or 2MASS photometric data. It is available to the community as a Python package (https://pypi.org/project/seshat-classifier/). This tool has two intended uses. The first is to provide probabilities and classifications for any JWST dataset. The second is to aid in the writing of JWST proposals by evaluating a synthetic test set on a given suite of filters.

To classify a set of objects, only two pieces of information are necessary: the catalog to be classified, and the classes to be searched for. These can be “YSO” for YSOs, “FS” for field stars, “BD” for brown dwarfs, “WD” for white dwarfs, or “Gal” for galaxies. The catalog must contain the Vega magnitudes and errors for each source. From this input, the color information is extracted, and the model is updated based on the uncertainties, limiting magnitudes, and saturation magnitudes of the dataset. A copy of the catalog is returned with the probability of each source belonging to each class, along with the final class (the class with the maximum probability for that source). The true and predicted classes and probabilities for the synthetic test set can optionally be returned as well, to aid in determining contamination rates.

To test JWST proposals, one must supply the filters, the limiting and saturating magnitudes of the proposed observation, and an appropriate distribution of uncertainties for each filter. From this information, a synthetic test set will be built and the true class, predicted class, and probabilities for each class will be returned.

4.1 Limitations

The classification accuracies presented for the SESNA tests required data in at least 6 filters, including MIPS 1. The MIPS 1 data were necessary to limit the degeneracy across models and thus constrain them. Due to their steeper SED slope in the MIR, Class II YSOs are more likely to be undetected in the MIPS bands in SESNA. SESHAT is therefore more sensitive to Class I YSOs than to Class II YSOs that are missing MIPS data. Future updates of the tool, anticipated after the Cycle 5 JWST deadline, will include an improved treatment of missing MIPS data.

4.2 Probabilities

Regardless of the dataset, a set of probabilities for each source is returned. The benefit of using probabilities rather than strict classifications lies in their flexibility. For instance, in a cosmological field, one may wish to set a higher threshold for the classification of an object as anything other than a galaxy. Alternatively, one may apply Bayesian inference to reweight the probabilities based on prior knowledge of the region or object. Finally, if the difference between the probabilities of the top two or more classes is less than a certain threshold, an “ambiguous” classification can be applied. We encourage the reader to perform any modifications they deem fit to these probabilities, to best benefit their science case.
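As an illustration of two of these options, the following minimal sketch reweights a table of class probabilities by a user-supplied prior and assigns an “ambiguous” label when the two highest probabilities are too close. The function names, the margin of 0.2, the prior values, and the toy probabilities are assumptions made for this example and are not part of the SESHAT interface. The simple multiplicative reweighting also implicitly assumes that the classifier probabilities were obtained with balanced classes.

```python
import numpy as np

def reweight_with_prior(probs, prior):
    """Multiply each row of class probabilities by a prior and renormalize."""
    weighted = np.asarray(probs, dtype=float) * np.asarray(prior, dtype=float)
    return weighted / weighted.sum(axis=1, keepdims=True)

def label_with_margin(probs, classes, min_margin=0.2, ambiguous="ambiguous"):
    """Assign the most probable class unless the top two classes are too close."""
    probs = np.asarray(probs, dtype=float)
    ordered = np.sort(probs, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]
    labels = np.array(classes, dtype=object)[np.argmax(probs, axis=1)]
    labels[margin < min_margin] = ambiguous
    return labels

# Toy probabilities for two sources (values are made up).
classes = ["YSO", "Gal", "FS"]
probs = [[0.5, 0.4, 0.1],
         [0.9, 0.05, 0.05]]
raw_labels = label_with_margin(probs, classes)

# Hypothetical prior for a galaxy-dominated cosmological field.
posterior = reweight_with_prior(probs, prior=[0.1, 0.8, 0.1])
new_labels = label_with_margin(posterior, classes)
```

With these toy values, the first source is ambiguous before reweighting and is labeled a galaxy once the galaxy-dominated prior is applied, while the second source remains a YSO in both cases.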

5 Conclusions

In this work, we have presented SESHAT, a Python tool that is made available to the community through the package seshat-classifier on PyPI. This tool is capable of classifying stars across a range of evolutionary stages, from YSOs through to white dwarfs, as well as galaxies. In the era of JWST, where every observation has the potential to use its own unique set of filters, a method that can be applied regardless of the exact filters available can be a great aid in both expediting classifications and allowing consistent classifications across datasets.

The key uses of SESHAT are as follows:

It allows the user to process a catalog of any JWST, Spitzer, or 2MASS data, to obtain probabilities for each object to be one of YSOs, Field Stars, Brown Dwarfs, White Dwarfs, or Galaxies.
A copy of the test set is also optionally returned, so the purity of the classifications can be estimated.
JWST proposals can be tested by specifying a suite of filters, whereby the test set classifications are returned.

We caution the reader that the classifications provided by SESHAT have a maximum accuracy given by the test set. The probabilities may be the more useful output, as they can be combined with Bayesian inference to obtain stronger classifications, depending on the region. It is also important to note that classical color cut methods suffer from a lack of purity as well, since objects that overlap in color-color space sometimes simply cannot be distinguished, regardless of the cut applied. Finally, the performance of both classical methods and SESHAT will vary with the input bands available, depending on whether important features are measured. SESHAT has been tested against Galactic star-forming regions and cosmological surveys to assess its real-world performance. We quantify this as:

When testing against synthetic data of the Spitzer/2MASS bands, recall above 90% is achieved across all classes.
When testing against real Spitzer/2MASS data of star-forming regions, recall above 80% is achieved across all classes.
When testing against a cosmological set for the identification of brown dwarfs, 100% of the previously identified brown dwarfs were recovered, and 93% of the galaxies were labeled as such. No field star class was provided for this dataset.

We expect that SESHAT will be useful to a large portion of the community, especially as more JWST data become available.

The authors would like to thank Drs. Simon Blouin, Hossen Teimoorinia, and Chris Willott for their helpful discussions. BLC acknowledges the support of an NSERC Doctoral Award held at the University of Victoria. HK acknowledges support from an NSERC Discovery Grant. This work was carried out at the University of Victoria, on the unceded lands of the Songhees and Esquimalt peoples, whom we respectfully acknowledge. We acknowledge as well the resources of the NASA ADS service.

6 SESNA Region Classifications

In this Appendix, we include the confusion matrices, with comparisons to the spatial distribution of objects, for each region in SESNA. When possible, we include for reference extinction maps from [50] as a guide to the underlying cloud structure. Table [tab:app-sesna-deets] lists important details for each region, including the number of galaxies statistically likely to be contaminating the YSO selection of SESNA, and whether the region was used to define the training set. We calculate the expected contamination as \((9\pm1) \times A\), where \(A\) is the area (in square degrees) of the SESNA map of the region.
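The scaling of the contamination rate with map area can be written as a short calculation. In the sketch below, the function name and the example area of 2 square degrees are hypothetical, and the uncertainty is simply the rate uncertainty scaled by the area, with no additional Poisson term.

```python
def expected_contaminants(area_sq_deg, rate=9.0, rate_err=1.0):
    """Expected number of galaxy contaminants, and its uncertainty, for a map area.

    rate and rate_err are the contamination rate per square degree
    and its uncertainty (9 +/- 1 in the text).
    """
    return rate * area_sq_deg, rate_err * area_sq_deg

# Hypothetical map covering 2 square degrees.
n_expected, n_err = expected_contaminants(2.0)
```

For this hypothetical 2 square degree map, the calculation gives \(18\pm2\) galaxies expected to be mislabeled as YSOs.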

| Region | Distance (pc) | Expected galaxy contaminants | Used to define training set (T/F) | Figure |
|:--|:--|:-:|:-:|:-:|
| AFGL 490 | \(\sim900\) [51] | \(6.8\pm0.8\) | F | Fig. 11 |
| Aquila | \(\sim436\) [52] | \(188\pm21\) | F | Fig. 12 |
| Auriga-California | \(\sim450\) [53] | \(44\pm5\) | F | Fig. 13 |
| Cepheus Flare | \(\sim350\) [54] | \(42\pm5\) | F | Fig. 14 |
| Cepheus OB3 | \(\sim800\) [55] | \(72\pm8\) | F | Fig. 15 |
| Chamaeleon | \(\sim190\) [12] | \(87\pm10\) | F | Fig. 16 |
| Corona Australis | \(\sim155\) [12] | \(18\pm2\) | T | Fig. 17 |
| Cygnus X | \(\sim760-1660\) [12] | \(368\pm41\) | F | Fig. 18 |
| GGD4, CB34 | \(\sim1370\) [12] | \(2.8\pm0.3\) | F | Fig. 19 |
| IC 5146 | \(\sim750\) [12] | \(12\pm1\) | F | Fig. 20 |
| IRAS 20050+2720 | \(\sim700\) [56] | \(3.5\pm0.4\) | F | Fig. 21 |
| L988 | \(\sim620\) [12] | \(2.1\pm0.2\) | F | Fig. 22 |
| Lupus | \(\sim160\) [12] | \(124\pm14\) | F | Fig. 23 |
| Mon OB1 | \(\sim750\) [12] | \(28\pm3\) | F | Fig. 24 |
| Mon R2 | \(\sim850\) [12] | \(78\pm9\) | F | Fig. 25 |
| Musca | \(\sim140\) [57] | \(38\pm4\) | F | Fig. 26 |
| NGC 7129 | \(\sim1150\) [58] | \(1.5\pm0.2\) | F | Fig. 27 |
| North America Nebula | \(\sim800\) [12] | \(69\pm8\) | F | Fig. 28 |
| Ophiuchus | \(\sim130\) [12] | \(164\pm18\) | T | Fig. 29 |
| Orion A | \(\sim420\) [12] | \(205\pm23\) | T | Fig. 30 |
| Orion B | \(\sim420\) [12] | \(36\pm4\) | T | Fig. 31 |
| Perseus | \(\sim280\) [12] | \(155\pm17\) | F | Fig. 32 |
| Pipe | \(\sim180\) [12] | \(125\pm14\) | F | Fig. 33 |
| S131 | \(\sim925\) [59] | \(1.5\pm0.2\) | F | Fig. 34 |
| S140 | \(\sim764\) [60] | \(17\pm2\) | F | Fig. 35 |
| S171 | \(\sim1000\) [61] | \(0.9\pm0.1\) | F | Fig. 36 |
| Scorpius | \(\sim130\) (taken to match Ophiuchus) | \(34\pm4\) | F | Fig. 37 |
| Taurus | \(\sim140\) [12] | \(12\pm1\) | T | Fig. 38 |
| Vela D | \(\sim700\) [62] | \(20\pm2\) | F | Fig. 39 |

Figure 11: *Left*: Confusion matrix for AFGL 490 (part of the test set). The expected contamination for this region is \(6.8\pm0.8\) galaxies mislabeled as YSOs by SESNA. *Right*: Grayscale IRAC 4 image, with the YSOs identified by both this work and SESNA marked with gold stars, the YSOs identified only by SESHAT marked with red circles, and the YSOs identified only by SESNA marked with red squares. Contours are from the extinction maps of [50]: \(A_V=1\) mag in white, \(A_V=2\) mag in light gray, \(A_V = 5\) mag in dark gray, and \(A_V=10\) mag in black.

Figure 12: The same as Figure 11, but for the Aquila star-forming region, which is part of the test set. The expected contamination is \(188\pm21\) galaxies mislabeled as YSOs by SESNA.

Figure 13: The same as Figure 11, but for the Auriga-California star-forming region, which is part of the test set. The expected contamination is \(44\pm5\) galaxies mislabeled as YSOs by SESNA.

Figure 14: The same as Figure 11, but for the Cepheus Flare region, which is part of the test set. The expected contamination is \(42\pm5\) galaxies mislabeled as YSOs by SESNA.

Figure 15: The same as Figure 11, but for the Cepheus OB3 region, which is part of the test set. The expected contamination is \(72\pm8\) galaxies mislabeled as YSOs by SESNA.

Figure 16: The same as Figure 11, but for the Chamaeleon region, which is part of the test set. The expected contamination is \(87\pm10\) galaxies mislabeled as YSOs by SESNA.

Figure 17: The same as Figure 11, but for the Corona Australis region, which is part of the set used to define training YSOs. The expected contamination is \(18\pm2\) galaxies mislabeled as YSOs by SESNA.

Figure 18: The same as Figure 11, but for the Cygnus X region, which is excluded from both the test and training sets. The expected contamination is \(368\pm41\) galaxies mislabeled as YSOs by SESNA.

Figure 19: The same as Figure 11, but for the GGD4 CB34 region, which is part of the test set. The expected contamination is \(2.8\pm0.3\) galaxies mislabeled as YSOs by SESNA.

Figure 20: The same as Figure 11, but for the IC 5146 region, which is part of the test set. The expected contamination is \(12\pm1\) galaxies mislabeled as YSOs by SESNA.

Figure 21: The same as Figure 11, but for the IRAS 20050+2720 region, which is part of the test set. The expected contamination is \(3.5\pm0.4\) galaxies mislabeled as YSOs by SESNA.

Figure 22: The same as Figure 11, but for the L 988 region, which is part of the test set. The expected contamination is \(2.1\pm0.2\) galaxies mislabeled as YSOs by SESNA.

Figure 23: The same as Figure 11, but for the Lupus region, which is part of the test set. The expected contamination is \(124\pm14\) galaxies mislabeled as YSOs by SESNA.

Figure 24: The same as Figure 11, but for the Mon OB1 region, which is part of the test set. The expected contamination is \(28\pm3\) galaxies mislabeled as YSOs by SESNA.

Figure 25: The same as Figure 11, but for the Mon R2 region, which is part of the test set. The expected contamination is \(78\pm9\) galaxies mislabeled as YSOs by SESNA.

Figure 26: The same as Figure 11, but for the Musca region, which is part of the test set. The expected contamination is \(38\pm4\) galaxies mislabeled as YSOs by SESNA.

Figure 27: The same as Figure 11, but for the NGC 7129 region, which is part of the test set. The expected contamination is \(1.5\pm0.2\) galaxies mislabeled as YSOs by SESNA.

Figure 28: The same as Figure 11, but for the North America Nebula, which is part of the test set. The expected contamination is \(69\pm8\) galaxies mislabeled as YSOs by SESNA.

Figure 29: The same as Figure 11, but for Ophiuchus, which is part of the set used to define training YSOs. The expected contamination is \(164\pm18\) galaxies mislabeled as YSOs by SESNA.

Figure 30: The same as Figure 11, but for Orion A, which is part of the set used to define training YSOs. The expected contamination is \(205\pm23\) galaxies mislabeled as YSOs by SESNA.

Figure 31: The same as Figure 11, but for Orion B, which is part of the set used to define training YSOs. The expected contamination is \(36\pm4\) galaxies mislabeled as YSOs by SESNA.

Figure 32: The same as Figure 11, but for the Perseus region, which is part of the test set. The expected contamination is \(155\pm17\) galaxies mislabeled as YSOs by SESNA.

Figure 33: The same as Figure 11, but for the Pipe region, which is excluded from both the test and training sets. The expected contamination is \(125\pm14\) galaxies mislabeled as YSOs by SESNA. Note that there is no estimate for contamination by field stars, which, though generally easy to distinguish from YSOs, becomes significant when looking directly into the Galactic plane, as is done here.

Figure 34: The same as Figure 11, but for the S 131 region, which is part of the test set. The expected contamination is \(1.5\pm0.2\) galaxies mislabeled as YSOs by SESNA.

Figure 35: The same as Figure 11, but for the S 140 region, which is part of the test set. The expected contamination is \(17\pm2\) galaxies mislabeled as YSOs by SESNA.

Figure 36: The same as Figure 11, but for the S 171 region, which is part of the test set. The expected contamination is \(0.9\pm0.1\) galaxies mislabeled as YSOs by SESNA.

Figure 37: The same as Figure 11, but for the Scorpius region, which is part of the test set. The expected contamination is \(34\pm4\) galaxies mislabeled as YSOs by SESNA.

Figure 38: The same as Figure 11, but for Taurus, which is part of the set used to define training YSOs. The expected contamination is \(12\pm1\) galaxies mislabeled as YSOs by SESNA.

Figure 39: The same as Figure 11, but for the Vela D region, which is part of the test set. The expected contamination is \(20\pm2\) galaxies mislabeled as YSOs by SESNA.

References

[1]

Pontoppidan, K. M., Barrientes, J., Blome, C., et al. 2022, The JWST Early Release Observations, 936, L14.

[2]

Green, J. D., Pontoppidan, K. M., Reiter, M., et al. 2024, Why Are (Almost) All the Protostellar Outflows Aligned in Serpens Main?, 972, 5.

[3]

Gutermuth, R. A., Myers, P. C., Megeath, S. T., et al. 2008, Spitzer Observations of NGC 1333: A Study of Structure and Evolution in a Nearby Embedded Cluster, 674, 336.

[4]

Gutermuth, R. A., Megeath, S. T., Myers, P. C., et al. 2009, A Spitzer Survey of Young Stellar Clusters Within One Kiloparsec of the Sun: Cluster Core Extraction and Basic Structural Analysis, 184, 18.

[5]

Cornu, D., & Montillaud, J. 2021, A neural network-based methodology to select young stellar object candidates from IR surveys, 647, A116.

[6]

Kuhn, M. A., de Souza, R. S., Krone-Martins, A., et al. 2021, SPICY: The Spitzer/IRAC Candidate YSO Catalog for the Inner Galactic Midplane, 254, 33.

[7]

Marton, G., Tóth, L. V., Paladini, R., et al. 2016, An all-sky support vector machine selection of WISE YSO candidates, 458, 3479.

[8]

Robitaille, T. P., Whitney, B. A., Indebetouw, R., & Wood, K. 2007, Interpreting Spectral Energy Distributions from Young Stellar Objects. II. Fitting Observed SEDs Using a Large Grid of Precomputed Models, 169, 328.

[9]

Richardson, T., Ginsburg, A., Indebetouw, R., & Robitaille, T. P. 2024, An Updated Modular Set of Synthetic Spectral Energy Distributions for Young Stellar Objects, 961, 188.

[10]

Robitaille, T. P., Whitney, B. A., Indebetouw, R., Wood, K., & Denzmore, P. 2006, Interpreting Spectral Energy Distributions from Young Stellar Objects. I. A Grid of 200,000 YSO Model SEDs, 167, 256.

[11]

Robitaille, T. P. 2017, A modular set of synthetic spectral energy distributions for young stellar objects, 600, A11.

[12]

Zucker, C., Speagle, J. S., Schlafly, E. F., et al. 2020, A compendium of distances to molecular clouds in the Star Formation Handbook, 633, A51.

[13]

Castelli, F., & Kurucz, R. L. 2003, in Modelling of Stellar Atmospheres, ed. N. Piskunov, W. W. Weiss, & D. F. Gray, Vol. 210, A20.

[14]

Bressan, A., Marigo, P., Girardi, L., et al. 2012, PARSEC: stellar tracks and isochrones with the PAdova and TRieste Stellar Evolution Code, 427, 127.

[15]

Kroupa, P. 2001, On the variation of the initial mass function, 322, 231.

[16]

Phillips, M. W., Tremblin, P., Baraffe, I., et al. 2020, A new set of atmosphere and evolution models for cool T-Y brown dwarfs and giant exoplanets, 637, A38.

[17]

Whitworth, A. 2018, Brown Dwarf Formation: Theory, arXiv e-prints, arXiv:1811.06833.

[18]

Chen, A. Y. A., Goto, T., Wu, C. K. W., et al. 2025, Brown dwarf number density in the JWST COSMOS-Web field, 42, e042.

[19]

Blouin, S., Dufour, P., & Allard, N. F. 2018, A New Generation of Cool White Dwarf Atmosphere Models. I. Theoretical Framework and Applications to DZ Stars, 863, 184.

[20]

Tremblay, P.-E., Bédard, A., O’Brien, M. W., et al. 2024, The Gaia white dwarf revolution, 99, 101705.

[21]

Crompvoets, B. L., Di Francesco, J., Teimoorinia, H., & Preibisch, T. 2024, Climbing the Cliffs: Classifying Young Stellar Objects in the Cosmic Cliffs JWST Data Using a Probabilistic Random Forest, 168, 63.

[22]

Kilic, M., von Hippel, T., Mullally, F., et al. 2006, The Mystery Deepens: Spitzer Observations of Cool White Dwarfs, 642, 1051.

[23]

Axelrod, T., Saha, A., Matheson, T., et al. 2023, All-sky Faint DA White Dwarf Spectrophotometric Standards for Astrophysical Observatories: The Complete Sample, 951, 78.

[24]

Limbach, M. A., Vanderburg, A., Stevenson, K. B., et al. 2022, A new method for finding nearby white dwarfs exoplanets and detecting biosignatures, 517, 2622.

[25]

Blouin, S. 2024, White dwarf fundamentals, arXiv e-prints, arXiv:2409.03941.

[26]

Burgarella, D., Buat, V., & Iglesias-Páramo, J. 2005, Star formation and dust attenuation properties in galaxies from a statistical ultraviolet-to-far-infrared analysis, 360, 1413.

[27]

Noll, S., Burgarella, D., Giovannoli, E., et al. 2009, Analysis of galaxy spectral energy distributions from far-UV to far-IR with CIGALE: studying a SINGS test sample, 507, 1793.

[28]

Boquien, M., Burgarella, D., Roehlly, Y., et al. 2019, CIGALE: a python Code Investigating GALaxy Emission, 622, A103.

[29]

Burgarella, D., Nanni, A., Hirashita, H., et al. 2020, Observational and theoretical constraints on the formation and early evolution of the first dust grains in galaxies at 5 < z < 10, 637, A32.

[30]

Wen, R., An, F., Zheng, X. Z., et al. 2022, The Physical Properties of Star-forming Galaxies with Strong [O III] Lines at z = 3.25, 933, 50.

[31]

Rieke, G., Alberts, S., Shivaei, I., et al. 2024, The SMILES Mid-Infrared Survey, arXiv e-prints, arXiv:2406.03518.

[32]

Alberts, S., Lyu, J., Shivaei, I., et al. 2024, SMILES Initial Data Release: Unveiling the Obscured Universe with MIRI Multi-band Imaging, arXiv e-prints, arXiv:2405.15972.

[33]

Burkhart, B. 2018, The Star Formation Rate in the Gravoturbulent Interstellar Medium, 863, 118.

[34]

Kainulainen, J., Beuther, H., Henning, T., & Plume, R. 2009, Probing the evolution of molecular cloud structure. From quiescence to birth, 508, L35.

[35]

Gupta, A., & Chen, W.-P. 2022, Interplay between Young Stars and Molecular Clouds in the Ophiuchus Star-forming Complex, 163, 233.

[36]

Lada, C. J., Lombardi, M., & Alves, J. F. 2010, On the Star Formation Rates in Molecular Clouds, 724, 687.

[37]

Wang, S., & Chen, X. 2019, The Optical to Mid-infrared Extinction Law Based on the APOGEE, Gaia DR2, Pan-STARRS1, SDSS, APASS, 2MASS, and WISE Surveys, 877, 116.

[38]

Ossenkopf, V., & Henning, T. 1994, Dust opacities for protostellar cores, 291, 943.

[39]

Rieke, G. H., & Lebofsky, M. J. 1985, The interstellar extinction law from 1 to 13 microns, 288, 618.

[40]

Peeters, E., Hony, S., Van Kerckhoven, C., et al. 2002, The rich 6 to 9 \(\mu\)m spectrum of interstellar PAHs, 390, 1089.

[41]

Tielens, A. G. G. M. 2008, Interstellar polycyclic aromatic hydrocarbon molecules, 46, 289.

[42]

Pilleri, P., Montillaud, J., Berné, O., & Joblin, C. 2012, Evaporating very small grains as tracers of the UV radiation field in photo-dissociation regions, 542, A69.

[43]

Foschino, S., Berné, O., & Joblin, C. 2019, Learning mid-IR emission spectra of polycyclic aromatic hydrocarbon populations from observations, 632, A84.

[44]

Chen, T., & Guestrin, C. 2016, XGBoost: A Scalable Tree Boosting System, arXiv e-prints, arXiv:1603.02754.

[45]

Dunham, M. M., Allen, L. E., Evans, Neal J., I., et al. 2015, Young Stellar Objects in the Gould Belt, 220, 11.

[46]

Lucas, P. W., Hoare, M. G., Longmore, A., et al. 2008, The UKIDSS Galactic Plane Survey, 391, 136.

[47]

Pokhrel, R., Gutermuth, R. A., Betti, S. K., et al. 2020, Star-Gas Surface Density Correlations in 12 Nearby Molecular Clouds. I. Data Collection and Star-sampled Analysis, 896, 60.

[48]

Casey, C. M., Kartaltepe, J. S., Drakos, N. E., et al. 2023, COSMOS-Web: An Overview of the JWST Cosmic Origins Survey, 954, 31.

[49]

Shuntov, M., Akins, H. B., Paquereau, L., et al. 2025, COSMOS2025: The COSMOS-Web galaxy catalog of photometry, morphology, redshifts, and physical parameters from JWST, HST, and ground-based imaging, arXiv e-prints, arXiv:2506.03243.

[50]

Dobashi, K., Uehara, H., Kandori, R., et al. 2005, Atlas and Catalog of Dark Clouds Based on Digitized Sky Survey I, 57, S1.

[51]

Snell, R. L., Scoville, N. Z., Sanders, D. B., & Erickson, N. R. 1984, High-velocity molecular jets, 284, 176.

[52]

Ortiz-León, G. N., Loinard, L., Dzib, S. A., et al. 2018, Gaia-DR2 Confirms VLBA Parallaxes in Ophiuchus, Serpens, and Aquila, 869, L33.

[53]

Lada, C. J., Lombardi, M., & Alves, J. F. 2009, The California Molecular Cloud, 703, 52.

[54]

Szilágyi, M., Kun, M., & Ábrahám, P. 2021, The Gaia view of the Cepheus flare, 505, 5164.

[55]

Moreno-Corral, M. A., C., C.-K., de La Ra, E., & Wagner, S. 1993, H-alpha interferometric, optical and near IR photometric studies of star forming regions. I. The Cepheus B/Sh2-155/Cepheus OB3 association complex, 273, 619.

[56]

Wilking, B. A., Mundy, L. G., Blackwell, J. H., & Howe, J. E. 1989, A Millimeter-Wave Spectral Line and Continuum Survey of Cold IRAS Sources, 345, 257.

[57]

Bonne, L., Bontemps, S., Schneider, N., et al. 2020, Formation of the Musca filament: evidence for asymmetries in the accretion flow due to a cloud-cloud collision, 644, A27.

[58]

Straižys, V., Maskoliūnas, M., Boyle, R. P., et al. 2014, The distance to the young cluster NGC 7129 and its age, 438, 1848.

[59]

Pelayo-Baldárrago, M. E., Sicilia-Aguilar, A., Fang, M., et al. 2023, Star formation in IC1396: Kinematics and subcluster structure revealed by Gaia, 669, A22.

[60]

Hirota, T., Ando, K., Bushimata, T., et al. 2008, Astrometry of H\(_{2}\)O Masers in Nearby Star-Forming Regions with VERA III. IRAS 22198+6336 in Lynds1204G, 60, 961.

[61]

Pandey, A. K., Sharma, S., Ogura, K., et al. 2008, Stellar contents and star formation in the young star cluster Be 59, 383, 1241.

[62]

Liseau, R., Lorenzetti, D., Nisini, B., Spinoglio, L., & Moneti, A. 1992, Star formation in the VELA molecular clouds. I. The IRAS-bright class I sources, 265, 577.

https://pypi.org/project/seshat-classifier/↩︎

Object Classification from JWST Catalogs