October 23, 2025
The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 subjects to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 14 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field correction, skull stripping, spatial registration, and interpolation alter voxel statistics and geometry. While these steps improve within-dataset consistency, residual differences persist between datasets. Finally, a feature-space case study using an ImageNet-pretrained DenseNet121 shows measurable residual covariate shift after standardized preprocessing, confirming that harmonization alone cannot eliminate inter-dataset bias. Together, these analyses provide a unified characterization of variability in public brain MRI resources and emphasize the need for preprocessing-aware and domain-adaptive strategies in the design of generalizable brain MRI foundation models.
Keywords: brain MRI, public datasets, foundation models, data harmonization, preprocessing variability, covariate shift
Minh Sao Khue Luu\({}^{1,*}\), Margaret V. Benedichuk\({}^{1}\), Ekaterina I. Roppert\({}^{1}\), Roman M. Kenzhin\({}^{1}\), Bair N. Tuchinov\({}^{1}\)
\({}^1\) The Artificial Intelligence Research Center of Novosibirsk State University, 630090 Novosibirsk, Russia
\({}^*\)khue.luu@g.nsu.ru
Brain diseases such as tumors, Alzheimer’s disease, multiple sclerosis, and stroke affect millions worldwide, leading to significant health and societal burdens [1]. Magnetic resonance imaging (MRI) has become the central modality for studying these conditions due to its non-invasive nature, superior soft tissue contrast, and ability to capture diverse anatomical and physiological information across multiple sequences. While originally developed for clinical decision-making, the rapid expansion of publicly available MRI datasets has transformed neuroimaging into a data-driven domain where large-scale machine learning, particularly foundation models, now plays a pivotal role.
Foundation models, first established in natural language processing and computer vision [2], are increasingly being explored for medical imaging [3]–[7]. Their promise lies in learning generalizable representations from heterogeneous data and transferring them to a wide range of downstream tasks. However, the success of such models in brain MRI crucially depends on the availability of harmonized, large-scale datasets. Unlike other imaging domains, brain MRI suffers from high heterogeneity: multiple acquisition protocols, diverse sequence types (e.g., T1w, T2w, FLAIR, DWI), inconsistent annotations, fragmented repositories, and variable licensing terms. This fragmentation presents a unique challenge for developing general-purpose models.
A number of surveys and benchmarks have attempted to catalog medical imaging datasets, but they fall short in key ways when viewed from the perspective of brain MRI foundation models. For instance, MedSegBench [8] aggregates 35 datasets across modalities, yet includes only one brain MRI dataset and provides no analysis of voxel-level heterogeneity or preprocessing standards. Similarly, Dishner et al. [9] catalogued 110 radiology datasets (49 brain MRI) but spanned too broad an anatomical scope to address brain-specific challenges such as sequence diversity and harmonization [10]. Other focused reviews, e.g., on glioma datasets [11], [12], provide valuable clinical and molecular context but rarely analyze imaging metadata (e.g., voxel resolution, intensity distributions, or missing modalities) that directly influence pretraining strategies. Even highly influential initiatives like the BraTS Challenge [13]–[16] have advanced reproducibility and benchmarking but rely on heavily preprocessed data, which reduces heterogeneity and thus limits real-world generalization. In short, prior surveys tend to be either too broad (spanning many anatomical domains) or too narrow (focusing on a single disease), and they often omit the image- and preprocessing-level variability most relevant for foundation model development.
This review addresses these gaps. We provide a structured and multi-level assessment of public brain MRI datasets with a specific focus on their suitability for foundation model training. Unlike prior works, we move beyond cataloguing and explicitly quantify variability across dataset-level and image-level properties. We also evaluate the effects of preprocessing choices, which remain a largely underexplored source of covariate shift. Our analysis is designed to bridge the disconnect between dataset curation and model pretraining, highlighting practical considerations for building harmonized resources.
Our contributions are fourfold:
Dataset-level review: We review 54 adult 3D structural brain MRI datasets covering over 538,031 subjects. This includes detailed analysis of modality composition, disease coverage, dataset scale, and licensing diversity, revealing major imbalances between healthy and clinical populations that influence pretraining data design.
Image-level profiling: We perform a quantitative comparison of voxel spacing, image orientation, and intensity statistics across 14 representative datasets. This analysis exposes strong variation in geometric resolution and contrast distribution, which can affect how foundation models learn anatomical and pathological features.
Quantitative evaluation of preprocessing variability: We measure how bias field correction, intensity normalization, skull stripping, registration, and interpolation modify voxel-level statistics and geometry across datasets.
Feature-space analysis of residual covariate shift: Using an ImageNet-pretrained DenseNet121, we quantify cross-dataset divergence that remains after full preprocessing, linking voxel-level variability to learned representations.
Together, these contributions provide the first structured review that unifies dataset-, image-, and preprocessing-level analyses, offering practical guidelines for building harmonized and generalizable brain MRI foundation models.
We performed a structured search for publicly available brain MRI datasets between May and June 2025. Sources included Google, Google Dataset Search, PubMed, Scientific Data, and major neuroimaging repositories such as TCIA, OpenNeuro, NITRC, CONP Portal and Synapse. Search terms combined phrases such as “public brain MRI dataset,” “open access brain MRI,” “3D structural brain MRI for AI,” and “MRI segmentation dataset,” with variations replacing “dataset” by “database.” No date restrictions were applied. Each repository entry or publication was manually reviewed to determine eligibility, and the process was repeated iteratively until no new datasets were identified, achieving data saturation.
This review focused exclusively on datasets containing 3D structural MRI of the adult human brain. Datasets were included only if they satisfied all of the following criteria:
volumetric 3D structural MRI scans were available (not 2D slices or statistical maps);
subjects were adults;
at least one structural modality (e.g., T1-weighted) was included, rather than only functional or diffusion modalities (e.g., fMRI, DTI, MRA);
acquisitions were 3D static volumes (not 4D dynamic or time-resolved scans); and
at least 20 unique 3D scans were provided.
For multimodal datasets that additionally included fMRI, DTI, PET, or clinical assessments, only the structural MRI scans were considered in this review.
Our search yielded more than one hundred candidate entries across repositories and publications. After removing duplicates and excluding pediatric-only cohorts, 2D or statistical map datasets, collections with fewer than 20 scans, and datasets without accessible images, a total of 54 datasets were retained. Together, these cover over 538,031 subjects and form the basis of our review.
To enable consistent comparison across heterogeneous datasets, we standardized both imaging modalities and cohort labels. The detailed mapping rules are summarized in Appendix Table 7 (modalities) and Table 8 (cohorts).
These datasets span a broad range of neurological and psychiatric conditions alongside healthy controls, and vary in imaging protocols, scanner characteristics, and subject demographics. A complete overview is provided in Table 1.
| Dataset | Modality | Cohort | #Subjects |
|---|---|---|---|
| [18F]MK6240 [17] | T1, Others | Healthy | 33 |
| ABIDE-I [18] | T1 | Autism, Healthy | 1,112 |
| ABIDE-II [19] | T1 | Autism, Healthy | 1,114 |
| ADNI [20], [21] | T1, T2, FLAIR | Neurodegenerative | 4,068 |
| AOMIC-ID1000 [22], [23] | T1, DWI/DTI, fMRI/rs-fMRI | Healthy | 928 |
| AOMIC-PIOP1 [23], [24] | T1, DWI/DTI, fMRI/rs-fMRI | Healthy | 216 |
| AOMIC-PIOP2 [23], [25] | T1, DWI/DTI, fMRI/rs-fMRI | Healthy | 226 |
| ARC [26], [27] | T1, T2, FLAIR, DWI/DTI, fMRI/rs-fMRI | Stroke | 230 |
| ATLAS R2.0 [28] | T1 | Stroke | 955 |
| BBSRC [29] | T1, DWI/DTI, fMRI | Healthy | 34 |
| BrainMetShare [30] | T1, T1C, FLAIR | Brain Tumor | 156 |
| BraTS-GLI (2025) [15], [31] | T1, T1C, T2, FLAIR | Brain Tumor | 1,809 |
| BraTS-MEN (2025) [32]–[34] | T1, T1C, T2, FLAIR | Brain Tumor | 750 |
| BraTS-MET (2025) [35] | T1, T1C, T2, FLAIR | Brain Tumor | 1,778 |
| BraTS-SSA (2025) [36], [37] | T1, T1C, T2, FLAIR | Brain Tumor | 95 |
| CC-359 [38] | T1 | Healthy | 359 |
| DLBS [39] | T1, T2, FLAIR, DWI/DTI, fMRI/rs-fMRI | Healthy | 464 |
| EDEN2020 [40] | T1C, FLAIR, DWI/DTI, Others | Brain Tumor, Healthy | 45 |
| EPISURG [41] | T1 | Epilepsy | 430 |
| GSP [42] | T1, DWI/DTI, fMRI/rs-fMRI | Healthy | 1,570 |
| HBN-SSI [43] | T1, Others | Healthy | 13 |
| HCP [44] | T1, fMRI/rs-fMRI | Healthy | 1,200 |
| ICTS [45] | T1C | Brain Tumor | 1,591 |
| IDB-MRXFDG [46] | T1, FLAIR, Others | Healthy | 37 |
| IDEAS [47] | T1, FLAIR | Epilepsy | 442 |
| ISLES22 [48] | T1, T2, DWI/DTI, FLAIR | Stroke | 400 |
| IXI [49] | T1, T2, DWI/DTI | Healthy | 581 |
| MBSR [50] | T1, DWI/DTI, fMRI | Healthy | 147 |
| Brain Tumor-SEG-CLASS [51] | T1, T1C, FLAIR | Brain Tumor | 96 |
| MGH Wild [52] | T1, T2, FLAIR | Healthy | 1,110 |
| MICA-MICs [53] | T1, DWI/DTI, fMRI/rs-fMRI | Healthy | 50 |
| MOTUM [54] | T1, T1C, T2, FLAIR | Brain Tumor | 66 |
| MS-60 [55] | T1, T2, FLAIR | Multiple Sclerosis | 60 |
| MSLesSeg [56] | T1, T2, FLAIR | Multiple Sclerosis | 75 |
| MSSEG-2 [57] | FLAIR | Multiple Sclerosis | 100 |
| MSValid [58] | T1, T2, FLAIR | Multiple Sclerosis | 84 |
| NFBS [59] | T1 | Psychiatric Disorders, Healthy | 125 |
| NIMH-Ketamine [60] | T1, T2, DWI/DTI, fMRI | Psychiatric Disorders, Healthy | 58 |
| NIMH-RV [61], [62] | T1, T2, DTI, FLAIR, Others | Healthy | 1,859 |
| Novosibirsk-Brain Tumor [63] | T2, FLAIR, DWI/DTI, Others | Brain Tumor | 42 |
| OASIS-1 [64] | T1 | Neurodegenerative | 416 |
| OASIS-2 [65] | T1 | Neurodegenerative | 150 |
| PPMI [66] | CT, fMRI, MRI, DTI, PET, SPECT | Neurodegenerative, Healthy | 8,765 |
| QIN-BRAIN-DSC-MRI [67] | T1, DSC | Brain Tumor | 49 |
| ReMIND [68] | T1, T1C, T2, FLAIR, DWI/DTI, iUS | Brain Tumor | 114 |
| SOOP [69] | T1, T2, FLAIR, TRACE, ADC | Stroke | 1,669 |
| UCLA [70], [71] | T1, DWI/DTI, fMRI/rs-fMRI | Psychiatric Disorders, Healthy | 272 |
| UCSF-ALPTDG [72] | FLAIR, T1, T1C, T2 | Brain Tumor | 298 |
| UCSF-BMSR [73] | T1, T1C, FLAIR | Brain Tumor | 412 |
| UCSF-PDGM [74] | T1, T1C, T2, FLAIR, DWI/DTI, Others | Brain Tumor | 495 |
| UKBioBank [75] | T1, FLAIR, DWI/DTI, fMRI/rs-fMRI | Multiple Diseases | 500,000 |
| UMF-PD [76] | T1, fMRI/rs-fMRI | Neurodegenerative, Healthy | 83 |
| UPENN-GBM [77], [78] | T1, T1C, T2, FLAIR, Others | Brain Tumor | 630 |
| WMH [79] | T1, FLAIR | White Matter Hyperintensities | 170 |
Due to licensing restrictions and regional access limitations, only a portion of the identified datasets could be downloaded for direct inspection. To avoid redundancy, we excluded benchmark collections that merely aggregate scans from other public sources, retaining only the original datasets. The subset used for image-level profiling includes: MSLesSeg [56], MS-60 [55], MSSEG-2 [57], BraTS-MET (2025) [35], BraTS-SSA (2025) [36], [37], BraTS-MEN (2025) [32]–[34], ISLES22 [48], EPISURG [41], OASIS-1 [64], OASIS-2 [65], IXI [49], UMF-PD [76], NFBS [59], and BrainMetShare [30].
To enable consistent cross-dataset analysis, we programmatically loaded each image file and extracted key metadata. For every scan, we recorded spatial attributes (image dimensions, voxel spacing, orientation codes, affine matrix) and non-image attributes (modality, subject ID, session ID when available). Images outside the inclusion scope, such as DTI sequences in IXI, were excluded at this stage.
All extracted metadata were stored in standardized per-dataset CSV files following a uniform schema. This structured resource forms the foundation for subsequent dataset- and image-level analyses presented in this review and is designed to facilitate reproducibility and reuse by the wider community.
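For transparency, the metadata extraction step reduces to a short script. The sketch below uses nibabel and assumes NIfTI inputs; the directory layout and column names are illustrative rather than our exact pipeline, but it records the same attributes used throughout our analyses:

```python
import csv
import pathlib

import nibabel as nib

def extract_metadata(nifti_dir: str, out_csv: str) -> None:
    """Collect spatial metadata from every NIfTI volume under nifti_dir."""
    rows = []
    for path in sorted(pathlib.Path(nifti_dir).rglob("*.nii*")):
        img = nib.load(str(path))
        rows.append({
            "file": path.name,
            "shape": "x".join(map(str, img.shape[:3])),
            # Voxel spacing in millimeters along each spatial axis.
            "spacing": "x".join(f"{s:.3f}" for s in img.header.get_zooms()[:3]),
            # Three-letter anatomical orientation code, e.g. RAS or LPS.
            "orientation": "".join(nib.aff2axcodes(img.affine)),
        })
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "shape", "spacing", "orientation"])
        writer.writeheader()
        writer.writerows(rows)
```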
The disease distribution analysis shown in Figure 1 reveals a pronounced imbalance across public brain MRI datasets. After separating combined cohort labels and removing the undefined “Multiple Diseases” category, Healthy subjects form the largest group, followed by Neurodegenerative disorders (approximately 8,800 subjects) and Brain Tumors (around 8,400 subjects). Medium-scale categories include Stroke (2,300 subjects), Autism (2,200 subjects), and Epilepsy (870 subjects). Smaller datasets correspond to Psychiatric Disorders (455 subjects), Multiple Sclerosis (319 subjects), and White Matter Hyperintensities (170 subjects).
This distribution highlights the structural bias of the open neuroimaging landscape. The abundance of healthy and neurodegenerative cohorts reflects the historical focus on population-based and aging studies, while chronic, diffuse, or subtle pathologies remain underrepresented. Despite the diversity of available datasets, the dominance of a few diagnostic categories implies that current public MRI data cannot fully capture the clinical heterogeneity of the brain. This skewed representation constrains comparative analysis across disease types and may perpetuate overrepresentation of high-resource conditions in future benchmarks.
For foundation models, the imbalance in disease coverage directly influences representational learning. Pretraining dominated by T1-weighted healthy and Alzheimer’s data encourages the model to learn structural regularities and global contrast variations, while subtle lesion characteristics typical of demyelinating or vascular diseases remain statistically rare. Such bias limits transferability to small-lesion or microstructural disorders. To mitigate this, pretraining datasets should deliberately balance disease composition, incorporate underrepresented conditions (e.g., MS, WMH, psychiatric disorders), and include healthy scans primarily as anatomical anchors. Transparent reporting of disease proportions is essential for understanding bias propagation during large-scale pretraining.
The analysis of dataset sizes (Figure 2) exposes an extreme imbalance in the public brain MRI landscape. A single dataset—UKBioBank—accounts for more than 500,000 subjects, while nearly all other datasets range from a few dozen to a few thousand participants. Yet when examined alongside disease coverage, the relationship between scale and content becomes more revealing: the largest datasets are almost exclusively composed of healthy or aging populations, whereas smaller datasets concentrate on specific pathologies such as brain tumors, stroke, and multiple sclerosis. In other words, data abundance is inversely correlated with clinical complexity.
For foundation models, the insight from this scale–disease relationship is profound. Pretraining must not simply accumulate images—it must balance information density against population scale. Large healthy datasets can anchor the model’s low-level feature representation, but meaningful generalization arises only when smaller, heterogeneous clinical datasets are interleaved to inject structural variability and abnormal morphology. The optimal training corpus is therefore not the largest one, but the one that combines datasets across scales and disease domains in a way that maximizes representational complementarity.
When merging datasets, several considerations follow:
Sampling balance: naive aggregation will cause population-scale datasets to dominate optimization; adaptive weighting or stratified sampling is necessary to preserve rare clinical features (see the sketch after this list).
Harmonization: resolution, voxel spacing, and intensity normalization must be aligned to prevent the model from interpreting acquisition differences as anatomical variations.
Domain alignment: cross-dataset normalization in feature space (e.g., domain-adversarial training or latent alignment) can reduce the domain gap between healthy and disease cohorts.
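As a concrete illustration of the sampling-balance point, the following sketch uses PyTorch's WeightedRandomSampler with inverse-frequency weights. The dataset counts are hypothetical stand-ins; a real pipeline would derive them from the per-dataset metadata CSVs described earlier:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical scan-to-dataset assignment (counts chosen for illustration).
dataset_ids = ["ukbiobank"] * 5000 + ["ms60"] * 60 + ["isles22"] * 400

# Inverse-frequency weights: each source dataset contributes roughly
# equally per epoch, so population-scale cohorts cannot dominate optimization.
counts = {d: dataset_ids.count(d) for d in set(dataset_ids)}
weights = torch.DoubleTensor([1.0 / counts[d] for d in dataset_ids])

# Pass `sampler=sampler` to a DataLoader over the merged scan list.
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```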
The scale analysis reveals that the most informative foundation model will not come from the largest dataset, but from the strategic fusion of small, diverse datasets with large, stable ones. Quantity establishes the foundation; diversity defines intelligence. A model pretrained under this philosophy learns both the invariant anatomy of the healthy brain and the variable morphology of disease, achieving robustness not through volume, but through representational balance.
The modality co-occurrence analysis (Figure 3) reveals distinct pairing patterns among structural MRI sequences across public datasets. The most frequent combination is between T1-weighted and FLAIR scans, followed by T1–T2 and T1–T1c pairs. These sequences commonly co-occur within multi-contrast structural datasets such as BraTS, ADNI, and MSSEG, where complementary contrasts are used to capture both anatomical boundaries and pathological hyperintensities. Moderate co-occurrence is also observed among FLAIR, T2, and T1c, indicating a tendency for lesion-focused studies to integrate multiple structural contrasts that highlight different tissue characteristics. In contrast, single-modality datasets remain prevalent, particularly among population studies (e.g., IXI, OASIS), which provide only T1-weighted scans.
This co-occurrence pattern demonstrates that public brain MRI datasets—though diverse—are structurally interlinked through a limited but consistent set of core modalities. The strong correlation between T1 and FLAIR availability reflects a shared acquisition strategy for anatomical delineation and lesion sensitivity, while the partial inclusion of T2 and T1c indicates dataset-specific clinical emphasis (e.g., edema or contrast enhancement). The heatmap also reveals that cross-dataset modality overlap is incomplete: no single dataset provides full structural coverage, and different combinations dominate different disease domains. This partial alignment introduces redundancy in some modalities but gaps in others when datasets are combined.
For foundation models trained on aggregated public datasets, these co-occurrence dynamics carry important consequences. The uneven intersection of modalities across datasets means that multi-contrast information is not uniformly available for all subjects. This heterogeneity can lead to modality imbalance during pretraining and complicate cross-dataset harmonization. To address this, foundation models must incorporate modality-aware mechanisms—such as learned modality embeddings or masked reconstruction objectives—that can leverage overlapping contrasts while remaining robust to missing ones. The observed co-occurrence structure also suggests that structural modalities share sufficient anatomical redundancy to enable joint representation learning: by training across datasets with partially overlapping contrasts (e.g., T1+FLAIR from one source, T1+T2 from another), the model can implicitly learn a unified structural feature space that generalizes across acquisition protocols. Consequently, modality co-occurrence is not merely a dataset property but a key enabler of scalable, harmonized pretraining across heterogeneous MRI corpora.
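To make the idea of a modality-aware mechanism concrete, here is a minimal, hypothetical sketch of a modality-conditioned input stem (the class name, dimensions, and patch size are our assumptions, not a published architecture): a learned embedding per contrast is added to every patch token, so a single encoder can ingest whichever contrasts a given dataset provides.

```python
import torch
import torch.nn as nn

class ModalityAwareStem(nn.Module):
    """Adds a learned per-contrast embedding (T1, T1c, T2, FLAIR) to patch tokens."""

    def __init__(self, n_modalities: int = 4, patch_voxels: int = 16 ** 3, dim: int = 256):
        super().__init__()
        self.patch_proj = nn.Linear(patch_voxels, dim)
        self.modality_emb = nn.Embedding(n_modalities, dim)

    def forward(self, patches: torch.Tensor, modality_id: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, patch_voxels); modality_id: (B,) integer contrast index.
        tokens = self.patch_proj(patches)
        return tokens + self.modality_emb(modality_id)[:, None, :]

stem = ModalityAwareStem()
tokens = stem(torch.randn(2, 64, 16 ** 3), torch.tensor([0, 3]))  # T1 and FLAIR
```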
At the image level, heterogeneity in voxel geometry, orientation, and intensity introduces latent biases that can substantially affect representation learning. These properties define the physical scale, spatial consistency, and dynamic range of brain MRI data—factors that determine whether a foundation model learns anatomical invariants or dataset-specific artifacts. Our image-level analysis quantifies these factors across 14 public datasets and provides interpretative insights for model design and harmonization.
Voxel spacing defines the physical size of each voxel along the \(x\), \(y\), and \(z\) axes in millimeters, determining how finely anatomical structures are represented in the image and directly influencing the learning behavior of foundation models. When voxel spacing varies across datasets, the same convolution or attention kernel covers different physical regions, leading to inconsistent representation of anatomical details, blurred or missing small lesions in thicker slices, and domain shifts when combining data. This makes voxel spacing not just a technical aspect of MRI acquisition but a key factor that shapes model generalization. It affects architectures differently: CNNs may learn biased features when scale changes, transformers can misalign patches or positional encodings, and SAM-style models often lose boundary accuracy when slices are uneven—making anisotropy a hidden source of error that limits transferability.
Figure 4 shows the 3D distribution of voxel spacings across 14 representative datasets. Most datasets cluster near isotropic spacing around \((1.0, 1.0, 1.0)\) mm, indicating uniform resolution across all axes. The three BraTS collections (BraTS-MET, BraTS-SSA, BraTS-MEN), OASIS-1/2, NFBS, and IXI fall into this group, providing consistent high-quality data for model pretraining. In contrast, multiple sclerosis datasets (MS-60, MSLesSeg, MSSEG-2) and BrainMetShare exhibit moderate anisotropy, with fine in-plane resolution (\(x,y \approx 0.8{-}1.0\) mm) but thicker slices along the \(z\)-axis (\(1.5{-}3.0\) mm). This reduces sensitivity to small or thin lesions that appear across only one or two slices. Stroke and surgical datasets, such as ISLES22 and EPISURG, show the widest variability, including cases with very thick slices (\(z>4\) mm) and variable in-plane spacing up to \(2.0\) mm. Such heterogeneity reflects differences in acquisition protocols across centers and scanners. Finally, mixed clinical datasets like UMF-PD and BrainMetShare include both near-isotropic and anisotropic scans, representing real-world diversity in clinical imaging practices. These observations lead to three key insights that have direct implications for the development of foundation models: (i) many research datasets share near-isotropic resolution and are well-suited for standardized pretraining; (ii) clinical and disease-specific datasets tend to be anisotropic, introducing geometric inconsistencies that require explicit modeling; and (iii) spacing variability alone can cause measurable distribution shifts between datasets, even after resampling.
To further characterize these differences, we grouped each image into three categories based on the degree of anisotropy. We computed, for each image, the ratio between the largest and smallest spacing values among the three axes. If all spacings were equal (ratio = 1.0), the image was labeled as isotropic. If the ratio was greater than 1.0 but less than 2.0, it was labeled mildly anisotropic. Ratios of 2.0 or higher were labeled highly anisotropic.
| Anisotropy Category | Count |
|---|---|
| Isotropic | 7,968 |
| Mildly Anisotropic | 7,152 |
| Highly Anisotropic | 1,724 |
As shown in Table 2, most images fall into the isotropic or mildly anisotropic categories (7,968 and 7,152 images, respectively). However, over 1,700 images are highly anisotropic, indicating substantial geometric distortion, especially in slice thickness. If left uncorrected, these differences can lead to biased model learning and performance degradation across datasets.
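The categorization rule reduces to a few lines of code; a minimal sketch follows, using a small numerical tolerance for floating-point spacings, which the exact-equality rule above leaves implicit:

```python
import numpy as np

def anisotropy_category(spacing) -> str:
    """Classify a scan by the ratio of largest to smallest voxel spacing."""
    ratio = max(spacing) / min(spacing)
    if np.isclose(ratio, 1.0):
        return "isotropic"
    if ratio < 2.0:
        return "mildly anisotropic"
    return "highly anisotropic"

print(anisotropy_category((1.0, 1.0, 1.0)))  # isotropic
print(anisotropy_category((0.9, 0.9, 3.0)))  # highly anisotropic
```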
The orientation of MRI volumes defines how the anatomical axes of the brain are mapped to the voxel coordinate system. Each MRI scan stores its orientation using a three-letter code (e.g., RAS, LAS, LPS), which specifies the direction of the x, y, and z axes relative to the patient’s anatomy. While orientation may appear as a technical metadata field, it has a direct and critical influence on the learning behavior of foundation models. When images are stored in inconsistent orientations across datasets, identical brain structures appear in different spatial locations or mirrored configurations. This leads to misalignment in anatomical correspondences, causing the model to learn orientation-specific patterns instead of generalizable anatomical features. Therefore, harmonizing orientation is essential for foundation models to learn consistent spatial representations that can generalize across diverse datasets.
Table 3 summarizes the orientation distribution across datasets. The most common orientation is RAS (6,592 images), which is the standard convention in neuroimaging software such as FSL and FreeSurfer. However, a considerable number of datasets adopt alternative conventions, including LPS (5,012 images) and LAS (3,473 images). These three orientations together account for over 90% of all images analyzed. Notably, several datasets contain multiple orientations internally; for instance, BraTS-MET and EPISURG each include images in both RAS and LPS forms. Less frequent orientations such as RSA, PSR, or ASL are observed in smaller datasets (e.g., OASIS, NFBS, UMF-PD). The presence of such variability reflects the absence of a unified orientation policy among dataset providers, even within well-curated public repositories.
| Orientation | Count | Datasets |
|---|---|---|
| RAS | 6,592 | BraTS-MET (2025), BraTS-SSA (2025), BrainMetShare, EPISURG, ISLES22 |
| LPS | 5,012 | BraTS-MEN (2025), BraTS-MET (2025), BraTS-SSA (2025) |
| LAS | 3,473 | BraTS-MET (2025), EPISURG, ISLES22, IXI, MS-60, MSLesSeg, MSSEG-2, OASIS-1, UMF-PD |
| RSA | 664 | EPISURG |
| PSR | 582 | EPISURG, IXI |
| ASL | 373 | OASIS-2 |
| PIR | 129 | EPISURG, NFBS |
| LIP | 9 | EPISURG |
| LSP | 5 | EPISURG |
| ASR | 4 | EPISURG |
| LSA | 1 | EPISURG |
The observed orientation heterogeneity introduces a subtle but significant source of distributional shift that can impair model transferability. Models trained on mixed-orientation data without explicit normalization may implicitly encode orientation-specific spatial priors. For example, left–right inversions between RAS and LAS orientations can confuse the model’s learned feature alignment, leading to inconsistent activation patterns for homologous brain regions. Similarly, inconsistent superior–inferior axis definitions can distort 3D spatial context, reducing the model’s ability to capture global anatomical symmetry.
For foundation model pretraining, these inconsistencies compound across large-scale datasets. Since pretraining relies on learning generic spatial and structural representations, uncorrected orientation differences can fragment the learned latent space, causing the model to associate the same anatomy with distinct feature embeddings depending on orientation. This weakens the universality of learned representations and increases the burden on fine-tuning.
Hence, orientation harmonization is not merely a preprocessing detail but a foundational requirement for effective cross-dataset learning. Converting all volumes to a common convention (typically RAS) before model training ensures that spatial relationships are consistent across datasets. For large-scale pretraining pipelines, we recommend enforcing explicit orientation standardization as part of dataset ingestion. Such harmonization minimizes unnecessary domain shifts, allowing the foundation model to focus on learning biologically meaningful anatomy rather than orientation artifacts.
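In practice, reorientation to RAS requires no resampling, only axis permutations and flips; a minimal nibabel sketch (the filename is illustrative):

```python
import nibabel as nib

def to_ras(path: str) -> nib.Nifti1Image:
    """Reorient a volume to the closest RAS+ convention without interpolation."""
    img = nib.load(path)
    # as_closest_canonical permutes and flips voxel axes to align with RAS+.
    return nib.as_closest_canonical(img)

img_ras = to_ras("sub-01_T1w.nii.gz")
print(nib.aff2axcodes(img_ras.affine))  # ('R', 'A', 'S')
```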
Image intensity represents the voxel-wise signal values within MRI scans and encapsulates the physical properties of tissues as captured by different imaging sequences. Intensity distributions are shaped by scanner hardware, acquisition protocols, and post-processing pipelines such as bias-field correction or intensity normalization. For foundation models, which depend on large-scale data aggregation from diverse sources, inconsistent intensity scaling or contrast profiles can substantially affect representation learning. A model trained on non-harmonized intensity profiles may implicitly overfit to dataset-specific brightness ranges, thereby reducing its ability to generalize across unseen domains.
Figure 5 illustrates the distribution of median voxel intensities across representative datasets. Datasets such as EPISURG, OASIS-1, OASIS-2, and IXI exhibit wide intensity variability, whereas others (e.g., the BraTS series, ISLES22, MSLesSeg, and BrainMetShare) show lower and more stable median values. This disparity likely arises from differences in scanner calibration, rescaling conventions (e.g., 0–255 versus z-scored), and preprocessing intensity normalization methods. The OASIS datasets, for example, show extensive dispersion with median intensities exceeding 300, reflecting a broad dynamic range and the absence of uniform scaling. In contrast, the BraTS and MS-related datasets exhibit tight clusters around zero, suggesting that bias correction and standardized normalization were consistently applied.
These differences have several implications for foundation model development. First, heterogeneous intensity distributions introduce latent biases that may lead a model to associate tissue contrast with dataset identity rather than underlying anatomy. This undermines the objective of learning scanner- and modality-invariant representations. Second, extreme intensity outliers—particularly in datasets with mixed acquisition conditions—can destabilize loss optimization during pretraining by distorting the input statistics used by normalization layers. Conversely, datasets with highly standardized intensity ranges, while beneficial for stable convergence, may limit the model’s exposure to real-world variability and thus reduce robustness during fine-tuning on unnormalized clinical data.
From a model design perspective, these findings highlight the importance of preprocessing-aware normalization strategies. Dynamic intensity scaling or adaptive histogram alignment could be implemented within the data loading pipeline to ensure consistent contrast across datasets. Alternatively, self-supervised objectives that promote intensity-invariant representations (e.g., histogram-matching augmentations or contrast consistency losses) may help the model decouple anatomical features from brightness variations. Ultimately, balancing intensity harmonization for stable training with sufficient distributional diversity for adaptability remains a key challenge for developing robust and generalizable MRI foundation models.
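As one example of the adaptive histogram alignment mentioned above, scikit-image's match_histograms can map each scan's intensity distribution onto a reference volume inside the data-loading pipeline; a hedged sketch, where the reference choice and filenames are assumptions:

```python
import nibabel as nib
from skimage.exposure import match_histograms

# Map a scan's intensity distribution onto a fixed reference volume so that
# contrast profiles are comparable across datasets at load time.
reference = nib.load("reference_T1w.nii.gz").get_fdata()
moving = nib.load("sub-01_T1w.nii.gz").get_fdata()

matched = match_histograms(moving, reference)
```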
To quantitatively assess whether these intensity differences are statistically significant, we applied the Kruskal–Wallis H test to the per-image median values grouped by dataset. The result was highly significant (\(H = 15093.849\), \(p < 0.0001\)), confirming that the observed inter-dataset variations are not due to random fluctuation. This non-parametric test evaluates whether at least one group differs in median from the others, without assuming a specific underlying distribution. The extremely low \(p\)-value supports the visual findings in Figure 5, indicating that intensity scaling differences across datasets are real, systematic, and substantial.
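The test itself is a one-liner in SciPy; the sketch below uses synthetic per-image medians purely to show the call pattern (our actual groups are the per-dataset medians plotted in Figure 5):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-image median intensities for three hypothetical datasets.
medians_by_dataset = {
    "A": rng.normal(320, 80, 400),
    "B": rng.normal(0.4, 0.2, 750),
    "C": rng.normal(250, 60, 580),
}

h, p = stats.kruskal(*medians_by_dataset.values())
print(f"H = {h:.3f}, p = {p:.3g}")
```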
To systematically evaluate the impact of preprocessing on data harmonization, we randomly sampled images from the curated datasets and applied a standardized pipeline comprising bias-field correction, intensity normalization, skull stripping, and spatial registration. The resulting images were analyzed through voxel-wise statistical comparisons and qualitative visual inspection to assess improvements in inter-dataset consistency and anatomical fidelity.
Intensity normalization is the process of adjusting MRI voxel values to a common scale so that images from different scanners or subjects become comparable. The most common techniques include z-score normalization, histogram matching, and WhiteStripe normalization. Z-score normalization rescales each image to have zero mean and unit variance, reducing intensity range differences; it is best used as a simple, general method when datasets are diverse or lack a consistent reference. Histogram matching aligns the intensity distribution of each image to that of a reference scan or template, making it ideal for multi-site datasets with large scanner or protocol variability. WhiteStripe normalization uses the intensity range of normal-appearing white matter to anchor scaling, which is most effective for brain studies where maintaining tissue contrast is important.
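A minimal z-score implementation follows, optionally restricted to a brain mask so that background voxels do not skew the statistics; the mask argument is our addition, and the global variant described in the text corresponds to mask=None:

```python
import numpy as np

def zscore_normalize(volume: np.ndarray, mask=None) -> np.ndarray:
    """Rescale a volume to zero mean and unit variance."""
    voxels = volume[mask > 0] if mask is not None else volume
    return (volume - voxels.mean()) / (voxels.std() + 1e-8)
```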
As summarized in Table 4, the original voxel intensities span a wide range, reflecting strong contrast between bright enhancement regions and darker tissues. After applying z-score normalization, the intensity distribution becomes centered around zero with reduced variance, resulting in a more uniform and balanced appearance across tissues. However, this transformation also alters the visual contrast, as shown in Figure 6: some brain regions appear brighter, while fine structural details become less pronounced. This effect occurs because z-score normalization rescales voxel values relative to the global mean and standard deviation, thereby compressing the overall dynamic range and reducing intensity extremes.
When building foundation models, intensity normalization should be applied consistently across all datasets to prevent artificial domain shifts. The chosen method must preserve relative tissue contrast while harmonizing global intensity ranges. It is also beneficial to expose the model to multiple normalization styles during pretraining, helping it learn invariance to contrast variations. Finally, combining preprocessing-based normalization with learnable normalization layers (e.g., instance or adaptive layer normalization) allows the model to adapt dynamically to unseen data while maintaining stable, harmonized feature representations.
| Statistic | Original Image | Normalized Image |
|---|---|---|
| Minimum | 0.854 | 0 |
| Maximum | 1259.812 | 8.40 |
| Mean | 258.018 | \(\sim\)0.00 |
| Std | 95.000 | \(\sim\)1.00 |
Bias field correction adjusts MRI images to remove gradual brightness variations caused by uneven magnetic fields or coil sensitivity. These variations make some regions look brighter or darker even when the tissue is the same, so correction helps make the intensity more uniform across the brain. Popular methods include N4ITK (N4 bias field correction), N3 (nonparametric nonuniform intensity normalization), and SPM’s unified segmentation approach. In this review, we applied N4ITK bias correction (SimpleITK implementation) with modality-specific tuning: enhanced smoothing for FLAIR, brain masking for T1C images, and balanced settings for T1 and T2.
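A condensed sketch of the N4 step, following the standard SimpleITK recipe; the Otsu mask, shrink factor, and full-resolution field reconstruction (available in SimpleITK 2.1 and later) are common defaults rather than our exact per-modality settings:

```python
import SimpleITK as sitk

def n4_correct(in_path: str, out_path: str, shrink: int = 2) -> None:
    """Apply N4 bias-field correction and write the corrected volume."""
    image = sitk.ReadImage(in_path, sitk.sitkFloat32)
    # Otsu mask restricts the field estimate to head/brain voxels.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    # The bias field is smooth, so fitting on a shrunken image loses little.
    small = sitk.Shrink(image, [shrink] * image.GetDimension())
    small_mask = sitk.Shrink(mask, [shrink] * image.GetDimension())
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    corrector.Execute(small, small_mask)
    # Reconstruct the full-resolution log field and divide it out.
    log_field = corrector.GetLogBiasFieldAsImage(image)
    sitk.WriteImage(image / sitk.Exp(log_field), out_path)
```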
Representative examples are shown in Figure 7. The raw image (top left) displays uneven brightness: the left side of the brain appears darker due to scanner-related field inhomogeneity. After correction, the preprocessed image (top row, second panel) shows more uniform brightness across tissue regions, while the estimated bias field map (top row, third panel) captures the smooth multiplicative field responsible for this nonuniformity. The intensity histograms reveal that voxel intensities have shifted and become more compact, indicating reduced variation between bright and dark areas. The horizontal and vertical profiles show that peaks corresponding to white matter and gray matter are now closer in amplitude, confirming improved intensity consistency. The intensity correlation plot (r = 0.823) shows that the correction maintains overall intensity relationships but rescales them toward a more uniform distribution. Quantitatively, as shown in Table 5, the coefficient of variation decreases (0.207 → 0.163), meaning intensity variability within tissue is reduced, while the signal-to-noise ratio (SNR) remains similar (6.87 → 6.50), suggesting correction did not distort contrast or amplify noise. The difference map highlights smooth intensity shifts, with no sharp artifacts.
While bias correction helps standardize input intensities for foundation model training, its effects vary with modality, anatomy, and pathology. Overcorrection may reduce lesion contrast or introduce distortions, while undercorrection can leave scanner-specific artifacts. Hence, visual and quantitative validation is essential, particularly when aggregating multi-source data.
| Metric (BraTS-SSA) | Before | After |
|---|---|---|
| Coefficient of Variation | 0.207 | 0.163 |
| Signal-to-Noise Ratio (SNR) | 6.87 | 6.50 |
The primary goal of skull stripping is to remove non-brain tissue, such as the skull, scalp, and dura mater, from the image. This is a critical step as these tissues have high-intensity signals that can interfere with intensity normalization and confuse segmentation algorithms. Common tools include FSL’s Brain Extraction Tool (BET) [80], AFNI’s 3dSkullStrip [81], and more recently, deep learning-based methods like HD-BET [82], which often provide more accurate results. While most datasets in our analysis are provided pre-stripped (e.g., BraTS, ISLES22), the specific algorithm used often varies or is not documented, leading to subtle differences in the final brain mask. Figure 8 illustrates the effect of skull stripping on a PD image from the IXI dataset, where non-brain tissues such as the scalp and skull are successfully removed, leaving only the intracranial structures for further analysis.
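For reference, a thin wrapper around FSL's BET is shown below; it assumes FSL is installed and on PATH, and the filenames are illustrative:

```python
import subprocess

def skull_strip(in_path: str, out_path: str, frac: float = 0.5) -> None:
    """Run FSL BET; -f sets the fractional intensity threshold
    (lower keeps more brain) and -m also writes the binary brain mask."""
    subprocess.run(["bet", in_path, out_path, "-f", str(frac), "-m"], check=True)

skull_strip("IXI002-Guys-0828-PD.nii.gz", "IXI002_brain.nii.gz")
```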
From a foundation model standpoint, skull stripping can influence both pretraining and downstream transfer. When training models across multiple datasets, consistent skull stripping helps reduce non-biological variability and ensures that the model focuses on relevant brain structures. However, inconsistency across datasets—where some scans are stripped and others are not—can lead to feature-space fragmentation, causing the model to learn dataset-specific biases rather than generalizable brain representations. Therefore, strict harmonization of preprocessing pipelines, including identical skull stripping tools, thresholds, and quality-control procedures, is essential.
Moreover, the choice to strip or retain the skull should align with the model’s target scope. For models designed to capture brain-centric features—such as lesion segmentation, cortical parcellation, or morphometric analysis—skull stripping is generally beneficial, as it directs attention to intracranial tissues. Conversely, for models intended to generalize across multi-modal or multi-organ contexts (e.g., MRI–CT alignment, PET fusion, or structural-to-functional transfer), removing the skull can limit cross-modality correspondence and reduce anatomical completeness. A practical strategy for large-scale foundation model pretraining is to include both stripped and unstripped variants of each scan and use metadata tags or preprocessing embeddings to inform the model about their origin. This dual representation encourages robustness to preprocessing differences and enables the model to learn invariance to skull presence—an increasingly important capability for generalizable medical foundation models.
Spatial registration aims to align MRI volumes into a common anatomical space, reducing spatial variability across datasets. Using a modality-aware ANTs pipeline with rigid–affine–SyN transformations, we aligned representative scans to the MNI152 template. This process standardizes brain geometry but also exposes how registration can reshape anatomical statistics in subtle, dataset-specific ways.
Figure 9 shows the registration effect for a T2-weighted image from BraTS-MEN. The aligned scan closely matches the MNI template, and quantitative metrics confirm high structural similarity (mutual information = 0.974, structural similarity = 0.641). However, resampling expanded the image volume by 26.9%, and local correlation (\(r = -0.217\)) indicates that voxel intensity relationships were partially altered. Overlay maps and checkerboard comparisons highlight that most deviations occur near lesion borders and ventricles—regions where pathology or intensity nonuniformity interact poorly with the template deformation.
These findings reveal an essential trade-off. Registration improves spatial consistency across datasets, supporting template-based feature extraction and patch sampling. Yet, excessive geometric forcing can distort pathological anatomy and attenuate lesion contrast, especially in heterogeneous clinical data. For foundation model pretraining, this suggests that full MNI normalization may be beneficial only for structural harmonization, while native-space training augmented with local spatial perturbations could better preserve disease-specific variability and improve cross-domain generalization.
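A sketch of the alignment using ANTsPy follows; the template path and filenames are placeholders, and ANTsPy's 'SyN' transform internally prepends rigid and affine initialization, approximating the rigid–affine–SyN cascade described above:

```python
import ants

fixed = ants.image_read("mni_icbm152_t1.nii.gz")   # local copy of the MNI152 template
moving = ants.image_read("sub-01_T2w.nii.gz")

# Symmetric diffeomorphic registration with affine initialization;
# Mattes mutual information is the default similarity metric.
reg = ants.registration(fixed=fixed, moving=moving, type_of_transform="SyN")
ants.image_write(reg["warpedmovout"], "sub-01_T2w_mni.nii.gz")
```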
Several clinical datasets, such as MS-60, contain scans with limited z-axis coverage or thick slices, producing anisotropic volumes that hinder 3D convolutional learning. To mitigate this, we applied an automated interpolation procedure that increases through-plane resolution while maintaining anatomical scale. This step is not simply geometric resampling—it directly determines how well small, low-contrast lesions are represented in 3D feature space.
Figure 10 illustrates a FLAIR image from the MS-60 dataset before and after interpolation. The original scan (13 slices) shows severe discontinuities and collapsed tissue boundaries, whereas the interpolated version (64 slices) restores smoother cortical contours and continuous sulcal structures without distorting global shape. Quantitatively, the effective slice thickness decreased by approximately 4.8\(\times\), enabling isotropic patch extraction for pretraining and consistent input dimensions across datasets.
From a foundation model perspective, interpolation functions as a structural equalizer: it harmonizes volumetric resolution across sources, improving patch uniformity and kernel receptive fields. However, it also generates synthetic voxels that may obscure very small hyperintensities or produce interpolation artifacts along lesion edges. Thus, interpolation should be applied selectively—preferably on high-anisotropy datasets or in conjunction with uncertainty-aware augmentations—to balance geometric consistency and lesion fidelity.
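A hedged sketch of through-plane upsampling with nibabel; the filenames and 1 mm target are illustrative, and the selective application discussed above would gate this step on the measured anisotropy ratio:

```python
import nibabel as nib
from nibabel import processing

# Resample a thick-slice FLAIR volume to 1 mm isotropic spacing using
# third-order spline interpolation.
img = nib.load("ms60_flair.nii.gz")                  # e.g. 0.9 x 0.9 x 4.8 mm
iso = processing.resample_to_output(img, voxel_sizes=(1.0, 1.0, 1.0), order=3)
nib.save(iso, "ms60_flair_iso.nii.gz")
print(img.shape, "->", iso.shape)                    # slice count grows ~4.8x
```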
Despite standardized preprocessing, inherent heterogeneity in MRI data from diverse sources introduces residual covariate shift, significantly impeding the generalizability of deep learning models. This phenomenon manifests as subtle, non-linear variations within images, encompassing scanner-specific noise patterns, intensity distortions, and residual artifacts that standard harmonization techniques fail to fully mitigate. To empirically demonstrate this, we utilized T1-weighted MRI scans of healthy subjects from two public datasets: NFBS (125 images) and a subset of IXI (54 images). All data underwent a uniform preprocessing pipeline: skull stripping, N4 bias field correction, MNI152 template registration, and intensity normalization. For computational efficiency and to effectively capture domain differences, a single central axial slice was extracted from each image. Features (1024-dimensional) were subsequently derived from the penultimate layer of an ImageNet-pretrained DenseNet121, without fine-tuning, to assess raw feature transferability. Quantitative assessment of the domain shift between these datasets is summarized in Table 6.
Despite a high cosine similarity, indicative of similar vector directions, a substantial Euclidean distance and average Wasserstein distance highlight significant shifts in the magnitude and distribution of features. Statistical analysis further confirmed this divergence: 83.89% of all features exhibited statistically significant differences (\(p < 4.88\times10^{-5}\) after Bonferroni correction). These findings demonstrate that standard preprocessing is insufficient for complete MRI data harmonization. The persistent residual covariate shift in the learned feature space critically impairs model robustness and transferability across unseen domains. Therefore, developing and implementing explicit domain adaptation strategies, such as disentangled representation learning, meta-learning for domain generalization, and robust uncertainty estimation, is paramount for building truly generalizable and clinically reliable models. This is particularly crucial for the advancement of foundation models in high-stakes medical imaging applications.
| Metric | Value |
|---|---|
| Cosine Similarity (mean vectors) | 0.960907 |
| Euclidean Distance (mean vectors) | 6.854918 |
| Average 1D Wasserstein Distance | 0.141903 |
| Significant Features (\(p < 0.05\)) | 947 / 1024 |
| Bonferroni-Significant Features (\(p < 4.88 \times 10^{-5}\)) | 859 / 1024 |
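The Table 6 metrics can be reproduced from two feature matrices with a few lines of NumPy/SciPy. The sketch below also shows the DenseNet121 embedding step; the torchvision weights API is assumed, and input resizing plus ImageNet channel normalization are omitted for brevity:

```python
import numpy as np
import torch
from scipy.stats import wasserstein_distance
from torchvision.models import densenet121, DenseNet121_Weights

model = densenet121(weights=DenseNet121_Weights.IMAGENET1K_V1).eval()

def embed(slices: torch.Tensor) -> np.ndarray:
    """Penultimate-layer (1024-d) features for a batch of 3x224x224 slices."""
    with torch.no_grad():
        f = torch.nn.functional.relu(model.features(slices))
        return torch.nn.functional.adaptive_avg_pool2d(f, 1).flatten(1).numpy()

def shift_metrics(a: np.ndarray, b: np.ndarray) -> dict:
    """Cosine/Euclidean comparison of mean vectors and mean 1-D Wasserstein distance."""
    ma, mb = a.mean(0), b.mean(0)
    return {
        "cosine": float(ma @ mb / (np.linalg.norm(ma) * np.linalg.norm(mb))),
        "euclidean": float(np.linalg.norm(ma - mb)),
        "wasserstein": float(np.mean([wasserstein_distance(a[:, i], b[:, i])
                                      for i in range(a.shape[1])])),
    }
```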
In this study, we analyzed 54 publicly available brain MRI datasets to understand how their characteristics differ across modalities, voxel geometry, and intensity distributions. We found that public datasets cover a broad range of neurological and psychiatric conditions but vary widely in scale, modality composition, and acquisition settings. Structural MRI sequences dominate the landscape, while advanced modalities such as diffusion and functional MRI are much less represented. This diversity provides valuable opportunities for comprehensive modeling but also poses challenges for developing foundation models that must generalize across many domains. To assess how preprocessing affects data harmonization, we applied a standardized pipeline including bias-field correction, intensity normalization, skull stripping, and spatial registration. These steps increased internal consistency within datasets but did not fully remove differences between them. Residual variability was evident in the feature space, indicating that standard preprocessing alone cannot ensure complete harmonization. This suggests that preprocessing-aware model design and domain adaptation techniques are needed to reduce inter-dataset shifts more effectively.
Several limitations should be acknowledged. Our study did not assess annotation quality or consistency, which strongly influences the suitability of datasets for supervised learning. We also did not perform model benchmarking, which would show how dataset variability translates to performance differences. Future work will address these aspects by analyzing annotation quality, increasing the number of samples for preprocessing evaluation, and including model benchmarking. Together, these efforts will build a stronger foundation for developing harmonized, reliable, and generalizable brain MRI foundation models.
This manuscript was prepared and refined with the assistance of ChatGPT (GPT-5, OpenAI, 2025) for language enhancement and clarity.
This work was supported by a grant for research centers, provided by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement with the Novosibirsk State University dated April 17, 2025 No. 139-15-2025-006: IGK 000000C313925P3S0002
| Standardized Label | Original Variants Grouped |
|---|---|
| T1 | Anatomical T1-weighted scans |
| T1c | Contrast-enhanced T1-weighted scans |
| DWI/DTI | DWI, DTI, ADC, TRACE |
| fMRI/rs-fMRI | BOLD, resting-state fMRI, task-fMRI |
| Others | SWI, ASL, PET, CT, MRA, MEG |
| Standardized Label | Original Variants Grouped |
|---|---|
| Healthy | Healthy |
| Stroke | Stroke |
| Multiple Sclerosis | Multiple Sclerosis |
| Brain Tumor | Glioma, Glioblastoma, Meningioma, Metastasis |
| Neurodegenerative | Alzheimer’s Disease, Parkinson’s Disease |
| Psychiatric Disorder | Schizophrenia, Bipolar Disorder, Major Depressive Disorder (MDD) |