Normative Modelling in Neuroimaging:
A Practical Guide for Researchers
September 08, 2025
CNNP Lab (www.cnnp-lab.com), School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
UCL Queen Square Institute of Neurology, Queen Square, London, United Kingdom
* Yujiang.Wang@newcastle.ac.uk
Normative modelling is an increasingly common statistical technique in neuroimaging that estimates population-level benchmarks in brain structure. It enables the quantification of individual deviations from expected distributions whilst accounting for biological and technical covariates, without requiring large, matched control groups. This makes it a powerful alternative to traditional case-control studies for identifying brain structural alterations associated with pathology. Despite the availability of numerous modelling approaches and several toolboxes with pretrained models, their distinct strengths and limitations make it difficult to determine how and when to implement them appropriately. This review offers practical guidance and outlines statistical considerations for clinical researchers using normative modelling in neuroimaging. We compare several open-source normative modelling tools through a worked example using clinical epilepsy data, outlining decision points, common pitfalls, and considerations for responsible implementation to support broader and more rigorous adoption of normative modelling in neuroimaging research.
The shape of the human brain can be quantified by structural neuroimaging. Deviations from typical morphology are often associated with neurological disorders, making accurate detection essential. Normative modelling is a statistical approach that establishes reference distributions of brain metrics such as cortical thickness, surface area, and volume based on the healthy population. Applying these models to new data yields deviation scores, for example z-scores or centiles, that quantify how an individual diverges from the healthy baseline [1]. A familiar analogy is paediatric growth charts: just as height centiles assess a child’s growth relative to peers, normative curves evaluate brain metrics against age- and sex-matched standards [1].
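The two forms of deviation score carry the same information: when z-scores are defined as quantile residuals, a z-score maps onto a centile through the standard normal cumulative distribution function. A minimal illustration in Python (values purely for demonstration):

```python
from scipy.stats import norm

z = -1.5                      # observation 1.5 SD below the normative prediction
centile = 100 * norm.cdf(z)   # ~6.7th centile of the reference distribution
print(f"z = {z} corresponds to the {centile:.1f}th centile")
```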
Normative modelling removes the need for large, demographically matched control groups, making it increasingly popular in clinical research across different neurological disorders. In epilepsy, it can help localise structural abnormalities and predict treatment outcomes [2]. Similarly, the approach can characterise patient heterogeneity in mental disorders [3]–[5] and traumatic brain injury [6], [7]. It has proved valuable in studies of mild cognitive impairment and dementia [8]–[12], developmental psychiatry [13], [14], schizophrenia [15], [16], ADHD [17], and autism spectrum disorder [18]–[20]. The recent growth of pretrained models and user-friendly online platforms offers a powerful alternative to traditional case–control analyses.
Normative models use large healthy cohorts to learn the distribution of neuroimaging features as a function of covariates such as age, sex, and scanner site. The training dataset generally includes data acquired across multiple sites, and covers a wide age range. Typical statistical approaches to fit these models include linear regression, Gaussian process regression, generalised additive models of location, scale, and shape, and Bayesian frameworks [21]–[23]. Unlike case-control studies with restricted control sample sizes, normative models can therefore infer complex, non-Gaussian distributions of morphological features, which may more accurately reflect the distribution in the healthy population. Once trained, a model predicts expected values for new participants. Comparing observed metrics to these predictions yields individual deviation scores that account for biological (e.g. age, sex) and technical (e.g. scanner site [24]) variability. Calibrating pretrained models to new scanners requires a relatively small reference set of healthy controls from that site to estimate and correct site-specific effects. Without calibration, scanner artefacts may confound true pathological deviations.
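As a concrete illustration of the calibration step, the sketch below estimates a site-specific offset and scale from a small local control set and applies them when computing deviation scores. This is a minimal Gaussian sketch with simulated numbers; the function and variable names are hypothetical, and real platforms implement more sophisticated schemes (e.g. hierarchical Bayesian adaptation).

```python
import numpy as np

def calibrate_site(pred_mean_ctrl, pred_sd_ctrl, observed_ctrl):
    """Estimate a site-specific offset and scale from local healthy controls.

    pred_mean_ctrl / pred_sd_ctrl: normative-model predictions for the controls
    observed_ctrl: measured values (e.g. regional cortical thickness)
    """
    # Residuals of local controls relative to the pretrained normative mean
    resid = (observed_ctrl - pred_mean_ctrl) / pred_sd_ctrl
    site_offset = resid.mean()        # systematic shift introduced by the scanner
    site_scale = resid.std(ddof=1)    # site-specific change in variability
    return site_offset, site_scale

def deviation_scores(pred_mean, pred_sd, observed, site_offset, site_scale):
    """Z-scores for new participants after removing the estimated site effect."""
    z_raw = (observed - pred_mean) / pred_sd
    return (z_raw - site_offset) / site_scale

# Illustrative example with simulated values
rng = np.random.default_rng(0)
pred_mean, pred_sd = 2.5, 0.12          # model-predicted CT mean and SD (mm)
controls = rng.normal(2.56, 0.12, 30)   # 30 local controls; scanner adds +0.06 mm
offset, scale = calibrate_site(pred_mean, pred_sd, controls)
z_patient = deviation_scores(pred_mean, pred_sd, observed=2.30,
                             site_offset=offset, site_scale=scale)
```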
In recent years, several open-access platforms of pretrained normative models for brain morphology have been developed, including Brain MoNoCle [25], BrainChart [26], PCN Toolkit [27], and CentileBrain [22]. These web-based platforms let users upload processed imaging data and obtain deviation scores without needing to assemble training data or build normative models themselves. Although valuable, the complexities and distinct limitations of normative modelling can make these platforms difficult to use appropriately for clinical researchers without extensive statistical training. Researchers may struggle to choose between models or platforms, to determine how many calibration controls they need, or to understand the consequences of using mismatched (or no) controls. Clear, practical guidelines on the proper and effective application of these methods to clinical data are lacking.
In this review, we provide practical guidance for applying normative modelling in clinical neuroimaging. We detail statistical considerations, highlight common pitfalls, and illustrate each point with a worked example. Specifically, we address:
How does the choice of normative model or platform influence results?
How many healthy controls are needed to calibrate a new scanner site?
What impact arises from demographic mismatches between cases and controls?
Can deviation scores be reliably computed without any site-matched controls for calibration?
It is our hope that this review will empower clinical researchers to adopt normative modelling techniques with confidence, enhancing the sensitivity and reproducibility of neuroimaging studies.
The growing availability of large-scale neuroimaging datasets, combined with advances in statistical techniques, has led to multiple pretrained normative models of brain morphology, including Brain MoNoCle, BrainChart, PCN Toolkit, and CentileBrain. These platforms differ in model type (e.g. Generalized Additive Models for Location, Scale and Shape (GAMLSS), Bayesian Linear Regression (BLR), Multivariate Fractional Polynomial Regression (MFPR), and Hierarchical Bayesian Regression (HBR)), underlying reference datasets, morphometric measures, and output formats. Despite their increasing adoption, the impact of model choice on downstream results is not fully established. On the model-training side, the choice of algorithm and parameters can affect training efficiency and performance [22]. Here, however, we test whether different pretrained models differ in terms of end-user performance. To illustrate potential differences between pretrained modelling platforms, we analysed a clinical dataset of patients with mesial temporal lobe epilepsy (mTLE) from the IDEAS study [28], using four normative tools: Brain MoNoCle, CentileBrain, PCN Toolkit, and BrainChart. Analyses focused on average cortical thickness (CT), the most commonly modelled morphometric measure. Regional CT values were extracted using the Desikan–Killiany (DK) atlas via the FreeSurfer recon-all pipeline. Pretrained normative models for the DK atlas were available in Brain MoNoCle, CentileBrain, and PCN Toolkit. Each model computed individual deviation scores (z-scores), adjusting for covariates including age, sex, and scanner site, and we assessed agreement between z-scores across modelling platforms. We then calculated effect sizes (Cohen’s d) for each DK region, comparing a cohort of patients with right-onset mTLE to controls, and compared these group-level effect sizes across models. Additional hemisphere-level analyses comparing BrainChart and Brain MoNoCle are reported in the Supplementary.
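For readers who wish to reproduce this comparison, the sketch below shows one way to compute regional Cohen's d from platform-exported z-scores and quantify agreement between two platforms. The variable names, and the assumption that each platform exports a subjects-by-regions table of z-scores, are illustrative and not part of any platform's API.

```python
import numpy as np
import pandas as pd

def cohens_d(patients, controls):
    """Cohen's d with a pooled standard deviation."""
    n1, n2 = len(patients), len(controls)
    pooled_sd = np.sqrt(((n1 - 1) * patients.std(ddof=1) ** 2 +
                         (n2 - 1) * controls.std(ddof=1) ** 2) / (n1 + n2 - 2))
    return (patients.mean() - controls.mean()) / pooled_sd

def regional_effect_sizes(z_scores, is_patient):
    """Cohen's d for every DK region (one column per region in the z-score table)."""
    return z_scores.apply(lambda col: cohens_d(col[is_patient], col[~is_patient]))

# z_monocle, z_pcn: DataFrames of per-region z-scores exported from two platforms,
# indexed by subject; `is_patient` is a boolean Series on the same index.
# d_monocle = regional_effect_sizes(z_monocle, is_patient)
# d_pcn = regional_effect_sizes(z_pcn, is_patient)
# agreement = np.corrcoef(d_monocle, d_pcn)[0, 1]   # cross-platform agreement
```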
Despite differences in algorithms, reference datasets, and modelling frameworks across the pretrained models, outputs were broadly consistent (Figure 2). Individual z-scores were in high agreement across models. Regional effect sizes also showed strong agreement, particularly between Brain MoNoCle and PCN Toolkit. Effect sizes from CentileBrain were systematically offset relative to Brain MoNoCle and PCN Toolkit, reflecting differences in scaling, but the relative pattern of abnormality across regions remained similar.
These findings suggest that normative modelling platforms provide reliable individual- and group-level outputs because they all capture the same underlying population distribution of morphometric features. Large, representative training datasets and well-regularized models contribute to this consistency. Nonetheless, systematic offsets, such as those observed with CentileBrain, highlight that absolute values may differ even when relative patterns are preserved. Therefore, when feasible, using multiple platforms to validate key findings is advisable.
In practice, choice of platform may depend on study goals and practical considerations. These include the morphometric metrics supported (e.g. cortical thickness vs. surface area), compatibility with specific atlases, type of outputs (e.g. z-scores vs. centiles), computational efficiency, and ease of integration into an existing analysis pipeline.
Key takeaway: Normative modelling platform outputs generally agree, but absolute values can differ. Validate findings across tools and choose the model that best fits your metrics, atlas, and workflow.
Normative models are trained on large healthy cohorts to establish expected brain morphology across age and other covariates (e.g. sex, scanner site). Empirical benchmarking on over 37,000 participants shows that model convergence and stability typically require training samples of roughly 3,000 subjects [22], [25]. When applied to a new dataset, these models are calibrated using a smaller, site-specific control cohort to infer site effects. Hierarchical Bayesian approaches further support this adaptability by using informative priors that preserve the normative baseline even with small adaptation samples [21]. This allows patient-level evaluation without large, matched control groups at every site, a major advantage in clinical studies where healthy control numbers are often limited. However, the practical question remains: how small can the site-matched control cohort be while still allowing accurate site adjustment?
To investigate this, we used Brain MoNoCle to compute right hemisphere cortical thickness decreases associated with right-hemisphere mTLE in the IDEAS dataset, adjusting the normative model with varying sizes of healthy control groups: small (n = 10), medium (n = 30), and full (n = 69). For each sample size, we performed 100 repetitions via sampling with replacement to simulate drawing controls from a larger population. Z-scores were calculated for all individuals, and Cohen’s d was used to compute effect sizes between controls and people with TLE.
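A simplified sketch of this resampling procedure is shown below. The normative-model step is abstracted into a `compute_z` callable (hypothetical), which would calibrate the pretrained model with the given control sample and return deviation scores; the resampling and effect-size steps follow the scheme described above.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.std(ddof=1) ** 2 +
                         (nb - 1) * b.std(ddof=1) ** 2) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

def resampled_effects(compute_z, controls_ct, patients_ct, n_controls,
                      n_reps=100, seed=0):
    """Resample controls with replacement, recalibrate, and recompute Cohen's d.

    compute_z(values, calibration_controls) stands in for the normative-model
    step: calibrate with the control sample, then return z-scores for `values`.
    """
    rng = np.random.default_rng(seed)
    effects = np.empty(n_reps)
    for i in range(n_reps):
        sample = rng.choice(controls_ct, size=n_controls, replace=True)
        z_controls = compute_z(sample, calibration_controls=sample)
        z_patients = compute_z(patients_ct, calibration_controls=sample)
        effects[i] = cohens_d(z_controls, z_patients)
    return effects

# effects_n10 = resampled_effects(compute_z, controls_ct, patients_ct, n_controls=10)
# effects_n30 = resampled_effects(compute_z, controls_ct, patients_ct, n_controls=30)
```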
While mean effects were similar across control sample sizes, very small groups introduced high variance, often over- or underestimating effects (Figure 3). For example, with n = 10 controls, only 30% of estimates fell within one standard deviation of the effect derived from the full control group. With n = 30, consistency improved substantially, with 50% of estimates within one standard deviation.
This demonstrates the importance of using adequately sized control groups when adjusting normative models to new sites. Although fewer controls are needed than if sex and age effects had to be inferred directly from matched controls, too few controls can distort biomarker discovery in clinical populations. Our findings using the Brain MoNoCle app suggest that normative models may perform reasonably well with as few as 30 site-matched controls, producing robust estimates of the site-specific mean and standard deviation. Theoretical calculations support this: with n = 30, there is a 98% probability of estimating the standard deviation to within 30% of its true value [29].
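This figure can be checked from the sampling distribution of the sample standard deviation: for n independent Gaussian observations, \((n-1)s^2/\sigma^2\) follows a chi-squared distribution with \(n-1\) degrees of freedom, so the probability that s falls within 30% of \(\sigma\) is a difference of two chi-squared CDF values. A quick check (assuming normally distributed residuals):

```python
from scipy.stats import chi2

n = 30
df = n - 1
# P(0.7 < s/sigma < 1.3) = P(df * 0.7**2 < chi2_df < df * 1.3**2)
p = chi2.cdf(df * 1.3**2, df) - chi2.cdf(df * 0.7**2, df)
print(f"P(s within 30% of sigma) for n = {n}: {p:.3f}")  # ~0.98
```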
Key takeaway: For site-specific calibration of normative models, control cohorts as small as 30 subjects may provide robust estimates, though this number may vary between models. Very small cohorts (n \(\le\) 10) can produce unreliable deviation scores, and larger samples remain preferable when feasible.
In clinical research, the reliability of outputs depends on the quality and representativeness of the control cohort. Brain structure changes significantly with age and differs between males and females [30], [31], so traditional case-control studies often match participants by age and sex to avoid bias [32]. When control groups are demographically mismatched, results can be compromised. Normative modelling addresses this by statistically adjusting for covariates, providing accurate interpretations even under less-than-ideal conditions. To evaluate robustness under demographic imbalances, we simulated two scenarios using the IDEAS dataset and average cortical thickness (CT) as the morphometric measure.
In the first scenario, we tested age mismatches. Two control samples were prepared: one spanning the full age range (n = 34), and another including only older controls (age \(>\) 40; n = 34). Pretrained regional normative models (Brain MoNoCle) were calibrated to the new site, and used to derive regional effect sizes of cortical thickness for patients with right-onset TLE compared to healthy controls. We found strong agreement between model outputs using balanced versus older-only control samples across regions (Figure 4), with regional effects differing by only a small margin (average difference in regional effect size = 0.15).
In the second scenario, we assessed sex imbalances. Case-control effects between patients with right-onset TLE and controls were derived using either a sex-balanced control group (n = 50; 25 males, 25 females) or a sex-imbalanced group (n = 50; 10 males, 40 females) for calibrating pretrained regional normative models. Outputs from the sex-imbalanced group were highly correlated with those from the balanced group (Figure 5), with minimal differences in regional effects (average difference in regional effect size = 0.09).
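For reference, constructing the balanced and imbalanced calibration sets amounts to stratified subsampling of the control demographics table; a minimal sketch is given below (the 'sex' column name and coding are assumptions, not the dataset's actual field names).

```python
import pandas as pd

def sample_by_sex(controls, n_male, n_female, seed=0):
    """Draw a control subset with a fixed sex composition.

    `controls` is assumed to be a DataFrame with a 'sex' column coded 'M'/'F'.
    """
    males = controls[controls["sex"] == "M"].sample(n=n_male, random_state=seed)
    females = controls[controls["sex"] == "F"].sample(n=n_female, random_state=seed)
    return pd.concat([males, females])

# balanced = sample_by_sex(controls, n_male=25, n_female=25)
# imbalanced = sample_by_sex(controls, n_male=10, n_female=40)
# Each subset then calibrates the pretrained model before z-scores are computed.
```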
These results demonstrate that normative modelling is robust to common demographic imbalances. Strong agreement between outputs from balanced and biased control samples shows that covariate effects can be effectively adjusted, even when the control cohort is not perfectly representative. While using demographically representative samples remains best practice, these findings support the practical utility of normative models in clinical settings, where ideal control data are often difficult to obtain.
Key takeaway: Normative models are robust to demographic mismatches; even when control cohorts are age- or sex-biased, deviation scores remain reliable. Nonetheless, using representative controls is recommended when possible.
In quantitative neuroimaging, technical factors such as scanner hardware, site, and acquisition protocol introduce systematic variability into the data. Multi-site studies have repeatedly shown that these factors can confound analyses. For example, one study demonstrated substantial variation in cortical thickness measurements across 11 scanners [33], and another found that, even after advanced preprocessing, a classifier could accurately identify the scanner used, highlighting persistent site-specific bias [34]. Such effects can obscure pathology- or covariate-related morphological signals. Accurate matching of controls with patients at the same site and using the same protocols is therefore critical for reliable deviation scores and valid inferences.
A common challenge in clinical studies is the absence of local control data collected under the same scanning conditions as the patient cohort. Researchers may be tempted to use normative models adjusted with controls from a different site or scanner, assuming that statistical covariate adjustment (e.g. for age and sex) is sufficient. We tested this scenario using the IDEAS dataset.
We derived regional effect sizes of cortical thickness for patients with right TLE compared to healthy controls using Brain MoNoCle twice: once with controls acquired at the same site and with the same protocol as the patients, and once with controls from a different scanner and acquisition protocol. Comparison of the outputs revealed substantial discrepancies (Figure 6). With mismatched controls, there was little agreement with the findings obtained using correctly matched controls, and the average difference in effect size was 0.43.
This example underscores the critical importance of site-matching controls and patients when applying normative models. Scanner-specific effects must be estimated for each new site. Using controls acquired under different scanning conditions introduces systematic errors in deviation scores, producing unreliable estimates.
Key takeaway: Deviation scores cannot be reliably computed without local controls matched by site and acquisition protocol. To ensure valid inference, every new dataset should include site-matched controls for model calibration.
We have examined key methodological considerations for applying normative models to clinical neuroimaging data, focusing on the influence of model choice, control sample size, demographic composition, and scanner/protocol matching on results.
Different normative modelling tools generally produce highly consistent individual- and group-level outputs. Absolute effect sizes may differ slightly between platforms, but relative patterns of pathology are preserved. Platform selection should therefore be guided by practical considerations, including model flexibility, the structural measures of interest, atlas compatibility, output format, computational efficiency, and ease of integration.
Normative models reduce the need for large, site-matched control groups, but very small samples can introduce high variance and unreliable deviation scores. A minimum of \(\approx\) 30 controls per site provided stable estimates in our case study, though this may vary when using other models, and larger samples are generally preferred whenever feasible.
Normative models can adjust for covariates such as age and sex, making them robust to moderate demographic imbalances. While demographically representative controls are ideal, deviation scores remain reliable even when the control cohort is somewhat biased.
Accurate matching of controls and patients by scanner site and acquisition protocol is critical. Mismatched controls introduce systematic errors that can compromise deviation scores and lead to spurious findings. Site-matched controls should always be used when calibrating normative models to new datasets.
Normative modelling is a powerful tool for detecting brain deviations in clinical populations, offering flexibility and robustness across tools and demographic variations. Ensuring adequate control sample sizes and strict site/protocol matching maximizes the reliability and clinical utility of deviation scores.
Normative modelling is a valuable approach for individual-level brain assessment in both research and clinical settings. However, its effectiveness is highly dependent on careful methodological choices and an understanding of potential pitfalls. Normative modelling remains effective even for demographically imbalanced data, but we emphasise the importance of a minimum number of control subjects and of matching controls to the same scanner as the patient cohort to obtain reliable outputs. When these practical considerations are observed and usage pitfalls avoided, normative models are a highly reliable and powerful tool to support neuroimaging studies.
We thank members of the Computational Neurology, Neuroscience & Psychiatry Lab (www.cnnp-lab.com) for discussions on the analysis and manuscript; P.N.T. and Y.W. are both supported by UKRI Future Leaders Fellowships (MR/T04294X/1, MR/V026569/1).
Supplementary Text