New articles on Statistics


[1] 2603.03375

The Theory behind UMAP?

In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an unpublished draft by Spivak. This draft contains many errors, most of which are reproduced by McInnes et al. and subsequent publications. This article aims to repair these errors and provide a self-contained document with the full derivation of Spivak's functors and McInnes et al.'s finite variant. We contribute an explicit description of the metric realization and related functors. At the end, we discuss the UMAP algorithm, as well as claims about properties of the algorithm and the correspondence of McInnes et al.'s finite variant to the UMAP algorithm.


[2] 2603.03387

Learning Order Forest for Qualitative-Attribute Data Clustering

Clustering is a fundamental approach to understanding data patterns, and the intuitive Euclidean distance space is commonly adopted for it. However, such a space is unsuitable for the implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms or marital status. This paper therefore introduces a tree-like distance structure to flexibly represent the local order relationships among intra-attribute qualitative values. That is, treating each value as a vertex of the tree makes it possible to capture rich order relationships between that value and the others. To obtain the trees in a clustering-friendly form, a joint learning mechanism is proposed that iteratively obtains more appropriate tree structures and clusters. It turns out that the latent distance space of the whole dataset can be well represented by a forest consisting of the learned trees. Extensive experiments demonstrate that the joint learning adapts the forest to the clustering task and yields accurate results. Comparisons with 10 counterparts on 12 real benchmark datasets, supported by significance tests, verify the superiority of the proposed method.


[3] 2603.03401

Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy that is implementable. Theoretical verifications are provided within the framework of learning theory. Utilizing the recently developed integral operator approach, we rigorously demonstrate that KGD, equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts effectively to different kernels, target functions, and error metrics. Consequently, this strategy showcases significant advantages over existing parameter selection methods for KGD.


[4] 2603.03405

Surprisal-Rényi Free Energy

The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we introduce the Surprisal-Rényi Free Energy (SRFE), a log-moment-based functional of the likelihood ratio that lies outside the class of $f$-divergences. We show that SRFE recovers forward and reverse KL divergences as singular endpoint limits and derive local expansions around both limits in which the variance of the log-likelihood ratio appears as a first-order correction. This reveals an explicit mean-variance tradeoff governing departures from KL-dominated regimes. We further establish a Gibbs-type variational characterization of SRFE as the unique minimizer of a weighted sum of KL divergences and prove that SRFE directly controls large deviations of excess code-length via Chernoff-type bounds, yielding a precise Minimum Description Length interpretation. Together, these results identify SRFE as a variance- and tail-sensitive free-energy functional that clarifies the geometric and large-deviation structure underlying forward and reverse KL limits, without unifying or subsuming distinct learning frameworks.


[5] 2603.03411

Scalable Contrastive Causal Discovery under Unknown Soft Interventions

Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a scalable causal discovery model for paired observational and interventional settings with shared underlying causal structure and unknown soft interventions. The model aggregates subset-level PDAGs and applies contrastive cross-regime orientation rules to construct a globally consistent maximal PDAG under Meek closure, enabling generalization to both in-distribution and out-of-distribution settings. Theoretically, we prove that our model is sound with respect to a restricted $\Psi$ equivalence class induced solely by the information available in the subset-restricted setting. We further show that the model asymptotically recovers the corresponding identifiable PDAG and can orient additional edges compared to non-contrastive subset-restricted methods. Experiments on synthetic data demonstrate improved causal structure recovery, generalization to unseen graphs with held-out causal mechanisms, and scalability to larger graphs, with ablations supporting the theoretical results.


[6] 2603.03445

The Certainty Bound: Structural Limits on Scientific Reliability

Explanations of the replication crisis often emphasize misconduct, questionable research practices, or incentive misalignment, implying that behavioral reform is sufficient. This paper argues that a substantial component is architectural: within binary significance-based publication systems, even perfectly diligent researchers face structural limits on the reliability they can deliver. The posterior log-odds of a finding equal the prior log-odds plus $\log \Lambda$, where $\Lambda = (1-\beta)/\alpha$ is the experimental leverage. Interpreted architecturally, this implies a hard constraint: once evidence is coarsened to a binary significance decision, the decision rule contributes exactly $\log \Lambda$ to the posterior log-odds. A target reliability $\tau$ is feasible iff $\pi \ge \pi_{\mathrm{crit}}$, and under fixed $\alpha$ this generally cannot be rescued by sample size alone. Two mechanisms can drive the effective leverage to 1 without bad faith: persistent unmeasured confounding in observational studies and unbounded specification search under publication pressure. These results concern binary significance-based decision architectures and do not bound inference based on full likelihoods or richer continuous evidence summaries. Two collapse results formalize these mechanisms, while the Replication Pipeline Theorem and Minimum Pipeline Depth Corollary identify a quantitative evidentiary standard for escape. Using independently documented parameters for pre-reform psychology ($\pi \approx 0.10$, power $\approx 0.35$), the framework implies a replication rate of 36%, consistent with the Open Science Collaboration. The framework also provides quantitative bridges to Popper, Kuhn, and Lakatos. In low-prior settings below the single-study feasibility threshold, the natural unit of evidence is the replication pipeline rather than the individual experiment.
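
A minimal numerical illustration of the stated identity, using the abstract's pre-reform psychology parameters and a conventional significance threshold $\alpha = 0.05$ (an assumption, not stated in the abstract):

    # Posterior log-odds = prior log-odds + log(Lambda), with Lambda = (1 - beta) / alpha.
    import math

    alpha = 0.05    # assumed significance threshold
    power = 0.35    # 1 - beta, from the abstract (pre-reform psychology)
    pi = 0.10       # prior probability that the tested hypothesis is true

    leverage = power / alpha                          # Lambda = (1 - beta) / alpha = 7
    prior_logodds = math.log(pi / (1 - pi))
    post_logodds = prior_logodds + math.log(leverage)
    post_prob = 1 / (1 + math.exp(-post_logodds))

    print(f"leverage Lambda = {leverage:.1f}")        # 7.0
    print(f"P(H1 | significant) = {post_prob:.3f}")   # about 0.44

A significant result thus multiplies the prior odds by only $\Lambda = 7$ at these parameter values; the 36% replication-rate figure comes from the paper's full pipeline analysis and is not reproduced by this single-study update.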


[7] 2603.03569

Bayesian Estimation of Variance under Fine Stratification via Mean-Variance Smoothing

Finely stratified surveys are useful in many applications because the point estimator is unbiased, but a variance estimator under the design cannot be easily obtained, particularly when the sample size per stratum is as small as one unit. One common practice to overcome this difficulty is to collapse strata in pairs to create pseudo-strata and then estimate the variance. The resulting variance estimator is not design-unbiased, and its positive bias increases as the population means of the paired strata become more variable. Confidence intervals based on it can therefore be unnecessarily wide. In this paper, we propose a new Bayesian estimator of the variance which, unlike previous methods in the literature, does not rely on collapsing strata. We employ penalized splines to smooth the mean and variance together in a nonparametric way. Furthermore, we make comparisons with the earlier work of Breidt et al. (2016). Through multiple simulation studies and an illustration using data from the National Survey of Family Growth (NSFG), we demonstrate the favorable performance of our methodology.
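
For context, the collapsed-strata device described above takes a simple form in the one-unit-per-stratum case (a textbook sketch, not this paper's notation): with pseudo-strata formed from pairs of strata and unbiased stratum-total estimators $\hat Y_{g1}, \hat Y_{g2}$ within pair $g$,
\[
\hat V_{\mathrm{col}} = \sum_{g} \bigl(\hat Y_{g1} - \hat Y_{g2}\bigr)^2,
\qquad
\mathbb{E}\bigl[\hat V_{\mathrm{col}}\bigr] = \sum_{g} \bigl\{\mathrm{Var}(\hat Y_{g1}) + \mathrm{Var}(\hat Y_{g2})\bigr\} + \sum_{g} \bigl(Y_{g1} - Y_{g2}\bigr)^2,
\]
so the bias term grows with the differences between the paired stratum totals, which is the overestimation that the proposed Bayesian estimator avoids by not collapsing strata at all.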


[8] 2603.03587

Controllable Generative Sandbox for Causal Inference

Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism, the ability to capture mixed-type and multimodal tabular data, against causal controllability, including explicit control over overlap, unmeasured confounding, and treatment effect heterogeneity. We introduce CausalMix, a variational generative framework that closes this gap by coupling a mixture of Gaussian latent priors with data-type-specific decoders for continuous, binary, and categorical variables. The model incorporates explicit causal controls: an overlap regularizer shaping propensity-score distributions, alongside direct parameterizations of confounding strength and effect heterogeneity. This unified objective preserves fidelity to the observed data while enabling factorial manipulation of causal mechanisms, allowing overlap, confounding strength, and treatment effect heterogeneity to be varied independently at design time. Across benchmarks, CausalMix achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments, using CausalMix to compare estimators under calibrated data-generating processes, tune hyperparameters, and conduct simulation-based power analyses under targeted treatment effect heterogeneity scenarios.


[9] 2603.03613

Empirical Evaluation of No Free Lunch Violations in Permutation-Based Optimization

The No Free Lunch (NFL) theorem guarantees equal average performance only under uniform sampling of a function space closed under permutation (c.u.p.). We ask when this averaging ceases to reflect what benchmarking actually reports. We study an iterative-search setting with sampling without replacement, where algorithms differ only in evaluation order. Binary objectives allow exhaustive evaluation in the fully enumerable case, and efficiency is defined by the first time the global minimum is reached. We then construct two additional benchmarks by algebraically recombining the same baseline functions through sums and differences. Function-algorithm relations are examined via correlation structure, hierarchical clustering, delta heatmaps, and PCA. A one-way ANOVA with Tukey contrasts confirms that algebraic reformulations induce statistically meaningful shifts in performance patterns. The uniformly sampled baseline remains consistent with the global NFL symmetry. In contrast, the algebraically modified benchmarks yield stable re-rankings and coherent clusters of functions and sampling policies. Composite objectives can also exhibit non-additive search effort despite being built from simpler components. Monte Carlo experiments indicate that order effects persist in larger spaces and depend on function class. Taken together, the results show how objective reformulation and benchmark design can generate structured local departures from NFL intuition. They motivate algorithm choice that is aware of both the problem class and the objective representation. This message applies to evolutionary computation as well as to statistical procedures based on relabeling, resampling, and permutation tests.
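
The setting described above can be illustrated with a toy enumeration (a sketch of the idea, not the paper's benchmarks): algorithms are reduced to fixed evaluation orders without replacement, objectives are binary, and efficiency is the first time the global minimum is reached. Averaging over the full, permutation-closed set of binary objectives makes every order equivalent, while a restricted subclass breaks the symmetry.

    from itertools import product

    n = 6
    domain = list(range(n))
    orders = {
        "ascending": domain,
        "descending": domain[::-1],
        "interleaved": [0, 3, 1, 4, 2, 5],
    }

    def first_hit(f, order):
        """1-based index of the first evaluation attaining the global minimum of f."""
        target = min(f)
        for t, x in enumerate(order, start=1):
            if f[x] == target:
                return t

    # Uniform sampling over ALL binary objectives: a set closed under permutation.
    all_funcs = list(product((0, 1), repeat=n))
    for name, order in orders.items():
        avg = sum(first_hit(f, order) for f in all_funcs) / len(all_funcs)
        print(f"{name:>11s}: mean first-hit time = {avg:.3f}")   # identical across orders

    # A restricted, non-c.u.p. subclass: every zero lies in the first half of the domain.
    sub = [f for f in all_funcs if 0 in f[:3] and 0 not in f[3:]]
    for name, order in orders.items():
        avg = sum(first_hit(f, order) for f in sub) / len(sub)
        print(f"{name:>11s} (restricted): mean first-hit time = {avg:.3f}")   # order now matters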


[10] 2603.03626

Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme

Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations (SDEs) on manifolds, which raises the need for convergence theory of numerical schemes for manifold-valued SDEs. In Euclidean space, the Euler--Maruyama (EM) scheme achieves strong convergence with order $1/2$, but an analogous result for manifold discretizations is less understood in general settings. In this work, we study a geometric version of the EM scheme for SDEs on Riemannian manifolds and prove strong convergence with order $1/2$ under geometric and regularity conditions. As an application, we obtain a Wasserstein bound for sampling on manifolds via the geometric EM discretization of Riemannian Langevin dynamics.


[11] 2603.03674

HiMAP: Hilbert Mass-Aligned Parameterization for Multivariate Barycenters and Fréchet Regression

Many learning tasks represent responses as multivariate probability measures, requiring repeated computation of weighted barycenters in Wasserstein space. In multivariate settings, transport barycenters are often computationally demanding and, more importantly, are generally not well posed under the affine weight schemes inherent to global and local Fréchet regression, where weights sum to one but may be negative. We propose HiMAP, a Hilbert mass-aligned parameterization that endows multivariate measures with a distribution-invariant notion of quantile level. The construction recursively refines the domain through equiprobable conditional-median splits and follows a Hilbert curve ordering, so that a single scalar index consistently tracks cumulative probability mass across distributions. This yields an embedding into a Hilbert function space and induces a tractable discrepancy for distribution comparison and averaging. Crucially, the representation is closed under affine averaging, leading to a closed-form, well-posed barycenter and an explicit distribution-valued Fréchet regression estimator obtained by averaging HiMAP quantile maps. We establish consistency and a dimension-dependent polynomial convergence rate for HiMAP estimators under mild conditions, matching the classical rates for empirical convergence in multivariate Wasserstein geometry. Numerical experiments and a multivariate climate-indicator study demonstrate that HiMAP delivers barycenters and regression fits comparable to standard optimal-transport surrogates while achieving substantial speedups in schemes dominated by repeated barycenter evaluations.


[12] 2603.03700

Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $\mu$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $\mu$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d.\ samples from $\mu$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hat{\mu}$ and $\mu$ scales as $\mathbb{E}\, \mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(\mu)}\right),$ where $d^\ast_{p,q}(\mu)$ is the $(p,q)$-Wasserstein dimension of $\mu$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.


[13] 2603.03763

On large bandwidth matrix values kernel smoothed estimators for multi-index models

Kernel smoothing with large bandwidth values generally causes oversmoothing, that is, underfitting. However, when irrelevant variables are included, large bandwidth values in the corresponding directions are known to have the effect of shrinking those variables away. This study investigates the asymptotic properties of the kernel conditional density estimator and the kernel regression estimator with large bandwidth matrix elements in the multi-index model. We show that the optimal convergence rate of the estimators depends not on the number of variables but on the effective dimension, even without eliminating the irrelevant variables. Thus, the kernel conditional density estimator and the regression estimator naturally mitigate the curse of dimensionality. Finite-sample performance is investigated in a numerical study, and bandwidth selection is discussed. Finally, a case study on the Boston housing data is provided.
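
For orientation (the abstract does not fix notation), the standard multivariate kernel regression estimator with a bandwidth matrix $H$ is
\[
\hat m_H(x) = \frac{\sum_{i=1}^{n} K_H(X_i - x)\, Y_i}{\sum_{i=1}^{n} K_H(X_i - x)},
\qquad
K_H(u) = \det(H)^{-1} K\!\bigl(H^{-1}u\bigr),
\]
and letting the bandwidth entries associated with irrelevant directions grow large effectively averages those directions out, which is the shrinkage effect referred to above.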


[14] 2603.03785

Observationally Informed Adaptive Causal Experimental Design

Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded in prospective trial design due to bias concerns. We argue this "tabula rasa" data acquisition strategy is fundamentally inefficient. In this work, we propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. To operationalize this, we introduce the R-Design framework. Theoretically, we establish two key advantages: (1) a structural efficiency gap, proving that estimating smooth residual contrasts admits strictly faster convergence rates than reconstructing full outcomes; and (2) information efficiency, where we quantify the redundancy in standard parameter-based acquisition (e.g., BALD), demonstrating that such baselines waste budget on task-irrelevant nuisance uncertainty. We propose R-EPIG (Residual Expected Predictive Information Gain), a unified criterion that directly targets the causal estimand, minimizing residual uncertainty for estimation or clarifying decision boundaries for policy. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines, confirming that repairing a biased model is far more efficient than learning one from scratch.


[15] 2603.03789

Enhancing Mortality Forecasting with Ensemble Learning: A Shapley-Based Approach

A well-established insight in mortality forecasting is that combining predictions from a set of models improves accuracy compared to relying on a single best model. This paper proposes a novel ensemble approach based on Shapley values, a game-theoretic measure of each model's marginal contribution to the forecast. We further compute these SHapley Additive exPlanations (SHAP)-based weights age-by-age, thereby capturing the specific contribution of each model at each age. In addition, we introduce a threshold mechanism that excludes models with negligible contributions, effectively reducing the forecast variance. Using data from 24 OECD countries, we demonstrate that our SHAP ensemble enhances out-of-sample forecasting performance, especially at longer horizons. By leveraging the complementary strengths of different mortality models and filtering out those that add little predictive power, our approach offers a robust and interpretable solution for improving mortality forecasts.


[16] 2603.03816

The projected isotropic normal distribution with applications in neuroscience

This paper is motivated by a cutting-edge application in neuroscience: the analysis of electroencephalogram (EEG) signals recorded under flash stimulation. Under commonly used signal-processing assumptions, only the phase angle of the EEG is required for the analysis of such applications. We demonstrate that these assumptions imply that the phase has a projected isotropic normal distribution. We revisit this distribution and derive several new properties, including closed-form expressions for its trigonometric moments. We then examine the distribution of the mean resultant and its square -- a statistic of central importance in phase-based EEG studies. The distribution of the resultant is analytically intricate; to make it practically useful, we develop two approximations based on the well-known resultant distribution for the von Mises distribution. We then study inference problems for this projected isotropic normal distribution. The method is illustrated with an application to EEG data from flash-stimulation experiments.


[17] 2603.03819

Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs

Regression discontinuity designs (RDD) are widely used for causal inference. In many empirical applications, treatment effects vary substantially with covariates, and ignoring such heterogeneity can lead to misleading conclusions, which motivates flexible modeling of heterogeneous treatment effects in RDD. To this end, we propose a Bayesian nonparametric approach to estimating heterogeneous treatment effects based on Bayesian Additive Regression Trees (BART). The key feature of our method lies in adopting a general Bayesian framework using a pseudo-model defined through a loss function for fitting local linear models around the cutoff, which gives direct modeling of heterogeneous treatment effects by BART. Optimal selection of the bandwidth parameter for the local model is implemented using the Hyvärinen score. Through numerical experiments, we demonstrate that the proposed approach flexibly captures complicated structures of heterogeneous treatment effects as a function of covariates.


[18] 2603.03828

Philosophical foundations of statistics

The philosophical foundations of statistics involve issues in theoretical statistics, such as the goals of statistical analysis and the methods for meeting those goals, as well as the interpretation of statistical inference. They are related to the philosophy of science and to the philosophy of probability. We review these core and partly interrelated themes and place them in context.


[19] 2603.03843

Invariance-Based Dynamic Regret Minimization

We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past data, effectively shrinking the time horizon over which learning can occur. However, in many settings historical data may still carry partial information about the reward model. We propose to leverage such data while adapting to changes, by assuming the reward model decomposes into stationary and non-stationary components. Based on this assumption, we introduce ISD-linUCB, an algorithm that uses past data to learn invariances in the reward model and subsequently exploits them to improve online performance. We show both theoretically and empirically that leveraging invariance reduces the problem dimensionality, yielding significant regret improvements in fast-changing environments when sufficient historical data is available.


[20] 2603.03954

Forecasting of Multiple Seasonal Categorical Time Series Using Fourier Series with Application to AQI Data of Kolkata

Multiple seasonalities have been widely studied in continuous time series using models such as TBATS, for instance in electricity demand forecasting. However, their treatment in categorical time series, such as air quality index (AQI) data, remains limited. Categorical AQI often exhibits distinct seasonal patterns at multiple frequencies, which are not captured by standard models. In this paper, we propose a framework that models multiple seasonalities using Fourier series and indicator functions, inspired by the TBATS methodology. The approach accommodates the ordinal nature of AQI categories while explicitly capturing daily, weekly and yearly seasonal cycles. Simulation studies demonstrate the empirical consistency of parameter estimates under the proposed model. We further illustrate its applicability using real categorical AQI data from Kolkata and compare forecasting performance with Markov models and machine learning methods. Results indicate that our approach effectively captures complex seasonal dynamics and provides improved predictive accuracy. The proposed methodology offers a flexible and interpretable framework for analyzing categorical time series exhibiting multiple seasonal patterns, with potential applications in air quality monitoring, energy consumption and other environmental domains.
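
As a concrete illustration of the Fourier encoding of multiple seasonal cycles described above (the periods and harmonic counts below are illustrative choices for hourly data, not the paper's exact specification):

    import numpy as np

    def fourier_terms(t, period, n_harmonics):
        """Columns [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..n_harmonics."""
        cols = []
        for k in range(1, n_harmonics + 1):
            cols.append(np.sin(2 * np.pi * k * t / period))
            cols.append(np.cos(2 * np.pi * k * t / period))
        return np.column_stack(cols)

    t = np.arange(24 * 365)                   # one year of hourly time points
    X = np.hstack([
        fourier_terms(t, 24, 3),              # daily cycle
        fourier_terms(t, 24 * 7, 3),          # weekly cycle
        fourier_terms(t, 24 * 365.25, 2),     # yearly cycle
    ])
    print(X.shape)                            # (8760, 16)

Such columns, together with the indicator functions mentioned in the abstract, would then serve as covariates in the model for the ordinal AQI categories.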


[21] 2603.03987

Bayesian structured additive quantile regression for inflated bounded data

Bounded continuous data on the unit interval frequently arise in applied fields and often exhibit a non-negligible proportion of observations at the boundaries. Inflated regression models address this feature by combining a continuous distribution on the unit interval with a discrete component to account for zero- and/or one-inflation. In this paper, we propose a class of Bayesian structured additive quantile regression models for inflated bounded continuous data that accommodates zero- and/or one-inflation. The proposed approach enables direct modeling of both the conditional quantiles of the continuous component and the probabilities of observing zeros and/or ones, with structured additive predictors incorporated in both parts, including nonlinear effects, spatial effects, random effects, and varying-coefficient terms. Posterior inference is carried out using Markov chain Monte Carlo algorithms implemented through the software Liesel, a probabilistic programming framework for semiparametric regression. The practical performance of the proposed models is illustrated through simulation studies and two real-data applications: one analyzing the proportion of traffic-related fatalities across Brazilian municipal districts, and another evaluating speech intelligibility in cochlear implant recipients under different experimental conditions.
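
For reference, the generic form of a zero- and/or one-inflated model on the unit interval combines point masses at the boundaries with a continuous density $g$ on $(0,1)$ (a schematic, not the full structured additive specification of the paper):
\[
f(y) = \pi_0\, \mathbb{1}\{y=0\} + \pi_1\, \mathbb{1}\{y=1\} + (1-\pi_0-\pi_1)\, g(y)\, \mathbb{1}\{0<y<1\},
\]
where the proposed models place structured additive predictors both on the boundary probabilities $\pi_0, \pi_1$ and on the conditional quantiles of the continuous component $g$.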


[22] 2603.04003

Efficient Bayesian Estimation of Dynamic Structural Equation Models via State Space Marginalization

Dynamic structural equation models (DSEMs) combine time-series modeling of within-person processes with hierarchical modeling of between-person differences and differences between timepoints, and have become very popular for the analysis of intensive longitudinal data in the social sciences. An important computational bottleneck has, however, still not been resolved: whenever the underlying process is assumed to be latent and measured by one or more indicators per timepoint, currently published algorithms rely on inefficient brute-force Markov chain Monte Carlo sampling which scales poorly as the number of timepoints and participants increases and results in highly correlated samples. The main result of this paper shows that the within-level part of any DSEM can be reformulated as a linear Gaussian state space model. Consequently, the latent states can be analytically marginalized using a Kalman filter, allowing for highly efficient estimation via Hamiltonian Monte Carlo. This makes estimation of DSEMs computationally tractable for much larger datasets -- both in terms of timepoints and participants -- than what has been previously possible. We demonstrate the proposed algorithm in several simulation experiments, showing it can be orders of magnitude more efficient than standard Metropolis-within-Gibbs approaches.
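
To make the marginalization step concrete, the sketch below runs a Kalman filter for a univariate toy linear Gaussian state space model and returns the marginal log-likelihood of the observations with the latent states integrated out; this is the quantity a Hamiltonian Monte Carlo sampler would then target (a minimal illustration, not the paper's DSEM-specific filter):

    import numpy as np

    def kalman_loglik(y, a, q, c, r, m0=0.0, p0=1.0):
        """State z_t = a*z_{t-1} + e_t, e_t ~ N(0, q); observation y_t = c*z_t + u_t, u_t ~ N(0, r)."""
        m, p, ll = m0, p0, 0.0
        for yt in y:
            m, p = a * m, a * a * p + q            # predict
            v, s = yt - c * m, c * c * p + r       # innovation and its variance
            ll += -0.5 * (np.log(2 * np.pi * s) + v * v / s)
            k = p * c / s                          # Kalman gain
            m, p = m + k * v, (1 - k * c) * p      # update
        return ll

    y = np.random.default_rng(0).normal(size=200)
    print(kalman_loglik(y, a=0.8, q=0.5, c=1.0, r=0.3))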


[23] 2603.04030

On the generalized circular projected Cauchy distribution

Tsagris (2025) proposed the generalized circular projected Cauchy distribution, of which the wrapped Cauchy distribution is a special case. In this paper we first derive the relationship with the wrapped Cauchy distribution and then propose a log-likelihood ratio test for the equality of two angular means, without assuming equality of the concentration parameters. Simulation studies illustrate the performance of the test when one falsely assumes that the true underlying distribution is the wrapped Cauchy distribution.


[24] 2603.04080

Doubly Robust Estimation of Treatment Effects in Staggered Difference-in-Differences with Time-Varying Covariates

The difference-in-differences (DiD) design is a quasi-experimental method for estimating treatment effects. In staggered DiD with multiple treatment groups and periods, estimation based on the two-way fixed effects model yields negative weights when averaging heterogeneous group-period treatment effects into an overall effect. To address this issue, we first define group-period average treatment effects on the treated (ATT), and then define groupwise, periodwise, dynamic, and overall ATTs nonparametrically, so that the estimands are model-free. We propose doubly robust estimators for these types of ATTs in the form of augmented inverse variance weighting (AIVW). The proposed framework allows time-varying covariates that partially explain the time trends in outcomes. Even if part of the working models is misspecified, the proposed estimators still consistently estimate the parameter of interest. The asymptotic variance can be explicitly computed from influence functions. Under a homoskedastic working model, the AIVW estimator is simplified to an augmented inverse probability weighting (AIPW) estimator. We demonstrate the desirable properties of the proposed estimators through simulation and an application that compares the effects of a parallel admission mechanism with immediate admission on the China National College Entrance Examination.
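
For orientation, in the canonical two-period, two-group case a doubly robust ATT estimator of this kind combines an outcome regression for the untreated trend, $\hat m_0(X) \approx E[\Delta Y \mid D=0, X]$, with a propensity model $\hat e(X) \approx P(D=1 \mid X)$ (this is the familiar textbook form; the paper's AIVW estimators generalize the idea to group-period ATTs with time-varying covariates):
\[
\widehat{\mathrm{ATT}} = \frac{1}{n}\sum_{i=1}^{n}
\left[ \frac{D_i}{\bar D} - \frac{(1-D_i)\,\hat e(X_i)}{\bar D\,\bigl(1-\hat e(X_i)\bigr)} \right]
\bigl(\Delta Y_i - \hat m_0(X_i)\bigr),
\qquad \bar D = \frac{1}{n}\sum_{i} D_i,
\]
which remains consistent if either the outcome model or the propensity model is correctly specified.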


[25] 2603.04133

Exploiting Subgradient Sparsity in Max-Plus Neural Networks

Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations, respectively. This is a promising architecture in terms of interpretability, but its training is challenging. A particular feature is that this algebraic structure naturally induces sparsity in the subgradients, as only neurons that contribute to the maximum affect the loss. However, standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. We therefore focus on minimizing the worst sample loss, which transfers this sparsity to the optimization objective, and propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, our method achieves more efficient updates while retaining theoretical guarantees. This highlights a principled path toward bridging algebraic structure and scalable learning.
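
A sketch of the max-plus layer described above (addition replaced by maximum, multiplication by summation) and of the induced subgradient sparsity; the shapes and data are illustrative only:

    import numpy as np

    def maxplus_forward(x, W):
        """x: (d,), W: (d, m).  Output y_j = max_i (x_i + W[i, j])."""
        scores = x[:, None] + W            # (d, m)
        argmax = scores.argmax(axis=0)     # winning input per output unit
        return scores.max(axis=0), argmax

    def maxplus_backward(grad_y, argmax, d):
        """Subgradient wrt x: each output routes its gradient to a single input."""
        grad_x = np.zeros(d)
        np.add.at(grad_x, argmax, grad_y)
        return grad_x

    x = np.array([0.2, -1.0, 0.7])
    W = np.random.default_rng(1).normal(size=(3, 4))
    y, idx = maxplus_forward(x, W)
    gx = maxplus_backward(np.ones(4), idx, d=3)
    print(y, gx)    # only the argmax inputs receive nonzero subgradient mass

Only the winning input of each output unit carries a nonzero subgradient, which is the algebraic sparsity that the proposed optimization procedure exploits.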


[26] 2603.04172

The Pivotal Information Criterion

The Bayesian and Akaike information criteria aim at finding a good balance between under- and over-fitting. They are extensively used every day by practitioners. Yet we contend they suffer from at least two afflictions: their penalty parameter $\lambda=\log n$ and $\lambda=2$ are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion: PIC is defined as a continuous optimization problem, and the PIC penalty parameter $\lambda$ is selected at the detection boundary (under pure noise). PIC's choice of $\lambda$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied with no noise in compressed sensing. Applied on real data, for similar predictive performances, PIC selects the least complex model among state-of-the-art learners.


[27] 2603.04198

Stable and Steerable Sparse Autoencoders with Weight Regularization

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on the encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.
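
A hedged sketch of the regularized objective studied above: a standard sparse autoencoder loss (reconstruction plus an L1 penalty on activations) with an additional L2 penalty on the encoder and decoder weights. The architecture and coefficients are illustrative, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_in, d_hidden):
            super().__init__()
            self.enc = nn.Linear(d_in, d_hidden)
            self.dec = nn.Linear(d_hidden, d_in)

        def forward(self, x):
            z = torch.relu(self.enc(x))
            return self.dec(z), z

    def sae_loss(model, x, l1_act=1e-3, l2_weight=1e-4):
        x_hat, z = model(x)
        recon = ((x_hat - x) ** 2).mean()                 # reconstruction error
        sparsity = z.abs().mean()                         # L1 on feature activations
        weight_pen = model.enc.weight.pow(2).sum() + model.dec.weight.pow(2).sum()
        return recon + l1_act * sparsity + l2_weight * weight_pen

    model = SparseAutoencoder(d_in=512, d_hidden=4096)
    x = torch.randn(64, 512)
    sae_loss(model, x).backward()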


[28] 2603.04199

Bayesian Adversarial Privacy

Theoretical and applied research into privacy encompasses a broad swathe of differing approaches, emphases, and aims. This work introduces a new quantitative notion of privacy that is both contextual and specific. We argue that it provides a more meaningful notion of privacy than the widely utilised framework of differential privacy and a more explicit and rigorous formulation than what is commonly used in statistical disclosure theory. Our definition relies on concepts inherent to standard Bayesian decision theory, while departing from it in several important respects. In particular, the party controlling the release of sensitive information should make disclosure decisions from the prior viewpoint, rather than conditional on the data, even when the data is itself observed. Illuminating toy examples and computational methods are discussed in detail in order to highlight the specificities of the method.


[29] 2603.04204

Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means

Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling (probability averaging) and geometric pooling (logit averaging). In this work, we address this question by studying the normalized generalized mean of order $r \in \mathbb{R} \cup \{-\infty,+\infty\}$ through the lens of log-likelihood, the standard evaluation criterion in machine learning. This provides a unifying aggregation formalism and shows different optimal configurations for different situations. We show that the regime $r \in [0,1]$ is the only range ensuring systematic improvements relative to individual distributions, thereby providing a principled justification for the reliability and widespread practical use of linear ($r=1$) and geometric ($r=0$) pooling. In contrast, we show that aggregation rules with $r \notin [0,1]$ may fail to provide consistent gains with explicit counterexamples. Finally, we corroborate our theoretical findings with empirical evaluations using Deep Ensembles on image and text classification benchmarks.
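
A small sketch of the normalized generalized mean over member distributions on a common discrete support, recovering probability averaging at $r = 1$ and geometric (logit) averaging in the limit $r \to 0$; the normalization used here is one natural choice and may differ in detail from the paper's:

    import numpy as np

    def generalized_mean_pool(P, r, eps=1e-12):
        """P: (M, K) array whose rows are member distributions; returns a pooled distribution."""
        P = np.clip(P, eps, None)
        if abs(r) < 1e-8:                              # r -> 0: geometric pooling
            g = np.exp(np.log(P).mean(axis=0))
        else:
            g = (np.power(P, r).mean(axis=0)) ** (1.0 / r)
        return g / g.sum()                             # renormalize to a probability vector

    P = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.4, 0.2]])
    print(generalized_mean_pool(P, r=1.0))             # linear pooling
    print(generalized_mean_pool(P, r=0.0))             # geometric pooling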


[30] 2603.04223

Semi-Supervised Generative Learning via Latent Space Distribution Matching

We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and (ii) performing joint distribution matching in this space via the 1-Wasserstein distance, using only paired data. This two-step approach minimizes an upper bound on the 1-Wasserstein distance between joint distributions, reducing reliance on scarce paired samples while enabling fast one-step generation. Theoretically, we establish non-asymptotic error bounds and demonstrate a key benefit of unpaired data: enhanced geometric fidelity in generated outputs. Furthermore, by extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. Notably, Latent Diffusion Models (LDMs) can be viewed as a variant of LSDM, in which joint distribution matching is achieved indirectly via score matching. Consequently, our results also provide theoretical insights into the consistency of LDMs. Empirical evaluations on real-world image tasks, including class-conditional generation and image super-resolution, demonstrate the effectiveness of LSDM in leveraging unpaired data to enhance generation quality.


[31] 2603.04246

Areal Disaggregation: A Small Area Estimation Perspective

Producing reliable estimates of health and demographic indicators at fine areal scales is crucial for examining heterogeneity and supporting localized health policy. However, many surveys release outcomes only at coarser administrative levels, thereby limiting their relevance for decision-making. We propose a fully Bayesian, single-stage spatial modeling framework for area-level disaggregation that generates fine-scale estimates of indicators directly from coarsely aggregated survey data. By defining a latent spatial process at the target resolution and linking it to observed outcomes through an aggregation step, the framework adopts small-area estimation techniques while incorporating covariates and delivering coherent uncertainty quantification. The proposed methods are implemented with inlabru to achieve computational efficiency. We evaluate performance through a simulation study of general fertility rates in Kenya to demonstrate the models' ability to recover fine-scale variation across diverse data-generating scenarios. We further apply the framework to two national surveys to produce district-level fertility estimates from the 2022 Kenya Demographic and Health Survey and, more importantly, district-level indicators for unpaid care and domestic work and mass media usage from the 2021 Kenya Time Use Survey.


[32] 2603.04252

Cluster-Level Experiments using Temporal Switchback Designs: Precision Gains in Pricing A/B Tests at LATAM Airlines

Experimentation is central to modern digital businesses, but many operational decisions cannot be randomized at the user level. In such cases, cluster-level experiments, where clusters are usually geographic, come to the rescue. However, such experiments often suffer from low power due to persistent cluster heterogeneity, strong seasonality, and autocorrelated outcome metrics, as well as common shocks that move many clusters simultaneously. Using airline pricing as an example, where policies are typically applied at the route level and the unit of analysis in an A/B test is therefore a route, we study switchback designs to remedy these problems. In switchback designs, each cluster (route in our case) alternates between treatment and control on a fixed schedule, creating within-route contrasts that mitigate time-invariant heterogeneity and reduce sensitivity to low-frequency noise. We provide a unified Two-Way Fixed Effects interpretation of switchback experiments that makes the identifying variation explicit after partialling out route and time effects, clarifying how switching cadence interacts with temporal dependence to determine precision. Empirically, we evaluate weekly and daily switchback cadences using calibrated synthetic regimes and operational airline data from ancillary pricing. In our evaluations, switchbacks decrease standard errors by up to 67%, with daily switching yielding the largest gains over short horizons and weekly switching offering a strong and simpler-to-operationalize alternative.
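
The Two-Way Fixed Effects reading of a switchback experiment can be sketched directly: partial route and day effects out of both the outcome and the treatment indicator, then regress the residualized outcome on the residualized treatment. The simulated schedule and effect size below are illustrative only:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    routes, days = 40, 56
    df = pd.DataFrame({
        "route": np.repeat(np.arange(routes), days),
        "day": np.tile(np.arange(days), routes),
    })
    df["treat"] = ((df["day"] // 7 + df["route"]) % 2).astype(float)   # weekly switchback
    df["y"] = (0.5 * df["treat"]
               + rng.normal(0, 2, routes)[df["route"]]                 # persistent route effects
               + rng.normal(0, 1, days)[df["day"]]                     # common daily shocks
               + rng.normal(0, 1, len(df)))

    def within(df, col):
        """Two-way within transformation for a balanced panel: remove route and day means."""
        s = df[col]
        return (s - s.groupby(df["route"]).transform("mean")
                  - s.groupby(df["day"]).transform("mean") + s.mean())

    y_t, d_t = within(df, "y"), within(df, "treat")
    tau_hat = (d_t * y_t).sum() / (d_t ** 2).sum()
    print(round(tau_hat, 3))   # should be close to the true effect of 0.5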


[33] 2603.04260

State-dependent marginal emission factors with autoregressive components

Accurate estimation of Marginal Emission Factors (MEFs) is critical for evaluating the decarbonization potential of low-carbon technologies and demand-side management. However, canonical methodologies, predominantly relying on linear regression and differencing techniques, fail to capture the structural non-linearities inherent in the merit order, i.e. the marginal technology setting electricity prices. Utilizing Markov switching autoregressive models with exogenous regressors (MS-ARX) and hourly US data (2019-2025), we identify distinct, mutually exclusive regimes governed by fuel-price dynamics. We find that linear models overestimate abatement potential by masking the dichotomy between a gas-driven and coal-driven marginal system. Furthermore, using robust structural break detection, we link regime instability to a specific structural shift in natural gas pricing in May 2022. Our results indicate that post-2022, the grid has transitioned into a correction phase where the coal-driven regime is less persistent but highly volatile, necessitating state-dependent policy metrics rather than static annual averages.
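
For reference, a Markov switching autoregressive model with exogenous regressors of the kind used above has the generic form (the lag order and regressor set are schematic here):
\[
y_t = \mu_{S_t} + \sum_{i=1}^{p} \phi_{i,S_t}\, y_{t-i} + \beta_{S_t}^{\top} x_t + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}\bigl(0, \sigma^2_{S_t}\bigr),
\]
where the latent regime $S_t$ (here, for example, a gas-driven versus a coal-driven marginal system) follows a first-order Markov chain with unknown transition probabilities, so that intercepts, autoregressive dynamics, fuel-price sensitivities, and volatilities may all differ across regimes.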


[34] 2603.04278

Markov-Based Modelling for Reservoir Management: Assessing Reliability and Resilience

This paper develops a comprehensive Markov-based framework for modelling reservoir behaviour and assessing key performance measures such as reliability and resilience. We first formulate a stochastic model for a finite-capacity dam, analysing its long-term storage dynamics under both independent and identically distributed inflows, following the Moran model, and correlated inflows represented by an ergodic Markov chain in the Lloyd formulation. For this finite case, we establish stationary water balance relations and derive asymptotic results, including a central limit theorem for storage levels. The analysis is then extended to an infinite-capacity reservoir, for which normal limit distributions and analogous long-term properties are obtained. A continuous-state formulation is also introduced to represent reservoirs with continuous inflow processes, generalizing the discrete-state framework. On this basis, we define and evaluate reliability and resilience metrics within the proposed Markovian context. The applicability of the methodology is demonstrated through a real-world case study of the Quiebrajano dam, illustrating how the developed models can support efficient and sustainable reservoir management under hydrological uncertainty.


[35] 2603.04286

A mixture model for subtype identification in the context of disease progression modeling

The progression of chronic diseases often follows highly variable trajectories, and the underlying factors remain poorly understood. Standard mixed-effects models typically represent inter-patient differences as random deviations around a common reference, which may obscure meaningful subgroups. We propose a probabilistic mixture extension of a mixed effects model, the Disease Course Mapping model, to identify distinct disease progression subtypes within a population. The mixture structure is introduced at the latent individual parameters, enabling clustering based on both temporal and spatial variability in disease trajectories. We evaluated the model through simulation studies to assess classification performance and parameter recovery. Classification accuracy exceeded 90% in simpler scenarios and remained above 80% in the most complex case, with particularly high recall and precision for fast-progressing clusters. Compared to a post hoc classification approach, the proposed model yielded more accurate parameter estimates, smaller biases, lower root mean squared errors, and reduced uncertainty. It also correctly recovered the true three-cluster structure in 93% of the simulations. Finally, we applied the model to a longitudinal cohort of CADASIL patients, identifying two clinically meaningful clusters, differentiating patients with early versus late onset and fast versus slow progression, with clear spatial patterns across motor and memory scores. Overall, this probabilistic mixture framework offers a robust, interpretable approach for clustering patients based on spatiotemporal disease dynamics.


[36] 2603.04306

Theory Discovery in Social Networks: Automating ERGM Specification with Large Language Models

Understanding how social networks form, whether through reciprocity, shared attributes, or triadic closure, is central to computational social science. Exponential Random Graph Models (ERGMs) offer a principled framework for testing such formation theories, but translating qualitative social hypotheses into stable statistical specifications remains a significant barrier, requiring expertise in both network theory and model estimation. We present Forge (Formation-Oriented Reasoning with Guarded ERGMs), a framework that uses large language models to automate this translation. Given a network and an informal description of the social context, Forge proposes candidate formation mechanisms, validates them against feasibility and stability constraints, and iteratively refines specifications using goodness-of-fit diagnostics. Evaluation across twelve benchmark networks spanning schools, organizations, and online communication shows that Forge converges in 10 of 12 cases, and conditional on convergence it achieves the best likelihood-based fit in 9 of 10 while meeting adequacy thresholds. By combining LLM-based proposals with statistical guardrails, Forge reduces the manual effort required for ERGM specification.


[37] 2603.04315

A spectral inference method for determining the number of communities in networks

To characterize the community structure in network data, researchers have developed various block-type models, including the stochastic block model, the degree-corrected stochastic block model, the mixed membership block model, the degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to the best of our knowledge, existing methods for estimating the number of network communities either rely on explicit model fitting or fail to simultaneously accommodate network sparsity and a diverging number of communities. In this paper, we propose a model-free spectral inference method based on eigengap ratios that addresses these challenges. The inference procedure is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. Technically, we show that the proposed spectral test statistic converges to a function of the type-I Tracy-Widom distribution via the Airy kernel under the null hypothesis, and that the test is asymptotically powerful under weak alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.


[38] 2603.04347

Extreme Geometric Quantiles Under Minimal Assumptions, with a Connection to Tukey Depth

Geometric (also known as spatial) quantiles, introduced by Chaudhuri and representing one of the three principal approaches to defining multivariate quantiles, have been well studied in the literature. In this work, we focus on the extremal behaviour of these quantiles. We establish new extremal properties, namely general lower and upper bounds for the norm of extreme geometric quantiles, free of any moment conditions. We discuss the impact of such results on the characterization of distribution behaviour. Importantly, the lower bound can be directly linked to univariate quantiles and to halfspace (Tukey) depth central regions, highlighting a novel connection between these two fundamental notions of multivariate quantiles.


[39] 2603.04369

On the singularity of the Fisher Information matrix in the sine-skewed family on the d-dimensional torus

Skewed distributions are fundamental in modelling asymmetric data on the d-dimensional torus. In this context, asymmetry is introduced through the sine-skewing mechanism, which is the only skewing mechanism that has been proposed on the hyper-torus in the literature. Some sine-skewed models are known to suffer from a singular Fisher information matrix in the vicinity of symmetry, which poses a significant issue for inferential purposes. It is an open question to determine for which sine-skewed models Fisher information singularity occurs. In this paper, a general characterization of the class of models that exhibit this singularity is given in the general d-dimensional setting.


[40] 2603.03480

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.


[41] 2603.03507

Solving adversarial examples requires solving exponential misalignment

Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning, and a phenomenon with mysterious origins. To shed light, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore makes strong predictions: both robust accuracy and distance to any PM should be negatively correlated with the PM dimension. We confirmed these predictions across 18 different networks of varying robust accuracy. Crucially, we find even the most robust networks are still exponentially misaligned, and only the few PMs whose dimensionality approaches that of human concepts exhibit alignment to human perception. Our results connect the fields of alignment and adversarial examples, and suggest the curse of high dimensionality of machine PMs is a major impediment to adversarial robustness.


[42] 2603.03621

Extending Neural Operators: Robust Handling of Functions Beyond the Training Set

We develop a rigorous framework for extending neural operators to handle out-of-distribution input functions. We leverage kernel approximation techniques and provide theory for characterizing the input-output function spaces in terms of Reproducing Kernel Hilbert Spaces (RKHSs). We provide theorems on the requirements for reliable extensions and their predicted approximation accuracy. We also establish formal relationships between specific kernel choices and their corresponding Sobolev Native Spaces. This connection further allows the extended neural operators to reliably capture not only function values but also their derivatives. Our methods are empirically validated through the solution of elliptic partial differential equations (PDEs) involving operators on manifolds having point-cloud representations and handling geometric contributions. We report results on key factors impacting the accuracy and computational performance of the extension approaches.


[43] 2603.03673

A Stein Identity for q-Gaussians with Bounded Support

Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for q-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.


[44] 2603.03778

Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation

We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's rewards and only observes actions, aims to recover the underlying problem parameters. During the learning process, the learner's behavior naturally transitions from exploration to exploitation, resulting in non-stationary action data that poses significant challenges for the observer. To address this issue, we propose a simple and effective framework called Two-Phase Suffix Imitation. The framework discards data from an initial burn-in phase and performs empirical risk minimization using only data from a subsequent imitation phase. We derive a predictive decision loss bound that explicitly characterizes the bias-variance trade-off induced by the choice of burn-in length. Despite the severe information deficit, we show that a reward-free observer can achieve a convergence rate of $\tilde O(1/\sqrt{N})$, matching the asymptotic efficiency of a fully reward-aware learner. This result demonstrates that a passive observer can effectively uncover the optimal policy from actions alone, attaining performance comparable to that of the learner itself.


[45] 2603.03845

Steady State Distribution and Stability Analysis of Random Differential Equations with Uncertainties and Superpositions: Application to a Predator Prey Model

We present a computational framework to investigate steady state distributions and perform stability analysis for random ordinary differential equations driven by parameter uncertainty. Using the nonlinear Rosenzweig-MacArthur predator-prey model as a case study, we characterize the non-trivial equilibrium steady state of the system and investigate its complex distribution when the parameter probability densities are multi-modal mixture models with partially overlapping or separated components. As a consequence, this application includes both uncertainties and superpositions of the system parameters. In addition, we present a stability analysis of steady states based on the eigenvalue distribution of the system's Jacobian matrix in this stochastic regime. The steady state posterior density and stability metrics are computed with a recently published Monte Carlo based numerical scheme specifically designed for random equation systems (Hoegele, 2026). In particular, we demonstrate the simplicity of this stochastic extension of dynamical systems combined with a broadly applicable computational approach. Numerical experiments show the emergence of multi-modal steady state distributions of the predator-prey model, and we calculate their stability regions, illustrating the method's applicability to uncertainty quantification in dynamical systems.


[46] 2603.03922

Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs

Inverse problems are the task of calibrating models to match data. They play a pivotal role in diverse engineering applications by allowing practitioners to align models with reality. In many applications, engineers and scientists do not have a complete picture of i) the detailed properties of a system (such as material properties, geometry, initial conditions, etc.); ii) the complete laws describing all dynamics at play (such as friction laws, complicated damping phenomena, and general nonlinear interactions). In this paper, we develop a principled methodology for leveraging data from collections of distinct yet related physical systems to jointly estimate the individual model parameters of each system, and learn the shared unknown dynamics in the form of an ML-based closure model. To robustly infer the unknown parameters for each system, we employ a hierarchical Bayesian framework, which allows for the joint inference of multiple systems and their population-level statistics. To learn the closures, we use a maximum marginal likelihood estimate of a neural network embedded within the ODE/PDE formulation of the problem. To realize this framework we utilize the ensemble Metropolis-Adjusted Langevin Algorithm (MALA) for stable and efficient sampling. To mitigate the computational bottleneck of repetitive forward evaluations in solving inverse problems, we introduce a bilevel optimization strategy to simultaneously train a surrogate forward model alongside the inference. Within this framework, we evaluate and compare distinct surrogate architectures, specifically Fourier Neural Operators (FNOs) and parametric Physics-Informed Neural Networks (PINNs).


[47] 2603.03972

A note on outlier eigenvectors for sparse non-Hermitian perturbations

We consider a sparse i.i.d.\ non-Hermitian random matrix model $X_n$ (with sparsity parameter $K_n$) and a deterministic finite-rank perturbation $E_n$. Assuming biorthogonality for $E_n$ and a growth condition on $K_n$, we outline a finite-rank resolvent reduction leading to asymptotics for the overlap between an outlier eigenvector of $Y_n:=X_n+E_n$ and the corresponding spike eigenspace. In particular, for an outlier spike $\mu$ with $|\mu|>1$, the squared projection of the associated (right) eigenvector onto the spike eigenspace converges in probability to $1-|\mu|^{-2}$. Our result generalizes Theorem 1.6 of [HLN26] to the general finite-rank case, solving Open Problem 5.


[48] 2603.03997

Bandwidth Selection for Spatial HAC Standard Errors

Spatial autocorrelation in regression models can lead to downward biased standard errors and thus incorrect inference. The most common correction in applied economics is the spatial heteroskedasticity and autocorrelation consistent (HAC) standard error estimator introduced by Conley (1999). A critical input is the kernel bandwidth: the distance within which residuals are allowed to be correlated. However, this is still an unresolved problem and there is no formal guidance in the literature. In this paper, I first document that the relationship between the bandwidth and the magnitude of spatial HAC standard errors is inverse-U shaped. This implies that both too narrow and too wide bandwidths lead to underestimated standard errors, contradicting the conventional wisdom that wider bandwidths yield more conservative inference. I then propose a simple, non-parametric, data-driven bandwidth selector based on the empirical covariogram of regression residuals. In extensive Monte Carlo experiments calibrated to empirically relevant spatial correlation structures across the contiguous United States, I show that the proposed method controls the false positive rate at or near the nominal 5% level across a wide range of spatial correlation intensities and sample configurations. I compare six kernel functions and find that the Bartlett and Epanechnikov kernels deliver the best size control. An empirical application using U.S. county-level data illustrates the practical relevance of the method. The R package SpatialInference implements the proposed bandwidth selection method.
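
To make the role of the bandwidth concrete, here is a textbook-style numpy sketch of Conley-type spatial HAC standard errors with a Bartlett kernel. This is not the paper's SpatialInference package or its proposed bandwidth selector; the O(n^2) pairwise computation and all variable names are illustrative assumptions suitable only for small samples.

```python
import numpy as np

def spatial_hac_se(X, y, coords, bandwidth):
    """X: (n, k) regressors (include an intercept column); coords: (n, 2) locations."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta

    # Pairwise distances and Bartlett weights: w_ij = max(0, 1 - d_ij / bandwidth).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.clip(1.0 - d / bandwidth, 0.0, None)

    # "Meat" of the sandwich: sum over pairs of w_ij * (x_i e_i)(x_j e_j)'.
    A = X * e[:, None]
    S = A.T @ (w @ A)
    V = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Toy usage on synthetic data; too-narrow or too-wide bandwidths change the SEs.
rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta, se = spatial_hac_se(X, y, coords, bandwidth=2.0)
```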


[49] 2603.04007

Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if all its attributes' means are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability for any algorithm in this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
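
The following is a rough successive-rejects-style sketch with a feasibility filter, in the spirit of the FCSR idea described above. The phase schedule, the elimination rule, and all names are assumptions made for illustration only; the paper's exact algorithm may differ.

```python
import numpy as np

def fcsr_like(sample_attr, n_arms, n_attrs, threshold, budget):
    """sample_attr(arm, attr, n) -> n i.i.d. draws of that attribute's reward."""
    active = list(range(n_arms))
    sums = np.zeros((n_arms, n_attrs))
    counts = np.zeros(n_arms)
    rounds = n_arms - 1                               # eliminate one arm per phase
    per_phase = max(budget // (rounds * n_arms), 1)   # crude uniform phase budget

    for _ in range(rounds):
        for arm in active:
            for attr in range(n_attrs):
                sums[arm, attr] += sample_attr(arm, attr, per_phase).sum()
            counts[arm] += per_phase
        means = sums[active] / counts[active][:, None]
        feasible = (means > threshold).all(axis=1)    # empirical feasibility check
        overall = means.mean(axis=1)
        # Drop the worst remaining arm: infeasible-looking arms first, then lowest mean.
        score = np.where(feasible, overall, overall - 1e6)
        active.pop(int(score.argmin()))
    return active[0]                                  # last surviving arm
```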


[50] 2603.04109

Testing Full Mediation of Treatment Effects and the Identifiability of Causal Mechanisms

In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We propose a test to evaluate (i) whether the causal effect of a treatment that is randomly assigned conditional on covariates is fully mediated by, or operates exclusively through, observed intermediate outcomes (referred to as mediators or surrogate outcomes), and (ii) whether the various causal mechanisms operating through different mediators are identifiable conditional on covariates. We demonstrate that if both full mediation and identification of causal mechanisms hold, then the conditionally random treatment is conditionally independent of the outcome given the mediators and covariates. Furthermore, we extend our framework to settings with non-randomly assigned treatments. We show that, in this case, full mediation remains testable, while identification of causal mechanisms is no longer guaranteed. We propose a double machine learning framework for implementing the test that can incorporate high-dimensional covariates and is root-n consistent and asymptotically normal under specific regularity conditions. We also present a simulation study demonstrating good finite-sample performance of our method, along with two empirical applications revisiting randomized experiments on maternal mental health and social norms.


[51] 2603.04275

Statistical Inference for Score Decompositions

We introduce inference methods for score decompositions, which partition scoring functions for predictive assessment into three interpretable components: miscalibration, discrimination, and uncertainty. Our estimation and inference relies on a linear recalibration of the forecasts, which is applicable to general multi-step ahead point forecasts such as means and quantiles due to its validity for both smooth and non-smooth scoring functions. This approach ensures desirable finite-sample properties, enables asymptotic inference, and establishes a direct connection to the classical Mincer-Zarnowitz regression. The resulting inference framework facilitates tests for equal forecast calibration or discrimination, which yield three key advantages. They enhance the information content of predictive ability tests by decomposing scores, deliver higher statistical power in certain scenarios, and formally connect scoring-function-based evaluation to traditional calibration tests, such as financial backtests. Applications demonstrate the method's utility. We find that for survey inflation forecasts, discrimination abilities can differ significantly even when overall predictive ability does not. In an application to financial risk models, our tests provide deeper insights into the calibration and information content of volatility and Value-at-Risk forecasts. By disentangling forecast accuracy from backtest performance, the method exposes critical shortcomings in current banking regulation.


[52] 2603.04323

PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors (compact shape summaries whose many-to-one structure makes inversion provably ill-posed) rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted, and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than that of FedAvg. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910 respectively (the highest in both settings) while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at this https URL and data at this https URL.


[53] 2603.04365

Comparison theorems for the extreme eigenvalues of a random symmetric matrix

This paper establishes a comparison theorem for the maximum eigenvalue of a sum of independent random symmetric matrices. The theorem states that the maximum eigenvalue of the matrix sum is dominated by the maximum eigenvalue of a Gaussian random matrix that inherits its statistics from the sum, and it strengthens previous results of this type. Corollaries address the minimum eigenvalue and the spectral norm. The comparison methodology is powerful because of the vast arsenal of tools for treating Gaussian random matrices. As applications, the paper improves on existing eigenvalue bounds for random matrices arising in spectral graph theory, quantum information theory, high-dimensional statistics, and numerical linear algebra. In particular, these techniques deliver the first complete proof that a sparse random dimension reduction map has the injectivity properties conjectured by Nelson & Nguyen in 2013.


[54] 2310.09701

A robust and powerful method for assessing replicability of high dimensional data

Identifying signals that replicate across multiple studies is essential for establishing robust scientific evidence, yet existing methods for high-dimensional replicability analysis either rely on restrictive modeling assumptions, are limited to two-study settings, or lack statistical power. We propose a general empirical Bayes framework for multi-study replicability analysis that jointly models summary-level $p$-values while explicitly accounting for between-study heterogeneity. Within each study, non-null $p$-value densities are estimated nonparametrically under monotonicity constraints, enabling flexible and tuning-free inference. For two studies, we develop a local false discovery rate (Lfdr) statistic for the composite null of non-replicability and establish identifiability, consistency, and a cubic-rate convergence of the nonparametric MLE, along with minimax optimality. Extending replicability analysis to $n$ studies typically requires estimating $2^n$ latent configurations, which is computationally infeasible. To address this challenge, we introduce a scalable pairwise rejection strategy that decomposes the exponentially large composite null into disjoint components, yielding linear complexity in the number of studies. We prove asymptotic FDR control under mild regularity conditions and show that Lfdr-based thresholding is power-optimal. Extensive simulations demonstrate that our method provides substantial power gains while maintaining valid FDR control, outperforming state-of-the-art alternatives across a wide range of scenarios. Applying our framework to East Asian- and European-ancestry genome-wide association studies of type 2 diabetes reveals replicable genetic associations that competing approaches fail to detect, illustrating the method's practical utility in large-scale biomedical research.


[55] 2312.05645

Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min \{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $\Theta(\log \log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA) which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.


[56] 2403.15384

Unifying small area estimators based on area-level and unit-level models through calibration

When estimating area means, direct estimators based on area-specific data are usually consistent under the sampling design without model assumptions. However, they are inefficient if the area sample size is small. In small area estimation, model assumptions linking the areas are used to "borrow strength" from other areas. The basic area-level model provides design-consistent estimators, but the error variances are assumed to be known; in practice, they are estimated with the (scarce) area-specific data. The resulting estimators are inefficient, and their error is not accounted for in the associated mean squared error estimators. Unit-level models do not require the error variances to be known, but they do not account for the survey design. Here we describe a unified estimator of an area mean that may be obtained from either an area-level model or a unit-level model and that is based on consistent estimators of the model error variances as the number of areas increases. We propose bootstrap mean squared error estimators that account for the uncertainty due to the estimation of the error variances. We show the improved performance of the new small area estimators and of our bootstrap estimators of the mean squared error. We apply the results to education data from Colombia.


[57] 2405.20856

Parameter identification in linear non-Gaussian causal models under general confounding

Linear non-Gaussian causal models postulate that each random variable is a linear function of parent variables and non-Gaussian exogenous error terms. We study identification of the linear coefficients when such models contain latent variables. Our focus is on the commonly studied acyclic setting, where each model corresponds to a directed acyclic graph (DAG). For this case, prior literature has demonstrated that connections to overcomplete independent component analysis yield effective criteria to decide parameter identifiability in latent variable models. However, this connection is based on the assumption that the observed variables linearly depend on the latent variables. Departing from this assumption, we treat models that allow for arbitrary non-linear latent confounding. Our main result is a graphical criterion that is necessary and sufficient for deciding the generic identifiability of direct causal effects. Moreover, we provide an algorithmic implementation of the criterion with a run time that is polynomial in the number of observed variables. Finally, we report on estimation heuristics based on the identification result and explore a generalization to models with feedback loops.


[58] 2408.02391

Expected Kullback-Leibler-based characterizations of score-driven updates

Score-driven (SD) models are a standard tool in statistics and econometrics, with applications in hundreds of published articles in the past decade. We provide an information-theoretic characterization of SD updates based on reductions in the expected Kullback-Leibler (EKL) divergence relative to the true -- but unknown -- data-generating density. EKL reductions occur if and only if the expected update direction aligns with the expected score; i.e., their inner product should be positive. This equivalence condition uniquely identifies SD updates (including scaled or clipped variants) as being EKL reducing, even in non-concave, multivariate, and misspecified settings. We further derive explicit bounds on admissible learning rates in terms of score moments, linking SD methods to adaptive optimization techniques. By contrast, alternative performance measures in the literature impose stronger conditions (e.g., concave logarithmic densities) and do not characterize SD updates: other updating rules may improve these measures, while SD updates need not. Our results provide a rigorous justification for SD models and establish EKL as their natural information-theoretic foundation.


[59] 2412.06114

Randomized interventional effects in semicompeting risks

In clinical studies, the risk of the primary (terminal) event may be modified by intermediate events, resulting in semicompeting risks. To study the treatment effect on the terminal event mediated by the intermediate event, researchers wish to decompose the total effect into direct and indirect effects. In this article, we extend the randomized interventional approach to time-to-event outcomes, where both intermediate and terminal events are subject to right censoring. We envision a random draw for the intermediate event process from a reference distribution, either marginally over time-varying confounders or conditionally given the observed history. We present the identification formula for interventional effects. We also discuss some variants of the identification assumptions. We estimate the treatment effects using nonparametric maximum likelihood estimation and propose a sensitivity analysis. We study the effect of matched unrelated donor versus haploidentical donor on death mediated by relapse in a hematopoietic cell transplantation study with graft-versus-host disease (GVHD) as the time-varying confounder. We find that matched unrelated donor transplantation is preferable in terms of survival rates under post-transplantation cyclophosphamide (PTCy) GVHD prophylaxis for lymphoma patients.


[60] 2412.19436

Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose a Low-rank Contextual RLHF (LoCo-RLHF) framework that integrates contextual information to better model heterogeneous feedback while maintaining computational efficiency. Our approach builds on a contextual preference model, leveraging the intrinsic low-rank structure of the interaction between user contexts and query-answer pairs to mitigate the high dimensionality of feature representations. Furthermore, we address the challenge of distributional shifts in feedback through our Pessimism in Reduced Subspace (PRS) policy, inspired by pessimistic offline reinforcement learning techniques. We theoretically demonstrate that our policy achieves a tighter sub-optimality gap compared to existing methods. Extensive experiments validate the effectiveness of LoCo-RLHF, showcasing its superior performance in personalized RLHF settings and its robustness to distribution shifts.


[61] 2501.13839

Detecting Sparse Cointegration

We propose a two-step procedure to detect cointegration in high-dimensional settings, focusing on sparse relationships. First, we use the adaptive LASSO to identify the small subset of integrated covariates driving the equilibrium relationship with a target series, ensuring model-selection consistency. Second, we adopt an information-theoretic model choice criterion to distinguish between stationarity and nonstationarity in the resulting residuals, avoiding dependence on asymptotic distributional assumptions. Monte Carlo experiments confirm robust finite-sample performance, even under endogeneity and serial correlation.


[62] 2502.08838

Statistical inference for Lévy-driven graph supOU processes: From short- to long-memory in high-dimensional time series

This article introduces Lévy-driven graph supOU processes, a parsimonious parametrisation for high-dimensional time series in which dependence between components is governed by a graph structure. Specifically, the model bridges short- and long-range dependence within a single parametric family while accommodating a wide range of marginal distributions. We further develop a generalised method of moments estimator, establish its consistency and asymptotic normality, and assess its finite-sample performance through a simulation study. Finally, we illustrate the practical relevance of our model and estimation method in an empirical study of wind capacity factors in a European electricity network context.


[63] 2504.11279

Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology

The analysis of data from multiple experiments, such as observations of several individuals, is commonly approached using mixed-effects models, which account for variation between individuals through hierarchical representations. This makes mixed-effects models widely applied in fields such as biology, pharmacokinetics, and sociology. In this work, we propose a novel methodology for scalable Bayesian inference in hierarchical mixed-effects models. Our framework first constructs amortized approximations of the likelihood and the posterior distribution, which are then rapidly refined for each individual dataset to ultimately approximate the posterior of the parameters across many individuals. The framework is easily trainable, as it uses mixtures of experts but without neural networks, leading to parsimonious yet expressive surrogate models of the likelihood and the posterior. We demonstrate the effectiveness of our methodology using challenging stochastic models, such as mixed-effects stochastic differential equations emerging in systems biology-driven problems. However, the approach is broadly applicable and can accommodate both stochastic and deterministic models. We show that our approach can seamlessly handle inference for many parameters. Additionally, we applied our method to a real-data case study of mRNA transfection. When compared to exact pseudomarginal Bayesian inference, our approach proved to be both fast and competitive in terms of statistical accuracy.


[64] 2505.01297

On identification in ill-posed linear regression

A novel framework is introduced to formalize identifiability in well-specified but ill-posed linear regression models. The framework is distribution-free and accommodates highly correlated features that may or may not relate to the response, reflecting typical real-data structures. First, the identifiable parameter is defined as the least-squares solution obtained by regressing the response on the largest subset of relevant features whose condition number does not exceed a specified threshold, and the relative risk incurred by using this predictor instead of the optimal one is quantified. Second, simple, verifiable conditions are provided under which a broad class of linear dimensionality reduction algorithms can estimate identifiable parameters; algorithms satisfying these conditions are termed statistically interpretable. Third, sharp high-probability error bounds are derived for these algorithms, with rates explicitly reflecting the degree of ill-posedness. With heavy-tailed features and sufficiently low effective rank, these algorithms achieve convergence rates that improve upon both the minimax least-squares rate and lower bounds for sparse estimation under sub-Gaussian features. Results are illustrated via simulations and a real-data application, in which effective rank grows logarithmically with dimension. The framework may extend to algorithms modeling nonlinear response-feature dependence.


[65] 2505.07383

On the relationship between concentration inequalities and maximum bias for depth estimators

The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by contamination. In the multivariate case, Tukey's median was a groundbreaking concept for multivariate location estimation, and its counterpart for scatter matrices has recently attracted considerable interest. The breakdown point and the maximum asymptotic bias are key concepts used to summarize an estimator's behavior under contamination. In the multivariate and regression setting we analyze recently introduced concentration inequalities that provide a unified framework for studying both the statistical convergence rate and robustness of Tukey's median, depth-based scatter matrices and depth-based multivariate regression estimators. We observe that slight variations in these inequalities allow us to visualize the maximum bias behavior of the deepest estimators. We explicitly obtain the maximum bias curve and breakdown point of the deepest scatter matrices. For the location and scale model, we consider two closely related depth formulations, whose deepest estimators display significantly different behavior in terms of breakdown point. A numerical study is performed to compare the finite sample bias performance of several robust estimators in the multivariate setting.


[66] 2505.07669

Separable models for dynamic signed networks

Signed networks capture the polarity of relationships between nodes, providing valuable insights into complex systems where both supportive and antagonistic interactions play a critical role in shaping the network dynamics. We propose a separable temporal generative framework based on multi-layer exponential random graph models, characterised by the assumption of conditional independence between the sign and interaction effects. This structure preserves the flexibility and explanatory power inherent in the binary network specification while adhering to consistent balance theory assumptions. Using a fully probabilistic Bayesian paradigm, we infer the doubly intractable posterior distribution of model parameters via an adaptive Metropolis-Hastings approximate exchange algorithm. We illustrate the interpretability of our model by analysing signed relations among U.S. Senators during Ronald Reagan's second term (1985-1989). Specifically, we aim to understand whether these relations are consistent and balanced or reflect patterns of supportive or antagonistic alliances.


[67] 2505.23783

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration techniques are proposed to mitigate these biases, we show that, in the logit space, many of these methods are equivalent to merely shifting the LLM's decision boundary without having the ability to alter its orientation. This proves inadequate when biases cause the LLM to be severely misaligned. To address these limitations and provide a unifying framework, we propose Supervised Calibration (SC), a loss-minimization-based framework, which learns an optimal, per-class affine transformation of LLM's predictive probabilities in the logit space without requiring external data beyond the context. By using a more expressive functional class, SC not only subsumes many existing calibration methods in ICL as special cases but also enables the ability of altering and even completely reversing the orientation of the LLM's decision boundary. Furthermore, SC's loss-based nature facilitates the seamless integration of two purpose-built regularization techniques, context-invariance and directional trust-region regularizers. The former is designed to tackle the instability issue in ICL, while the latter is to control the degree of calibration. Finally, SC delivers state-of-the-art performance over calibration baselines in the 4-shot, 8-shot, and 16-shot settings across all nine datasets for Mistral-7B-Instruct-v0.3, Llama-2-7B-chat, and Qwen2-7B-Instruct.
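
As a bare-bones illustration of the kind of map the SC framework learns, the sketch below fits a per-class affine transformation of class logits by cross-entropy minimization with plain gradient descent. It omits the context-invariance and trust-region regularizers described above, and all names, hyperparameters, and the optimization routine are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def fit_affine_calibration(logits, labels, n_steps=500, lr=0.1):
    """logits: (n, K) raw LLM class logits; labels: (n,) integer class labels."""
    n, K = logits.shape
    a, b = np.ones(K), np.zeros(K)          # per-class scale and shift
    onehot = np.eye(K)[labels]
    for _ in range(n_steps):
        z = a * logits + b                  # affine map in logit space
        z = z - z.max(axis=1, keepdims=True)
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        g = (p - onehot) / n                # gradient of mean cross-entropy w.r.t. z
        a -= lr * (g * logits).sum(axis=0)
        b -= lr * g.sum(axis=0)
    return a, b

# Calibrated prediction: argmax over a * new_logits + b for a query's logits.
```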


[68] 2507.12686

Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-({1}/{6})^{L-1} + \epsilon}$, for any $\epsilon > 0$.


[69] 2509.19956

Multi-state Models For Disease Histories Based On Longitudinal Data

Multi-stage disease histories derived from longitudinal data are becoming increasingly available as registry data and biobanks expand. Multi-state models are suitable to investigate transitions between different disease stages in the presence of competing risks. In this context, however, their estimation is complicated by dependent left-truncation, multiple time scales, index event bias, and interval-censoring. In this work, we investigate the extension of piecewise exponential additive models (PAMs) to this setting and their applicability given the above challenges. In simulation studies we show that PAMs can handle dependent left-truncation and accommodate multiple time scales. Compared to a stratified single time scale model, a multiple time scales model is found to be less robust to the data generating process. We also quantify the extent of index event bias in multiple settings, demonstrating its dependence on the completeness of covariate adjustment. In general, PAMs recover baseline and fixed effects well in most settings, except for baseline hazards in interval-censored data. Finally, we apply our framework to estimate multi-state transition hazards and probabilities of chronic kidney disease (CKD) onset and progression in a UK Biobank dataset (n=142,667). We observe CKD progression risk to be highest for individuals with early CKD onset and to further increase with age. In addition, the well-known genetic variant rs77924615 in the UMOD locus is found to be associated with CKD onset hazards, but not with risk of further CKD progression.


[70] 2509.21091

Best-of-$\infty$ -- Asymptotic Performance of Test-Time LLM Ensembling

We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
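
A small sketch of the adaptive generation rule described above: keep sampling answers until the leading answer's vote share exceeds a target agreement level (or a budget cap is hit), then return the majority vote. The agreement threshold, the cap, and the `generate` interface are illustrative assumptions, not the paper's exact scheme.

```python
from collections import Counter

def adaptive_best_of_n(generate, agreement=0.6, n_min=4, n_max=64):
    """generate() -> one sampled answer string from the LLM."""
    votes = Counter()
    for n in range(1, n_max + 1):
        votes[generate()] += 1
        if n >= n_min:
            answer, count = votes.most_common(1)[0]
            if count / n >= agreement:      # answers already agree enough: stop early
                return answer, n
    return votes.most_common(1)[0][0], n_max
```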


[71] 2510.17325

Composite Lp-quantile regression, near quantile regression and the oracle model selection theory

In this paper, we consider high-dimensional Lp-quantile regression, which requires only a low-order moment of the error and is a natural generalization of quantile regression and Lp-regression. The loss function of Lp-quantile regression circumvents both the non-differentiability of the absolute loss and the requirement of a finite error variance inherent in the squared loss, and thus promises excellent properties. Specifically, we first develop a new method called composite Lp-quantile regression (CLpQR). We study the oracle model selection theory based on CLpQR (calling the estimator CLpQR-oracle) and show that, for some values of p, CLpQR-oracle behaves better than CQR-oracle (based on composite quantile regression) when the error variance is infinite. Moreover, CLpQR has high efficiency and can sometimes be arbitrarily more efficient than both CQR and least squares regression. Second, we propose another new regression method, near quantile regression, and prove the asymptotic normality of the estimator as p converges to 1 and the sample size tends to infinity simultaneously. As applications, we provide a new way of smoothing quantile objective functions and a new estimator of the asymptotic covariance matrix of quantile regression. Third, we develop a unified, efficient algorithm for fitting high-dimensional Lp-quantile regression by combining cyclic coordinate descent with an augmented proximal gradient algorithm. Remarkably, the algorithm turns out to be a favourable alternative to the commonly used linear programming and interior point algorithms for fitting quantile regression.


[72] 2510.23976

Forecasting Arctic Temperatures with Temporally Dependent Data Using Quantile Gradient Boosting and Adaptive Conformal Prediction Regions

Using data from the Longyearbyen weather station, quantile gradient boosting (``small AI'') is applied to forecast daily 2023 temperatures in Svalbard, Norway. The 0.60 quantile loss weights underestimates about 1.5 times more than overestimates. Predictors include five routinely collected indicators of weather conditions, each lagged by 14 days, yielding temperature forecasts with a two-week lead time. Conformal prediction regions quantify forecasting uncertainty with provably valid coverage. Forecast accuracy is evaluated with attention to local stakeholder concerns, and implications for Arctic adaptation policy are discussed.
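
A minimal sketch of the forecasting setup described above: gradient boosting with a 0.60 quantile (pinball) loss on predictors lagged by 14 days. The file name, column names, and hyperparameters are hypothetical; this is not the authors' exact pipeline or tuning.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical daily weather file with a date column, five predictors, and temperature.
df = pd.read_csv("longyearbyen_daily.csv", parse_dates=["date"])
predictors = ["pressure", "humidity", "wind_speed", "cloud_cover", "precipitation"]

lagged = df[predictors].shift(14)                       # two-week lead time
data = pd.concat([lagged, df["temperature"]], axis=1).dropna()

model = GradientBoostingRegressor(loss="quantile", alpha=0.60)  # 0.60 quantile loss
model.fit(data[predictors], data["temperature"])
point_forecast = model.predict(data[predictors].tail(1))
```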


[73] 2511.01960

Towards a Unified Framework for Statistical and Mathematical Modeling

Within the biological, physical, and social sciences, there are two broad quantitative traditions: statistical and mathematical modeling. Both traditions have the common pursuit of advancing our scientific knowledge, but these traditions have developed largely independently using distinct languages and inferential frameworks. This paper uses the notion of identification from causal inference, a field originating from the statistical modeling tradition, to develop a shared language. I first review foundational identification results for statistical models and then extend these ideas to mathematical models. Central to this framework is the use of bounds, ranges of plausible numerical values, to analyze both statistical and mathematical models. I discuss the implications of this perspective for the interpretation, comparison, and integration of different modeling approaches, and illustrate the framework with a simple pharmacodynamic model for hypertension. To conclude, I describe areas where the approach taken here should be extended in the future. By formalizing connections between statistical and mathematical modeling, this work contributes to a shared framework for quantitative science. My hope is that this work will advance interactions between these two traditions.


[74] 2511.07340

Smoothing Out Sticking Points: Sampling from Discrete-Continuous Mixtures with Dynamical Monte Carlo by Mapping Discrete Mass into a Latent Universe

Combining a continuous "slab" density with discrete "spike" mass at zero, spike-and-slab priors provide important tools for inducing sparsity and carrying out variable selection in Bayesian models. However, the presence of discrete mass makes posterior inference challenging. "Sticky" extensions to piecewise-deterministic Markov process samplers have shown promising performance, where sampling from the spike is achieved by the process sticking there for an exponentially distributed duration. As it turns out, the sampler remains valid when the exponential sticking time is replaced with its expectation. We justify this by mapping the spike to a continuous density over a latent universe, allowing the sampler to be reinterpreted as traversing this universe while being stuck in the original space. This perspective opens up an array of possibilities to carry out posterior computation under spike-and-slab type priors. Notably, it enables us to construct sticky samplers using other dynamics-based paradigms such as Hamiltonian Monte Carlo; in fact, the original sticky process can be established as a partial position-momentum refreshment limit of our Hamiltonian sticky sampler. Our theoretical and empirical findings suggest these alternatives to be at least as efficient as the original sticky approach.


[75] 2511.14827

Implicit Bias of the JKO Scheme

Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $\eta>0$ a sequence of probability distributions $\rho_k^\eta$ that approximate to first order in $\eta$ Wasserstein gradient flow on $J$. But the JKO scheme also has many other remarkable properties not shared by other first order integrators, e.g. it preserves energy dissipation and exhibits unconditional stability for $\lambda$-geodesically convex functionals $J$. To better understand the JKO scheme we characterize its implicit bias at second order in $\eta$. We show that $\rho_k^\eta$ are approximated to order $\eta^2$ by Wasserstein gradient flow on a modified energy \[ J^{\eta}(\rho) = J(\rho) - \frac{\eta}{4}\int_M \Big\lVert \nabla_g \frac{\delta J}{\delta \rho} (\rho) \Big\rVert_{2}^{2} \,\rho(dx), \] obtained by subtracting from $J$ the squared metric curvature of $J$ times $\eta/4$. The JKO scheme therefore adds at second order in $\eta$ a deceleration in directions where the metric curvature of $J$ is rapidly changing. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL-divergence it is the Fisher-Hyvärinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^\eta$ we study JKO-Flow, Wasserstein gradient flow on $J^\eta$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.
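
As an added check of the entropy case mentioned in the abstract (an illustration, not part of the original text), plugging $J(\rho) = \int_M \rho \log\rho \, dx$ into the displayed correction term gives \[ \frac{\delta J}{\delta \rho} = \log\rho + 1, \qquad \nabla_g \frac{\delta J}{\delta \rho} = \nabla_g \log\rho, \] so that \[ J^{\eta}(\rho) = \int_M \rho \log\rho \, dx - \frac{\eta}{4}\int_M \big\lVert \nabla_g \log\rho \big\rVert^{2}\,\rho(dx) = J(\rho) - \frac{\eta}{4}\, I(\rho), \] where $I(\rho)$ is the Fisher information, matching the stated claim that the implicit bias of the JKO scheme for entropy is the Fisher information.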


[76] 2601.05217

A complete characterization of testable hypotheses

We revisit a fundamental question in hypothesis testing: given two sets of probability measures $\mathcal{P}$ and $\mathcal{Q}$, when does a nontrivial (i.e. strictly unbiased) test for $\mathcal{P}$ against $\mathcal{Q}$ exist? Le Cam showed that, when $\mathcal{P}$ and $\mathcal{Q}$ have a common dominating measure, a test that has power exceeding its level by more than $\varepsilon$ exists if and only if the convex hulls of $\mathcal{P}$ and $\mathcal{Q}$ are separated in total-variation distance by more than $\varepsilon$. The requirement of a dominating measure is frequently violated in nonparametric statistics. In a passing remark, Le Cam described an approach to address more general scenarios, but he stopped short of stating a formal theorem. This work completes Le Cam's program, by presenting a matching necessary and sufficient condition for testability: for the aforementioned theorem to hold without assumptions, one must take the closures of the convex hulls of $\mathcal{P}$ and $\mathcal{Q}$ in the space of bounded finitely additive measures. We provide simple elucidating examples, and elaborate on various subtle measure theoretic and topological points regarding compactness and achievability.


[77] 2601.12767

Bayesian Variable Selection with the Quasi-Posterior

The Bayesian approach provides powerful methods for variable selection. The ability to incorporate sparsity through prior beliefs and account for parameter uncertainty allows Bayesian variable selection to consistently identify which of the variables are active and exhibit strong finite-sample performance. However, Bayesian methods require the correct specification of full likelihoods for the data, and there is increasing awareness of the problems that model misspecification causes for variable selection. Current approaches to mitigate misspecification either require complex models, detracting from the interpretability of the variable selection task, or move outside rigorous Bayesian uncertainty quantification and provide no recognised method for variable selection. This paper establishes the model quasi-posterior as a principled tool for variable selection. We prove that the model quasi-posterior shares desirable properties of Bayesian variable selection without requiring full likelihood specification. Instead, the quasi-posterior combines a prior with a quasi-likelihood and requires only specification of mean and variance functions, and is therefore robust to other aspects of the data. Marginalising the quasi-likelihood is analytically possible for linear regression, and Laplace approximations are used beyond this to ensure computational tractability. Extensive simulation studies illustrate improved variable selection accuracy across diverse data-generating scenarios when compared with likelihood-based Bayesian variable selection and lasso-penalized methods. We further demonstrate practical relevance through applications to real datasets from social science and genomics.


[78] 2601.16120

Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add

Imbalanced classification often causes standard training procedures to prioritize the majority class and perform poorly on rare but important cases. A classic and widely used remedy is to augment the minority class with synthetic samples, but two basic questions remain under-resolved: when does synthetic augmentation actually help, and how many synthetic samples should be generated? We develop a unified statistical framework for synthetic augmentation in imbalanced learning, studying models trained on imbalanced data augmented with synthetic minority samples. Our theory shows that synthetic data is not always beneficial. In a "local symmetry" regime, imbalance is not the dominant source of error, so adding synthetic samples cannot improve learning rates and can even degrade performance by amplifying generator mismatch. When augmentation can help ("local asymmetry"), the optimal synthetic size depends on generator accuracy and on whether the generator's residual mismatch is directionally aligned with the intrinsic majority-minority shift. This structure can make the best synthetic size deviate from naive full balancing. Practically, we recommend Validation-Tuned Synthetic Size (VTSS): select the synthetic size by minimizing balanced validation loss over a range centered near the fully balanced baseline, while allowing meaningful departures. Extensive simulations and real data analysis further support our findings.
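
A schematic sketch of the Validation-Tuned Synthetic Size (VTSS) recipe described above: sweep the number of synthetic minority samples over a grid centered near full balancing and keep the size that maximizes balanced validation accuracy. The generator interface, grid, and classifier choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def vtss(X_tr, y_tr, X_val, y_val, sample_synthetic, minority=1):
    """sample_synthetic(m) -> (m, d) synthetic minority feature vectors."""
    gap = int((y_tr != minority).sum() - (y_tr == minority).sum())  # full-balancing size
    grid = np.unique(np.clip(np.linspace(0, 2 * gap, 9).astype(int), 0, None))
    best_m, best_score = 0, -np.inf
    for m in grid:
        if m > 0:
            X_aug = np.vstack([X_tr, sample_synthetic(m)])
            y_aug = np.concatenate([y_tr, np.full(m, minority)])
        else:
            X_aug, y_aug = X_tr, y_tr
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        score = balanced_accuracy_score(y_val, clf.predict(X_val))
        if score > best_score:
            best_m, best_score = m, score
    return best_m          # validation-tuned synthetic size, possibly far from `gap`
```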


[79] 2601.17217

Transfer learning for functional linear regression via control variates

Transfer learning (TL) has emerged as a powerful tool for improving estimation and prediction performance by leveraging information from related datasets, with the offset TL (O-TL) being a prevailing implementation. In this paper, we adapt the control-variates (CVS) method for TL and develop CVS-based estimators for scalar-on-function regression, one of the most fundamental models in functional data analysis. These estimators rely exclusively on dataset-specific summary statistics, thereby avoiding the pooling of subject-level data and remaining applicable in privacy-restricted or decentralized settings. We establish, for the first time, a theoretical connection between O-TL and CVS-based TL, showing that these two seemingly distinct TL strategies adjust local estimators in fundamentally similar ways. We further derive convergence rates that explicitly account for the unavoidable but typically overlooked smoothing error arising from discretely observed functional predictors, and clarify how similarity among covariance functions across datasets governs the performance of TL. Numerical studies support the theoretical findings and demonstrate that the proposed methods achieve competitive estimation and prediction performance compared with existing alternatives.


[80] 2602.21969

Estimation of the complexity of a network under a Gaussian graphical model

The proportion of edges in a Gaussian graphical model (GGM) characterizes the complexity of its conditional dependence structure. Since edge presence corresponds to a nonzero entry of the precision matrix, estimation of this proportion can be formulated as a large-scale multiple testing problem. We propose an estimator that combines p-values from simultaneous edge-wise tests, conducted under false discovery rate control, with Storey's estimator of the proportion of true null hypotheses. We establish weak dependence conditions on the precision matrix under which the empirical cumulative distribution function of the p-values converges to its population counterpart. These conditions cover high-dimensional regimes, including those arising in genetic association studies. Under such dependence, we characterize the asymptotic bias of the Schweder--Spjøtvoll estimator, showing that it is upward biased and thus slightly underestimates the true edge proportion. Simulation studies across a variety of models confirm accurate recovery of graph complexity.
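
A minimal sketch of the plug-in idea described above: combine the edge-wise p-values with Storey's estimator of the proportion of true nulls to estimate the proportion of edges. The tuning parameter lambda is a generic choice, and the sketch ignores the bias characterization discussed in the abstract.

```python
import numpy as np

def storey_edge_proportion(pvals, lam=0.5):
    """pvals: array of p-values from simultaneous edge-wise tests."""
    pvals = np.asarray(pvals)
    pi0_hat = np.mean(pvals > lam) / (1.0 - lam)   # Storey's estimate of the null proportion
    pi0_hat = min(pi0_hat, 1.0)
    return 1.0 - pi0_hat                           # estimated proportion of true edges
```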


[81] 2603.01196

A Percentile-Focused Regression Method for Applied Data with Irregular Error Structures

Irregular errors such as heteroscedasticity and nonnormality remain major challenges in linear modeling. These issues often lead to biased inference and unreliable measures of uncertainty. Classical remedies, such as robust standard errors and weighted least squares, only partially address the problem and may fail when heteroscedasticity interacts with skewness or nonlinear mean structures. To address this, we propose a two-stage cumulative distribution function-based (CDF-based) beta regression framework that models the full conditional distribution of the response. The approach first transforms the outcome using a smoothed empirical CDF and then fits a flexible beta regression, allowing heteroscedasticity and nonnormality to be handled naturally through the mean-precision structure of the beta distribution. Predictions are mapped back to the original scale via the empirical quantile function, which preserves interpretability. A comprehensive Monte Carlo study shows that the proposed method consistently achieves good distributional accuracy and well-calibrated prediction intervals compared with OLS, WLS, and GLS. Application to the concrete compressive strength dataset demonstrates its stability and practical advantages.
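
A simplified sketch of the two-stage pipeline described above: (i) map the response to (0,1) through a smoothed empirical CDF, (ii) fit a regression on that scale, and (iii) map predictions back with the empirical quantile function. For brevity the beta regression of the second stage is replaced here by an ordinary linear model on the logit scale; this is a stand-in for illustration, not the proposed estimator.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.linear_model import LinearRegression

def fit_cdf_regression(X, y):
    n = len(y)
    u = rankdata(y) / (n + 1.0)                   # smoothed empirical CDF values in (0, 1)
    model = LinearRegression().fit(X, np.log(u / (1 - u)))   # regression on the logit scale
    y_sorted = np.sort(y)

    def predict(X_new):
        u_hat = 1.0 / (1.0 + np.exp(-model.predict(X_new)))  # back to (0, 1)
        # Empirical quantile function maps u_hat to the original response scale.
        return np.quantile(y_sorted, u_hat)

    return predict

# Usage: predict = fit_cdf_regression(X_train, y_train); y_hat = predict(X_test)
```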


[82] 2603.02460

Conformal Graph Prediction with Z-Gromov Wasserstein Distances

Supervised graph prediction addresses regression problems where the outputs are structured graphs. Although several approaches exist for graph-valued prediction, principled uncertainty quantification remains limited. We propose a conformal prediction framework for graph-valued outputs, providing distribution-free coverage guarantees in structured output spaces. Our method defines nonconformity via the Z-Gromov-Wasserstein distance, instantiated in practice through Fused Gromov-Wasserstein (FGW), enabling permutation invariant comparison between predicted and candidate graphs. To obtain adaptive prediction sets, we introduce Score Conformalized Quantile Regression (SCQR), an extension of Conformalized Quantile Regression (CQR) to handle complex output spaces such as graph-valued outputs. We evaluate the proposed approach on a synthetic task and a real problem of molecule identification.


[83] 2302.00941

A Robust Multi-Item Auction Design with Statistical Learning

We propose a novel statistical learning method for multi-item auctions that incorporates credible intervals. Our approach employs nonparametric density estimation to estimate credible intervals for bidder types based on historical data. We introduce two new strategies that leverage these credible intervals to reduce the time cost of implementing auctions. The first strategy screens potential winners' value regions within the credible intervals, while the second strategy simplifies the type distribution when the length of the interval is below a threshold value. These strategies are easy to implement and ensure fairness, dominant-strategy incentive compatibility, and dominant-strategy individual rationality with a high probability, while simultaneously reducing implementation costs. We demonstrate the effectiveness of our strategies using the Vickrey-Clarke-Groves mechanism and evaluate their performance through simulation experiments. Our results show that the proposed strategies consistently outperform alternative methods, achieving both revenue maximization and cost reduction objectives.


[84] 2403.10889

List Sample Compression and Uniform Convergence

List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning. Our primary goal is to determine whether classical principles in the PAC setting retain their applicability in the domain of list PAC learning. We focus on uniform convergence (which is the basis of Empirical Risk Minimization) and on sample compression (which is a powerful manifestation of Occam's Razor). In classical PAC learning, both uniform convergence and sample compression satisfy a form of `completeness': whenever a class is learnable, it can also be learned by a learning rule that adheres to these principles. We ask whether the same completeness holds true in the list learning setting. We show that uniform convergence remains equivalent to learnability in the list PAC learning setting. In contrast, our findings reveal surprising results regarding sample compression: we prove that when the label space is $Y=\{0,1,2\}$, then there are 2-list-learnable classes that cannot be compressed. This refutes the list version of the sample compression conjecture by Littlestone and Warmuth (1986). We prove an even stronger impossibility result, showing that there are $2$-list-learnable classes that cannot be compressed even when the reconstructed function can work with lists of arbitrarily large size. We prove a similar result for (1-list) PAC learnable classes when the label space is unbounded. This generalizes a recent result by arXiv:2308.06424.


[85] 2406.14059

Tracking solutions of time-varying variational inequalities

Tracking the solution of time-varying variational inequalities is an important problem with applications in game theory, optimization, and machine learning. Existing work considers time-varying games or time-varying optimization problems. For strongly convex optimization problems or strongly monotone games, these results provide tracking guarantees under the assumption that the variation of the time-varying problem is restrained, that is, problems with a sublinear solution path. In this work we extend existing results in two ways: In our first result, we provide tracking bounds for (1) variational inequalities with a sublinear solution path but not necessarily monotone functions, and (2) for periodic time-varying variational inequalities that do not necessarily have a sublinear solution path-length. Our second main contribution is an extensive study of the convergence behavior and trajectory of discrete dynamical systems of periodic time-varying VI. We show that these systems can exhibit provably chaotic behavior or can converge to the solution. Finally, we illustrate our theoretical results with experiments.


[86] 2409.08773

Heterogeneous Responses to Continuous Treatments: A Cluster-Based Causal Framework

When treatments are non-randomly assigned, continuous, and yield heterogeneous effects at the same intensity, causal identification becomes particularly challenging. In such contexts, existing approaches often fail to provide policy-relevant estimates of the relationship between treatment intensity and outcomes, especially in the presence of limited common support. To fill this gap, we introduce the Clustered Dose-Response Function (Cl-DRF), a novel estimator designed to uncover the continuous causal relationship between treatment intensity and the dependent variable across distinct subgroups. Our approach leverages both theoretical and data-driven sources of heterogeneity, relying on relaxed versions of the conditional independence and positivity assumptions that are plausible across various observational settings. We apply the Cl-DRF estimator to estimate subgroup-specific dose-response relationships between European Cohesion Funds and economic growth. In contrast to much of the literature, higher funding increases growth in more developed regions without diminishing returns, while limited absorptive capacity prevents other regions from fully benefiting.


[87] 2502.05459

DCENWCNet: A Deep CNN Ensemble Network for White Blood Cell Classification with LIME-Based Explainability

White blood cells (WBCs) are important parts of our immune system; they protect the body against infections by eliminating viruses, bacteria, parasites, and fungi. The counts of the individual WBC types and the total number of WBCs provide important information about our health status. Convolutional neural networks (CNNs), a standard deep learning architecture, can classify blood cells from image data and perform object recognition. Various CNN models exhibit potential; however, their development often involves ad-hoc processes that retain unnecessary layers and leave issues such as unbalanced datasets and insufficient data augmentation unaddressed. To address these challenges, we propose a novel ensemble approach that integrates three CNN architectures, each uniquely configured with different dropout and max-pooling layer settings to enhance feature learning. This ensemble model, named DCENWCNet, effectively balances the bias-variance trade-off. When evaluated on the widely recognized Raabin-WBC dataset, our model outperforms existing state-of-the-art networks, achieving the highest mean accuracy. Additionally, it demonstrates superior performance in precision, recall, F1-score, and Area Under the ROC Curve (AUC) across all categories. To delve deeper into the interpretability of the classifiers, we employ reliable post-hoc explanation techniques, including Local Interpretable Model-Agnostic Explanations (LIME). These methods approximate the behavior of a black-box model by elucidating the relationships between feature values and predictions. Interpretable results enable users to comprehend and validate the model's predictions, thereby increasing their confidence in the automated diagnosis.


[88] 2503.18012

Scalable physics-informed deep generative model for solving forward and inverse stochastic differential equations

Physics-informed deep learning approaches have been developed to solve forward and inverse stochastic differential equation (SDE) problems with a high-dimensional stochastic space. However, existing deep learning models have difficulty solving SDEs with a high-dimensional spatial space. In the present study, we propose a scalable physics-informed deep generative model (sPI-GeM), which is capable of solving SDE problems with both high-dimensional stochastic and spatial spaces. The sPI-GeM consists of two deep learning models: (1) physics-informed basis networks (PI-BasisNet), which learn the basis functions as well as the corresponding coefficients from data on a given stochastic process or random field, and (2) a physics-informed deep generative model (PI-GeM), which learns the distribution over the coefficients obtained from the PI-BasisNet. New samples of the learned stochastic process can then be obtained via the inner product between the output of the generator and the basis functions from the trained PI-BasisNet. The sPI-GeM addresses scalability in the spatial space in a way similar to the widely used dimensionality reduction technique, principal component analysis (PCA). A series of numerical experiments, including the approximation of Gaussian and non-Gaussian stochastic processes and forward and inverse SDE problems, is performed to demonstrate the accuracy of the proposed model. Furthermore, we show the scalability of the sPI-GeM in both the stochastic and spatial spaces using a forward SDE problem with a 38-dimensional stochastic space and a 20-dimensional spatial space.
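
The sampling step described above reduces to an inner product between generated coefficients and fixed basis functions; the following NumPy sketch (names and shapes are ours, purely illustrative) shows that step with simple stand-ins for PI-BasisNet and PI-GeM.

```python
# Sampling a learned stochastic process as <coefficients, basis functions> (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_x, n_basis = 200, 8
x = np.linspace(0.0, 1.0, n_x)

# stand-in for the basis functions a PI-BasisNet would learn on the spatial grid
basis = np.stack([np.sin((k + 1) * np.pi * x) for k in range(n_basis)])   # (n_basis, n_x)

# stand-in for the PI-GeM generator: latent noise -> coefficient vectors
W = rng.standard_normal((n_basis, 4))
def generator(z):
    return z @ W.T                                                        # (n_samples, n_basis)

z = rng.standard_normal((5, 4))          # latent noise
coeffs = generator(z)                    # generated coefficients
samples = coeffs @ basis                 # (5, n_x): new realizations of the process
print(samples.shape)
```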


[89] 2505.15643

Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima

We study best-arm identification in stochastic multi-armed bandits under the fixed-confidence setting, focusing on instances with multiple optimal arms. Unlike prior work that addresses the unknown-number-of-optimal-arms case, we consider the setting where the number of optimal arms is known in advance. We derive a new information-theoretic lower bound on the expected sample complexity that leverages this structural knowledge and is strictly tighter than previous bounds. Building on the Track-and-Stop algorithm, we propose a modified, tie-aware stopping rule and prove that it achieves asymptotic instance-optimality, matching the new lower bound. Our results provide the first formal guarantee of optimality for Track-and-Stop in multi-optimal settings with known cardinality, offering both theoretical insights and practical guidance for efficiently identifying any optimal arm.


[90] 2505.18535

Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD

We study the convergence properties and escape dynamics of Stochastic Gradient Descent (SGD) in one-dimensional landscapes, separately considering infinite- and finite-variance noise. Our main focus is to identify the time scales on which SGD reliably moves from an initial point to the local minimum in the same ''basin''. Under suitable conditions on the noise distribution, we prove that SGD converges to the basin's minimum unless the initial point lies too close to a local maximum. In that near-maximum scenario, we show that SGD can linger for a long time in its neighborhood. For initial points near a ''sharp'' maximum, we show that SGD does not remain stuck there, and we provide results to estimate the probability that it will reach each of the two neighboring minima. Overall, our findings present a nuanced view of SGD's transitions between local maxima and minima, influenced by both noise characteristics and the underlying function geometry.
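
The near-maximum behavior is easy to visualize with a toy one-dimensional simulation (ours, not the paper's setting): SGD on f(x) = (x^2 - 1)^2, which has minima at x = -1 and x = +1 separated by a local maximum at x = 0, started just to the right of the maximum.

```python
# Toy 1-D SGD run near a local maximum (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # gradient of f(x) = (x^2 - 1)^2
    return 4.0 * x * (x**2 - 1.0)

def run_sgd(x0, steps=5000, lr=1e-2, noise=0.1):
    x = x0
    for _ in range(steps):
        x -= lr * (grad(x) + noise * rng.standard_normal())   # noisy gradient step
    return x

finals = np.array([run_sgd(x0=0.01) for _ in range(500)])
print("fraction reaching the x = +1 basin:", np.mean(finals > 0.0))
print("fraction reaching the x = -1 basin:", np.mean(finals < 0.0))
```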


[91] 2506.12112

A Unifying Integral Representation of the Gamma Function and Its Reciprocal

We derive an integral expression $G(z)$ for the reciprocal gamma function, $1/\Gamma(z)=G(z)/\pi$, that is valid for all $z\in\mathbb{C}$, without the need for analytic continuation. The same integral avoids the singularities of the gamma function and satisfies $G(1-z)=\Gamma(z)\sin(\pi z)$ for all $z\in\mathbb{C}$.
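
As a quick consistency check (ours, not part of the abstract), the two stated properties together recover Euler's classical reflection formula:
$$\frac{1}{\Gamma(1-z)}=\frac{G(1-z)}{\pi}=\frac{\Gamma(z)\sin(\pi z)}{\pi}
\qquad\Longleftrightarrow\qquad
\Gamma(z)\,\Gamma(1-z)=\frac{\pi}{\sin(\pi z)}.$$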


[92] 2506.13107

Honesty in Causal Forests: When It Helps and When It Hurts

Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard modeling practice with this method is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? In this paper, we show that honest estimation can reduce the accuracy of individual-level treatment effect estimates, especially when there are substantial differences in how individuals respond to treatment, and the data is rich enough to uncover those differences. The core issue is a classic bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting, because it limits the data available to detect and model heterogeneity. Across 7,500 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 25% more data to match the performance of models trained without it. We argue that honesty is best understood as a form of regularization and its use should be guided by application goals and empirical evaluation, not adopted reflexively.
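
The sample-splitting mechanics behind honesty are simple to sketch; the toy Python snippet below (a transformed-outcome regression tree standing in for a causal forest, under 50/50 randomized treatment) defines subgroups on one half of the data and estimates effects within them on the other half. All names and constants are illustrative.

```python
# Minimal "honest" subgroup estimation sketch (illustrative only, not a causal forest).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-1, 1, size=(n, 2))
W = rng.integers(0, 2, size=n)                      # randomized binary treatment
tau = 2.0 * X[:, 0]                                 # heterogeneous true effect
Y = X[:, 1] + tau * W + rng.normal(0, 1, size=n)

# honesty: one half defines the subgroups, the other half estimates effects within them
perm = rng.permutation(n)
fit_idx, est_idx = perm[: n // 2], perm[n // 2:]

# transformed outcome 2*(2W-1)*Y is unbiased for the effect under 50/50 randomization
tree = DecisionTreeRegressor(max_leaf_nodes=8, min_samples_leaf=100)
tree.fit(X[fit_idx], 2 * (2 * W[fit_idx] - 1) * Y[fit_idx])

leaves = tree.apply(X[est_idx])
for leaf in np.unique(leaves):
    idx = est_idx[leaves == leaf]
    treated, control = idx[W[idx] == 1], idx[W[idx] == 0]
    if len(treated) and len(control):
        print(f"leaf {leaf}: estimated effect {Y[treated].mean() - Y[control].mean():+.2f}")
```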


[93] 2506.13150

Federated ADMM from Bayesian Duality

We propose a new Bayesian approach to generalize the federated Alternating Direction Method of Multipliers (ADMM). We show that the solutions of variational-Bayesian (VB) objectives are associated with a duality structure that not only resembles the structure of ADMM's fixed-points but also generalizes it. For example, ADMM-like updates are recovered when the VB objective is optimized over the isotropic-Gaussian family, and new non-trivial extensions are obtained for other exponential-family distributions. These extensions include a Newton-like variant that converges in one step on quadratic objectives and an Adam-like variant that yields up to 7% accuracy boosts for deep heterogeneous cases. Our work opens a new Bayesian way to generalize ADMM and other primal-dual methods.
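
For readers unfamiliar with the baseline being generalized, here is a minimal NumPy sketch of classical federated (consensus) ADMM on a quadratic problem; it shows the standard algorithm, not the paper's variational-Bayesian extension, and all dimensions and constants are illustrative.

```python
# Consensus ADMM for a federated least-squares problem (classical baseline, illustrative).
import numpy as np

rng = np.random.default_rng(0)
K, d, n = 5, 3, 50
A = [rng.standard_normal((n, d)) for _ in range(K)]                 # local design matrices
b = [Ak @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n) for Ak in A]

rho = 1.0
x = [np.zeros(d) for _ in range(K)]   # local primal variables
u = [np.zeros(d) for _ in range(K)]   # scaled dual variables
z = np.zeros(d)                       # server (consensus) variable

for _ in range(100):
    for k in range(K):
        # local step: argmin_x 0.5 * ||A_k x - b_k||^2 + (rho/2) * ||x - z + u_k||^2
        x[k] = np.linalg.solve(A[k].T @ A[k] + rho * np.eye(d),
                               A[k].T @ b[k] + rho * (z - u[k]))
    z = np.mean([x[k] + u[k] for k in range(K)], axis=0)            # server averaging
    for k in range(K):
        u[k] += x[k] - z                                            # dual update
print("consensus solution:", np.round(z, 3))
```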


[94] 2510.16462

Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

This work introduces MAYA, a sequential imitation learning model based on multi-armed bandits, designed to reproduce and predict individual bees' decisions in contextualized foraging tasks. The model accounts for bees' limited memory through a temporal window $\tau$, whose optimal value is around 7 trials, with a slight dependence on weather conditions. Experimental results on real, simulated, and complementary (mice) datasets show that MAYA (particularly with the Wasserstein distance) outperforms imitation baselines and classical statistical models, while providing interpretability of individual learning strategies and enabling the inference of realistic trajectories for prospective ecological applications.
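
The windowed-memory idea is easy to emulate; the toy snippet below (our illustration, not the MAYA model) keeps only the last tau trials in memory and chooses mostly greedily on that window.

```python
# Toy memory-limited two-option bandit with a sliding window of tau trials (illustrative only).
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
reward_probs = [0.3, 0.7]          # two options with different reward probabilities
tau = 7                            # only the last tau trials are remembered

history = deque(maxlen=tau)        # (choice, reward) pairs inside the memory window
choices = []
for t in range(200):
    if history and rng.random() > 0.1:
        est = [np.mean([r for c, r in history if c == a] or [0.5]) for a in range(2)]
        arm = int(np.argmax(est))   # greedy on the windowed estimates
    else:
        arm = int(rng.integers(2))  # occasional exploration / empty memory
    reward = float(rng.random() < reward_probs[arm])
    history.append((arm, reward))
    choices.append(arm)
print("fraction choosing the better option:", np.mean(np.array(choices) == 1))
```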


[95] 2510.26303

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

Adam [Kingma & Ba, 2015] is the de facto optimizer in deep learning, yet its theoretical understanding remains limited. Prior analyses show that Adam favors solutions aligned with $\ell_\infty$-geometry, but these results are restricted to the full-batch regime. In this work, we study the implicit bias of incremental Adam (using one sample per step) for logistic regression on linearly separable data, and show that its bias can deviate from the full-batch behavior. As an extreme example, we construct datasets on which incremental Adam provably converges to the $\ell_2$-max-margin classifier, in contrast to the $\ell_\infty$-max-margin bias of full-batch Adam. For general datasets, we characterize its bias using a proxy algorithm for the $\beta_2 \to 1$ limit. This proxy maximizes a data-adaptive Mahalanobis-norm margin, whose associated covariance matrix is determined by a data-dependent dual fixed-point formulation. We further present concrete datasets where this bias reduces to the standard $\ell_2$- and $\ell_\infty$-max-margin classifiers. As a counterpoint, we prove that Signum [Bernstein et al., 2018] converges to the $\ell_\infty$-max-margin classifier for any batch size. Overall, our results highlight that the implicit bias of Adam crucially depends on both the batching scheme and the dataset, while Signum remains invariant.
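
A small simulation makes the one-sample-per-step setting concrete (our sketch on an arbitrary separable dataset, not one of the paper's constructed instances): run incremental Adam on the logistic loss and inspect the direction the iterates align with.

```python
# Incremental (one-sample-per-step) Adam on separable logistic regression (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[2.0, 0.5], [1.5, 1.0], [-2.0, -0.5], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])                    # linearly separable labels

w = np.zeros(2); m = np.zeros(2); v = np.zeros(2)
beta1, beta2, lr, eps = 0.9, 0.999, 0.01, 1e-8
for t in range(1, 200001):
    i = (t - 1) % len(X)                                 # cycle through samples
    margin = np.clip(y[i] * (X[i] @ w), -50.0, 50.0)
    g = -y[i] * X[i] / (1.0 + np.exp(margin))            # per-sample logistic gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print("normalized direction of the iterates:", w / np.linalg.norm(w))
```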


[96] 2512.00566

Improved Inference for Nonparametric Regression

Nonparametric regression and regression-discontinuity designs suffer from smoothing bias that distorts conventional confidence intervals. Solutions based on robust bias correction (RBC) are now central to the economist's toolbox. In this paper, we establish a novel connection between RBC methods and bootstrap prepivoting. Revisiting RBC through the lens of bootstrapping allows us to develop a novel bias correction procedure which delivers improved nonparametric inference. The resulting confidence intervals are 17% shorter than the usual intervals employed in curve estimation and regression discontinuity designs, without compromising asymptotic coverage. This holds regardless of evaluation point location, bandwidth choice, or regressor and error distribution.


[97] 2512.13506

Learning under Distributional Drift: Prequential Reproducibility as an Intrinsic Statistical Resource

Statistical learning under distributional drift remains poorly characterized, especially in closed-loop settings where learning alters the data-generating law. We introduce an intrinsic drift budget $C_T$ that quantifies the cumulative information-geometric motion of the data distribution along the realized learner-environment trajectory, measured in Fisher-Rao distance (the Riemannian metric induced by Fisher information on a statistical manifold of data-generating laws). The budget decomposes this motion into exogenous change (environmental drift that would occur without intervention) and policy-sensitive feedback contributions (drift induced by the learner's actions through the closed loop). This yields a rate-based characterization: in prequential reproducibility bounds -- where performance on the realized stream is used to predict one-step-ahead performance under the next distribution -- the drift contribution enters through the average drift rate $C_T/T$, i.e., normalized cumulative Fisher-Rao motion per time step. We prove a drift--feedback bound of order $T^{-1/2} + C_T/T$ (up to a controlled second-order remainder) and establish a matching minimax lower bound on a canonical subclass, showing this dependence is tight up to constants. Consequently, when $C_T/T$ is nonnegligible, one-step-ahead reproducibility admits an irreducible accuracy floor of the same order. Finally, the framework places exogenous drift, adaptive data analysis, and performative feedback within a common geometric account of distributional motion.
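
To make the drift budget tangible, the sketch below (ours, using the closed-form Fisher-Rao distance between univariate Gaussians as an assumed example family) accumulates the Fisher-Rao motion of a slowly drifting Gaussian data law and compares the average rate $C_T/T$ with $T^{-1/2}$.

```python
# Cumulative Fisher-Rao motion of a drifting 1-D Gaussian stream (illustrative only).
import numpy as np

def fisher_rao_gauss(mu1, s1, mu2, s2):
    # closed form for N(mu, sigma^2): sqrt(2) * arccosh(1 + (dmu^2/2 + dsigma^2) / (2 s1 s2))
    num = (mu1 - mu2) ** 2 / 2.0 + (s1 - s2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * s1 * s2))

T = 1000
mus = 0.5 * np.sin(2 * np.pi * np.arange(T + 1) / 200.0)   # slowly drifting mean
sigmas = np.full(T + 1, 1.0)                               # fixed scale

C_T = sum(fisher_rao_gauss(mus[t], sigmas[t], mus[t + 1], sigmas[t + 1]) for t in range(T))
print(f"C_T = {C_T:.3f}, average drift rate C_T/T = {C_T / T:.5f}, T^(-1/2) = {T ** -0.5:.5f}")
```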


[98] 2601.03518

Universal concentration for sums under arbitrary dependence

We present a universal concentration bound for sums of random variables under arbitrary dependence, and we prove that it is asymptotically optimal for broad families of marginals admitting a uniform integrable tail-quantile envelope. The bound follows directly from the subadditivity of expected shortfall, a property well known in the risk-measure literature. Our sharpness result relies on an explicit construction of asymptotically extremal couplings. We furthermore provide practical sufficient conditions -- based on convex transformation order comparisons with exponential and power-law envelopes -- under which the bound admits simple, explicit tail profiles.
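
For intuition, a standard instance of this type of bound (our illustration, not necessarily the paper's exact statement): for any level $u\in(0,1)$ and any dependence among the $X_i$, subadditivity of expected shortfall together with $\mathrm{ES}_u(X)\ge\mathrm{VaR}_u(X)$ gives
$$\mathbb{P}\Bigl(\sum_{i=1}^n X_i>\sum_{i=1}^n \mathrm{ES}_u(X_i)\Bigr)\le\mathbb{P}\Bigl(\sum_{i=1}^n X_i>\mathrm{ES}_u\Bigl(\sum_{i=1}^n X_i\Bigr)\Bigr)\le\mathbb{P}\Bigl(\sum_{i=1}^n X_i>\mathrm{VaR}_u\Bigl(\sum_{i=1}^n X_i\Bigr)\Bigr)\le 1-u.$$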


[99] 2602.08998

Universal Coefficients and Mayer-Vietoris Sequence for Groupoid Homology

We study homology of ample groupoids via the compactly supported Moore complex of the nerve. Let $A$ be a topological abelian group. For $n\ge 0$ set $C_n(\mathcal G;A) := C_c(\mathcal G_n,A)$ and define $\partial_n^A=\sum_{i=0}^n(-1)^i(d_i)_*$. This defines $H_n(\mathcal G;A)$. The theory is functorial for continuous étale homomorphisms. It is compatible with standard reductions, including restriction to saturated clopen subsets. In the ample setting it is invariant under Kakutani equivalence. We reprove Matui type long exact sequences and identify the comparison maps at chain level. For discrete $A$ we prove a natural universal coefficient short exact sequence $$0\to H_n(\mathcal G)\otimes_{\mathbb Z}A\xrightarrow{\ \iota_n^{\mathcal G}\ }H_n(\mathcal G;A)\xrightarrow{\ \kappa_n^{\mathcal G}\ }\operatorname{Tor}_1^{\mathbb Z}\bigl(H_{n-1}(\mathcal G),A\bigr)\to 0.$$ The key input is the chain level isomorphism $C_c(\mathcal G_n,\mathbb Z)\otimes_{\mathbb Z}A\cong C_c(\mathcal G_n,A)$, which reduces the groupoid statement to the classical algebraic UCT for the free complex $C_c(\mathcal G_\bullet,\mathbb Z)$. We also isolate the obstruction for non-discrete coefficients. For a locally compact totally disconnected Hausdorff space $X$ with a basis of compact open sets, the image of $\Phi_X:C_c(X,\mathbb Z)\otimes_{\mathbb Z}A\to C_c(X,A)$ is exactly the compactly supported functions with finite image. Thus $\Phi_X$ is surjective if and only if every $f\in C_c(X,A)$ has finite image, and for suitable $X$ one can produce compactly supported continuous maps $X\to A$ with infinite image. Finally, for a clopen saturated cover $\mathcal G_0=U_1\cup U_2$ we construct a short exact sequence of Moore complexes and derive a Mayer-Vietoris long exact sequence for $H_\bullet(\mathcal G;A)$ for explicit computations.


[100] 2603.01198

Digital Twin-Based Cooling System Optimization for Data Center

Data center cooling systems consume significant auxiliary energy, yet optimization studies rarely quantify the gap between theoretically optimal and operationally deployable control strategies. This paper develops a digital twin of the liquid cooling infrastructure at the Frontier exascale supercomputer, in which a high-temperature water system comprises three parallel subloops, each serving dedicated coolant distribution unit clusters through plate heat exchangers and variable-speed pumps. The surrogate model is built in Modelica and validated against one full calendar year of 10-minute operational data following ASHRAE Guideline 14. The model achieves a subloop coefficient of variation of the root mean square error below 2.7% and a normalized mean bias error within 2.5%. Using this validated surrogate model, a layered optimization framework evaluates three progressively constrained strategies: an analytical flow-only optimization achieves a 20.4% total energy saving, unconstrained joint optimization of flow rate and supply temperature achieves a 30.1% saving, and ramp-constrained optimization of flow rate and supply temperature, which enforces actuator rate limits, achieves a 27.8% saving. The analysis reveals that the baseline system operates at 2.9 times the minimum thermally safe flow rate and that co-optimizing supply temperature with flow rate nearly doubles the savings achievable by flow reduction alone.
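
The flow-only step follows from a heat balance plus pump affinity laws; a back-of-the-envelope Python sketch (with assumed, illustrative numbers rather than the Frontier measurements) is:

```python
# Minimum thermally safe flow and pump-power saving via affinity laws (illustrative only).
heat_load_kw = 10_000.0                   # heat rejected by one subloop (assumed)
cp = 4.18                                 # kJ/(kg*K), water
dT_max = 10.0                             # allowed coolant temperature rise (assumed)

m_min = heat_load_kw / (cp * dT_max)      # minimum thermally safe mass flow, kg/s
m_baseline = 2.9 * m_min                  # baseline operates at 2.9x the minimum (from the study)

# pump affinity laws: pump power scales roughly with the cube of the flow rate
pump_saving = 1.0 - (m_min / m_baseline) ** 3
print(f"minimum safe flow: {m_min:.0f} kg/s, "
      f"pump-power reduction from flow reduction alone: {pump_saving:.1%}")
```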


[101] 2603.02029

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Moving beyond evaluations that collapse performance across heterogeneous prompts toward fine-grained evaluation at the prompt level, or within relatively homogeneous subsets, is necessary to diagnose generative models' strengths and weaknesses. Such fine-grained evaluations, however, suffer from a data bottleneck: human gold-standard labels are too costly at this scale, while automated ratings are often misaligned with human judgment. To resolve this challenge, we propose a novel statistical model based on tensor factorization that merges cheap autorater data with a limited set of human gold-standard labels. Specifically, our approach uses autorater scores to pretrain latent representations of prompts and generative models, and then aligns those pretrained representations to human preferences using a small calibration set. This sample-efficient methodology is robust to autorater quality, more accurately predicts human preferences on a per-prompt basis than standard baselines, and provides tight confidence intervals for key statistical parameters of interest. We also showcase the practical utility of our method by constructing granular leaderboards based on prompt qualities and by estimating model performance solely from autorater scores, eliminating the need for additional human annotations.
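
A stripped-down version of the two-stage idea (our sketch, using a plain matrix factorization in place of the paper's tensor factorization; all names and sizes are illustrative): latent prompt and model factors are pretrained on cheap autorater scores, then a small human-labeled calibration set fits a linear alignment on the factor interactions.

```python
# Pretrain latent factors on autorater scores, then calibrate to a few human labels (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_models, rank = 300, 12, 4

# Stage 1: low-rank factors from a dense, cheap autorater score matrix
true_P = rng.standard_normal((n_prompts, rank))
true_M = rng.standard_normal((n_models, rank))
autorater = true_P @ true_M.T + 0.3 * rng.standard_normal((n_prompts, n_models))
U, s, Vt = np.linalg.svd(autorater, full_matrices=False)
P_hat = U[:, :rank] * s[:rank]                         # prompt factors
M_hat = Vt[:rank].T                                    # model factors

# Stage 2: align to human judgment with a small calibration set of (prompt, model) pairs
human = (true_P @ true_M.T > 0).astype(float)          # stand-in for human preference labels
pairs = rng.choice(n_prompts * n_models, size=400, replace=False)
pi, mi = np.unravel_index(pairs, (n_prompts, n_models))
feats = P_hat[pi] * M_hat[mi]                          # elementwise factor interactions
w = np.linalg.lstsq(feats, human[pi, mi], rcond=None)[0]

pred = (P_hat * w) @ M_hat.T                           # predicted human preference scores
print("calibration-set correlation:",
      round(float(np.corrcoef(pred[pi, mi], human[pi, mi])[0, 1]), 3))
```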