New articles on Statistics


[1] 2510.05182

Procrustes Problems on Random Matrices

Meaningful comparison between sets of observations often necessitates alignment or registration between them, and the resulting optimization problems range in complexity from those admitting simple closed-form solutions to those requiring advanced and novel techniques. We compare different Procrustes problems in which we align two sets of points after various perturbations by minimizing the norm of the difference between one matrix and an orthogonal transformation of the other. The minimization problem depends significantly on the choice of matrix norm; we highlight recent developments in nonsmooth Riemannian optimization and characterize which choices of norm work best for each perturbation. We show that in several applications, from low-dimensional alignments to hypothesis testing for random networks, when Procrustes alignment with the spectral or robust norm is the appropriate choice, it is often feasible to replace the computationally more expensive spectral and robust minimizers with their closed-form Frobenius-norm counterpart. Our work reinforces the synergy between optimization, geometry, and statistics.
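
The closed-form Frobenius-norm minimizer referenced above is the classical orthogonal Procrustes solution, obtained from a single SVD. A minimal NumPy sketch of that closed form (illustrative only, not the authors' code; the data and noise level are arbitrary):

import numpy as np

def frobenius_procrustes(X, Y):
    # Closed-form Frobenius-norm Procrustes: the orthogonal W minimizing ||X - Y W||_F
    # is W = U V^T, where U S V^T is the SVD of Y^T X.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy check: Y is a rotated, lightly perturbed copy of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))     # random orthogonal matrix
Y = X @ Q.T + 0.01 * rng.standard_normal((100, 3))
W = frobenius_procrustes(X, Y)
print(np.linalg.norm(X - Y @ W))                     # small residual: W approximately recovers Q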


[2] 2510.05268

An efficient hybrid approach of quantile and expectile regression

Quantiles and expectiles are determined by different loss functions: asymmetric least absolute deviation for quantiles and asymmetric squared loss for expectiles. This distinction ensures that quantile regression methods are robust to outliers but somewhat less effective than expectile regression, especially for normally distributed data. However, expectile regression lacks robustness, especially for heavy-tailed distributions. To address this trade-off between robustness and effectiveness, we propose a novel approach. By introducing a parameter $\gamma$ that ranges between 0 and 1, we combine the aforementioned loss functions, resulting in a hybrid approach of quantiles and expectiles. This fusion leads to the estimation of a new type of location parameter family within the linear regression framework, termed Hybrid of Quantile and Expectile Regression (HQER). The asymptotic properties of the resulting estimator are then established. Through simulation studies, we compare the asymptotic relative efficiency of the HQER estimator with its competitors, namely the quantile, expectile, and $k$th power expectile regression estimators. Our results show that HQER outperforms its competitors in several simulation scenarios. In addition, we apply HQER to a real dataset to illustrate its practical utility.
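
The abstract does not spell out the hybrid loss, so the sketch below assumes a simple convex combination in which $\gamma = 1$ recovers the asymmetric absolute (quantile) loss and $\gamma = 0$ the asymmetric squared (expectile) loss; the paper's exact parameterization may differ.

import numpy as np

def hybrid_loss(u, tau, gamma):
    # Hypothetical hybrid of the quantile check loss and the expectile loss.
    w = np.where(u < 0, 1 - tau, tau)                 # asymmetry weight
    return gamma * w * np.abs(u) + (1 - gamma) * w * u**2

def fit_hqer_location(y, tau=0.5, gamma=0.5):
    # Grid search for the location parameter minimizing the hybrid loss (intercept-only model).
    grid = np.linspace(y.min(), y.max(), 2001)
    losses = [hybrid_loss(y - m, tau, gamma).sum() for m in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=500)                    # heavy-tailed sample
print(fit_hqer_location(y, tau=0.5, gamma=0.5))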


[3] 2510.05353

A new composite Mann-Whitney test for two-sample survival comparisons with right-censored data

A fundamental challenge in comparing two survival distributions with right-censored data is the selection of an appropriate nonparametric test, as the power of standard tests like the log-rank and Wilcoxon is highly dependent on the often unknown nature of the alternative hypothesis. This paper introduces a new, distribution-free two-sample test designed to overcome this limitation. The proposed method is based on a strategic decomposition of the data into uncensored and censored subsets, from which a composite test statistic is constructed as the sum of two independent Mann-Whitney statistics. This design allows the test to automatically and inherently adapt to various patterns of difference, including early, late, and crossing hazards, without requiring pre-specified parameters, pre-testing, or complex weighting schemes. An extensive Monte Carlo simulation study demonstrates that the proposed test robustly maintains the nominal Type I error rate. Crucially, its power is highly competitive with the optimal traditional tests in standard scenarios and superior in complex settings with crossing survival curves, while also exhibiting remarkable robustness to high levels of censoring. The test's power effectively approximates the maximum power achievable by either the log-rank or Wilcoxon tests across a wide range of alternatives, offering a powerful, versatile, and computationally simple tool for survival analysis.
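
The exact composite statistic is not given in the abstract; the sketch below illustrates the general idea under assumed conventions: split each sample into uncensored and censored observations, standardize a Mann-Whitney statistic within each subset, and combine the two assuming independence.

import numpy as np
from scipy import stats

def composite_mw_test(time1, event1, time2, event2):
    # Illustrative composite test: one Mann-Whitney comparison on the uncensored
    # observations and one on the censored observations, each standardized under H0
    # (no-ties normal approximation) and summed as independent z-scores.
    z = []
    for flag in (1, 0):                               # 1 = event observed, 0 = censored
        x, y = time1[event1 == flag], time2[event2 == flag]
        if len(x) == 0 or len(y) == 0:
            continue
        u = stats.mannwhitneyu(x, y, alternative="two-sided").statistic
        n1, n2 = len(x), len(y)
        mu, sd = n1 * n2 / 2, np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
        z.append((u - mu) / sd)
    z_comb = sum(z) / np.sqrt(len(z))
    return 2 * stats.norm.sf(abs(z_comb))             # two-sided p-value

rng = np.random.default_rng(2)
t1, t2 = rng.exponential(1.0, 80), rng.exponential(1.3, 80)
c1, c2 = rng.exponential(2.0, 80), rng.exponential(2.0, 80)
print(composite_mw_test(np.minimum(t1, c1), (t1 <= c1).astype(int),
                        np.minimum(t2, c2), (t2 <= c2).astype(int)))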


[4] 2510.05370

Sparse-Group Factor Analysis for High-Dimensional Time Series

Factor analysis is a widely used technique for dimension reduction in high-dimensional data. However, a key challenge in factor models lies in the interpretability of the latent factors. One intuitive way to interpret these factors is through their associated loadings. Liu and Wang proposed a novel framework that redefines factor models with sparse loadings to enhance interpretability. In many high-dimensional time series applications, variables exhibit natural group structures. Building on this idea, our paper incorporates domain knowledge and prior information by modeling both individual sparsity and group sparsity in the loading matrix. This dual-sparsity framework further improves the interpretability of the estimated factors. We develop an algorithm to estimate both the loading matrix and the common component, and we establish the asymptotic properties of the resulting estimators. Simulation studies demonstrate the strong performance of the proposed method, and a real-data application illustrates how incorporating prior knowledge leads to more interpretable results.


[5] 2510.05380

Minima and Critical Points of the Bethe Free Energy Are Invariant Under Deformation Retractions of Factor Graphs

In graphical models, factor graphs, and more generally energy-based models, the interactions between variables are encoded by a graph, a hypergraph, or, in the most general case, a partially ordered set (poset). Inference on such probabilistic models cannot be performed exactly due to cycles in the underlying structures of interaction. Instead, one resorts to approximate variational inference by optimizing the Bethe free energy. Critical points of the Bethe free energy correspond to fixed points of the associated Belief Propagation algorithm. A full characterization of these critical points for general graphs, hypergraphs, and posets with a finite number of variables is still an open problem. We show that, for hypergraphs and posets with chains of length at most 1, changing the poset of interactions of the probabilistic model to one with the same homotopy type induces a bijection between the critical points of the associated free energy. This result extends and unifies classical results that assume specific forms of collapsibility to prove uniqueness of the critical points of the Bethe free energy.


[6] 2510.05440

Refereed Learning

We initiate an investigation of learning tasks in a setting where the learner is given access to two competing provers, only one of which is honest. Specifically, we consider the power of such learners in assessing purported properties of opaque models. Following prior work that considers the power of competing provers in different settings, we call this setting refereed learning. After formulating a general definition of refereed learning tasks, we show refereed learning protocols that obtain a level of accuracy that far exceeds what is obtainable at comparable cost without provers, or even with a single prover. We concentrate on the task of choosing the better one out of two black-box models, with respect to some ground truth. While we consider a range of parameters, perhaps our most notable result is in the high-precision range: For all $\varepsilon>0$ and ambient dimension $d$, our learner makes only one query to the ground truth function, communicates only $(1+\frac{1}{\varepsilon^2})\cdot\text{poly}(d)$ bits with the provers, and outputs a model whose loss is within a multiplicative factor of $(1+\varepsilon)$ of the best model's loss. Obtaining comparable loss with a single prover would require the learner to access the ground truth at almost all of the points in the domain. To obtain this bound, we develop a technique that allows the learner to sample, using the provers, from a distribution that is not efficiently samplable to begin with. We find this technique to be of independent interest. We also present lower bounds that demonstrate the optimality of our protocols in a number of respects, including prover complexity, number of samples, and need for query access.


[7] 2510.05447

A Probabilistic Basis for Low-Rank Matrix Learning

Low rank inference on matrices is widely conducted by optimizing a cost function augmented with a penalty proportional to the nuclear norm $\Vert \cdot \Vert_*$. However, despite the assortment of computational methods for such problems, there is a surprising lack of understanding of the underlying probability distributions being referred to. In this article, we study the distribution with density $f(X)\propto e^{-\lambda\Vert X\Vert_*}$, finding many of its fundamental attributes to be analytically tractable via differential geometry. We use these facts to design an improved MCMC algorithm for low rank Bayesian inference as well as to learn the penalty parameter $\lambda$, obviating the need for hyperparameter tuning when this is difficult or impossible. Finally, we deploy these to improve the accuracy and efficiency of low rank Bayesian matrix denoising and completion algorithms in numerical experiments.
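
The target density is simple to evaluate up to its normalizing constant, since the nuclear norm is the sum of singular values. A small sketch of that evaluation (not the paper's MCMC algorithm):

import numpy as np

def log_density_unnormalized(X, lam):
    # log f(X) = -lam * ||X||_*  (nuclear norm = sum of singular values), up to a constant.
    return -lam * np.linalg.svd(X, compute_uv=False).sum()

X = np.random.default_rng(3).standard_normal((5, 4))
print(log_density_unnormalized(X, lam=2.0))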


[8] 2510.05545

Can language models boost the power of randomized experiments without statistical bias?

Randomized experiments or randomized controlled trials (RCTs) are the gold standard for causal inference, yet cost and sample-size constraints limit power. Meanwhile, modern RCTs routinely collect rich, unstructured data that are highly prognostic of outcomes but rarely used in causal analyses. We introduce CALM (Causal Analysis leveraging Language Models), a statistical framework that integrates large language model (LLM) predictions with established causal estimators to increase precision while preserving statistical validity. CALM treats LLM outputs as auxiliary prognostic information and corrects their potential bias via a heterogeneous calibration step that residualizes and optimally reweights predictions. We prove that CALM remains consistent even when LLM predictions are biased and achieves efficiency gains over augmented inverse probability weighting estimators for various causal effects. In addition, we develop a few-shot variant of CALM that aggregates predictions across randomly sampled demonstration sets. The resulting U-statistic-like predictor restores i.i.d. structure and also mitigates prompt-selection variability. Empirically, in simulations calibrated to a mobile-app depression RCT, CALM delivers lower variance than competing benchmark methods, is effective in zero- and few-shot settings, and remains stable across prompt designs. By principled use of LLMs to harness unstructured data and external knowledge learned during pretraining, CALM provides a practical path to more precise causal analyses in RCTs.


[9] 2510.05566

Domain-Shift-Aware Conformal Prediction for Large Language Models

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
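
The abstract does not specify the reweighting scheme; the sketch below shows the generic weighted split-conformal step that such a framework builds on, with a purely hypothetical proximity kernel on prompt embeddings supplying the weights.

import numpy as np

def weighted_conformal_quantile(scores, weights, alpha):
    # Weighted (1 - alpha) quantile of calibration nonconformity scores, with the
    # remaining mass placed on +infinity for the test point, as in weighted split conformal.
    order = np.argsort(scores)
    s = np.append(scores[order], np.inf)
    w = np.append(weights[order], 1.0)
    cum = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cum, 1 - alpha)]

rng = np.random.default_rng(4)
cal_scores = rng.exponential(1.0, 500)                        # nonconformity scores on calibration prompts
cal_embed = rng.standard_normal((500, 16))                    # hypothetical prompt embeddings
test_embed = rng.standard_normal(16)
w = np.exp(-np.linalg.norm(cal_embed - test_embed, axis=1) / 4.0)   # proximity-based weights (assumed form)
print(weighted_conformal_quantile(cal_scores, w, alpha=0.1))  # threshold defining the prediction set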


[10] 2510.05568

Bilevel optimization for learning hyperparameters: Application to solving PDEs and inverse problems with Gaussian processes

Methods for solving scientific computing and inference problems, such as kernel- and neural network-based approaches for partial differential equations (PDEs), inverse problems, and supervised learning tasks, depend crucially on the choice of hyperparameters. Specifically, the efficacy of such methods, and in particular their accuracy, stability, and generalization properties, strongly depends on the choice of hyperparameters. While bilevel optimization offers a principled framework for hyperparameter tuning, its nested optimization structure can be computationally demanding, especially in PDE-constrained contexts. In this paper, we propose an efficient strategy for hyperparameter optimization within the bilevel framework by employing a Gauss-Newton linearization of the inner optimization step. Our approach provides closed-form updates, eliminating the need for repeated costly PDE solves. As a result, each iteration of the outer loop reduces to a single linearized PDE solve, followed by explicit gradient-based hyperparameter updates. We demonstrate the effectiveness of the proposed method through Gaussian process models applied to nonlinear PDEs and to PDE inverse problems. Extensive numerical experiments highlight substantial improvements in accuracy and robustness compared to conventional random hyperparameter initialization. In particular, experiments with additive kernels and neural network-parameterized deep kernels demonstrate the method's scalability and effectiveness for high-dimensional hyperparameter optimization.


[11] 2510.05573

On the Theory of Continual Learning with Gradient Descent for Neural Networks

Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting the earlier ones, is a central goal of artificial intelligence. To shed light on its underlying mechanisms, we analyze the limitations of continual learning in a tractable yet representative setting. In particular, we study one-hidden-layer quadratic neural networks trained by gradient descent on an XOR cluster dataset with Gaussian noise, where different tasks correspond to different clusters with orthogonal means. We obtain bounds on the rate of forgetting at training and test time in terms of the number of iterations, the sample size, the number of tasks, and the hidden-layer size. Our results reveal interesting phenomena regarding the role of different problem parameters in the rate of forgetting. Numerical experiments across diverse setups confirm our results, demonstrating their validity beyond the analyzed settings.


[12] 2510.05645

Weak convergence of Bayes estimators under general loss functions

We investigate the asymptotic behavior of parametric Bayes estimators under a broad class of loss functions that extend beyond the classical translation-invariant setting. To this end, we develop a unified theoretical framework for loss functions exhibiting locally polynomial structure. This general theory encompasses important examples such as the squared Wasserstein distance, the Sinkhorn divergence and Stein discrepancies, which have gained prominence in modern statistical inference and machine learning. Building on the classical Bernstein--von Mises theorem, we establish sufficient conditions under which Bayes estimators inherit the posterior's asymptotic normality. As a by-product, we also derive conditions for the differentiability of Wasserstein-induced loss functions and provide new consistency results for Bayes estimators. Several examples and numerical experiments demonstrate the relevance and accuracy of the proposed methodology.


[13] 2510.05646

Geographically Weighted Regression for Air Quality Low-Cost Sensor Calibration

This article focuses on the use of the Geographically Weighted Regression (GWR) method to correct measurements from low-cost air quality sensors. Such sensors are of major interest in the current era of high-resolution air quality monitoring at the urban scale, but they require calibration using reference analyzers. The results for NO2 are provided along with comments on the estimated GWR model and the spatial content of the estimated coefficients. The study has been carried out using the publicly available SensEURCity dataset in Antwerp, which is especially relevant since it includes 9 reference stations and 34 micro-sensors collocated and deployed within the city.
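
GWR fits a separate weighted least-squares regression at each location, down-weighting observations by distance. A minimal sketch with a Gaussian kernel (a common GWR choice; the paper's kernel, bandwidth selection, and covariates are not specified in the abstract, and the data below are simulated):

import numpy as np

def gwr_fit_at(X, y, coords, target, bandwidth):
    # One GWR fit at a target location: beta(u) = (X'W(u)X)^{-1} X'W(u)y with Gaussian weights.
    d = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))                    # sensor locations
raw = rng.normal(30, 8, 200)                                  # raw low-cost sensor readings
X = np.column_stack([np.ones(200), raw])                      # intercept + raw signal
y = 3 + 0.8 * raw + 0.5 * coords[:, 0] + rng.normal(0, 1, 200)    # synthetic reference values
print(gwr_fit_at(X, y, coords, target=np.array([5.0, 5.0]), bandwidth=2.0))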


[14] 2510.05680

A Bivariate DAR($1$) model for ordinal time series

We present a bivariate vector-valued discrete autoregressive model of order $1$ (BDAR($1$)) for discrete time series. The BDAR($1$) model assumes that each time series follows its own univariate DAR($1$) model, with dependent random mechanisms that determine from which component the current status arises and with dependent innovations. We propose to define the joint distribution of the random mechanisms, which are expressed as Bernoulli vectors, through copulas; the same holds for the joint distribution of the innovation terms. Properties of the model are provided, with special focus on the case of bivariate ordinal time series. A simulation study indicates that the model provides efficient estimates even for moderate sample sizes. Finally, a real-data application to the unemployment status of two countries illustrates the proposed model.
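
A minimal simulation sketch of the construction described above, under assumptions made for brevity: the two Bernoulli "keep or innovate" mechanisms are linked through a Gaussian copula, while the innovations are drawn independently from their ordinal marginals (the paper also links the innovations through a copula).

import numpy as np
from scipy import stats

def simulate_bdar1(T, alphas, marginals, rho_v, seed=0):
    # Each series follows X_t = V_t * X_{t-1} + (1 - V_t) * Z_t with V_t ~ Bernoulli(alpha_j);
    # the pair (V_{t,1}, V_{t,2}) is coupled via a Gaussian copula with correlation rho_v.
    rng = np.random.default_rng(seed)
    K = len(marginals[0])                                     # number of ordinal categories
    X = np.zeros((T, 2), dtype=int)
    X[0] = [rng.choice(K, p=m) for m in marginals]
    cov = [[1.0, rho_v], [rho_v, 1.0]]
    for t in range(1, T):
        u = stats.norm.cdf(rng.multivariate_normal([0, 0], cov))   # copula sample in [0,1]^2
        for j in range(2):
            if u[j] < alphas[j]:
                X[t, j] = X[t - 1, j]                         # keep the previous state
            else:
                X[t, j] = rng.choice(K, p=marginals[j])       # fresh innovation
    return X

X = simulate_bdar1(300, alphas=(0.7, 0.6),
                   marginals=([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]), rho_v=0.5)
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])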


[15] 2510.05685

Sample complexity for entropic optimal transport with radial cost

We prove a new sample complexity result for entropy regularized optimal transport. Our bound holds for probability measures on $\mathbb R^d$ with exponential tail decay and for radial cost functions that satisfy a local Lipschitz condition. It is sharp up to logarithmic factors, and captures the intrinsic dimension of the marginal distributions through a generalized covering number of their supports. Examples that fit into our framework include subexponential and subgaussian distributions and radial cost functions $c(x,y)=|x-y|^p$ for $p\ge 2.$
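
The estimator whose convergence such results describe is the plug-in entropic OT cost between empirical measures, typically computed with Sinkhorn iterations. A small sketch for the cost $c(x,y)=|x-y|^p$ (the regularization level, iteration count, and sample sizes are illustrative choices, not from the paper):

import numpy as np

def entropic_ot(x, y, p=2.0, eps=0.5, iters=500):
    # Sinkhorn iterations for entropy-regularized OT between two empirical measures.
    n, m = len(x), len(y)
    C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** p   # radial cost matrix
    K = np.exp(-C / eps)
    a, b = np.ones(n) / n, np.ones(m) / m
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                                   # entropic coupling
    return (P * C).sum() + eps * (P * np.log(P / np.outer(a, b) + 1e-300)).sum()

rng = np.random.default_rng(6)
x, y = rng.standard_normal((300, 2)), rng.standard_normal((300, 2)) + 1.0
print(entropic_ot(x, y))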


[16] 2510.05800

Bridging the Gap Between Methodological Research and Statistical Practice: Toward "Translational Simulation Research"

Simulations are valuable tools for empirically evaluating the properties of statistical methods and are primarily employed in methodological research to draw general conclusions about methods. In addition, they can often be useful to applied statisticians, who may rely on published simulation results to select an appropriate statistical method for their application. However, on the one hand, applying published simulation results directly to practical settings is frequently challenging, as the scenarios considered in methodological studies rarely align closely enough with the characteristics of specific real-world applications to be truly informative. Applied statisticians, on the other hand, may struggle to construct their own simulations or to adapt methodological research to better reflect their specific data due to time constraints and limited programming expertise. We propose bridging this gap between methodological research and statistical practice through a translational approach by developing dedicated software along with simulation studies, which should abstract away the coding-intensive aspects of running simulations while still offering sufficient flexibility in parameter selection to meet the needs of applied statisticians. We demonstrate this approach using two practical examples, illustrating that the concept of translational simulation can be implemented in practice in different ways. In the first example - simulation-based evaluation of power in two-arm randomized clinical trials with an ordinal endpoint - the solution we discuss is a Shiny web application providing a graphical user interface for running informative simulations in this context. For the second example - assessing the impact of measurement error in multivariable regression - a less labor-intensive approach is suggested, involving the provision of user-friendly, well-structured, and modular analysis code.


[17] 2510.05857

Missing Data Imputation in the Context of Propensity Score Analysis: A Systematic Review

Missing data is a common challenge in observational studies. Another challenge stems from the observational nature of the study itself. Here, propensity score analysis can be used as a technique to replicate conditions similar to those found in clinical trials. With regard to missing data, a majority of studies only analyze the complete cases, but this has several pitfalls. In this review, we investigate which methods are used for the handling of missing data in the context of propensity score analyses. Therefore, we searched PubMed for the keywords "propensity score" and "missing data", restricting our search to the period from January 2010 to February 2024. The PRISMA statement was followed in this review. A total of 147 articles were included in the analyses. A major finding of this study is that although the usage of multiple imputation (MI) has risen over time, only a limited number of studies describe the mechanism of missing data and the details of the MI algorithm.

Keywords: Missing data, Propensity Score, Observational Data, Multiple Imputation, Systematic Review


[18] 2510.05861

Extension of Wald-Wolfowitz Runs Test for Regression Validity Testing with Repeated Measures of Independent Variable

The Wald-Wolfowitz runs test can assess the correctness of a regression curve fitted to a data set with one independent parameter. The assessment is performed through examination of the residuals, where the signs of the residuals would appear randomly if the regression curve were correct. We propose extending the test to the case where multiple data points are measured at specific independent parameter values. By randomly permuting the data points corresponding to each independent parameter value, treating their residuals as occurring in the permuted sequence, and then executing the runs test, the results are shown to be equivalent to those for a data set containing the same number of points with no repeated measurements. This approach avoids the loss of points, and hence the loss of test sensitivity, that would occur if the means at each independent parameter value were used instead. It also avoids the problem of weighting each mean differently when the number of data points measured at each parameter value is not identical.
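
A minimal sketch of the procedure as described: residuals that share an independent-parameter value are placed in random order before the ordinary runs test on residual signs is applied (the normal approximation below uses the standard runs-test moments; the example data and the deliberately misspecified linear fit are illustrative).

import numpy as np
from scipy import stats

def runs_test_pvalue(signs):
    # Wald-Wolfowitz runs test on a +/- sequence, normal approximation.
    n1, n2 = np.sum(signs > 0), np.sum(signs < 0)
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return 2 * stats.norm.sf(abs(runs - mu) / np.sqrt(var))

def extended_runs_test(x, y, fitted, seed=0):
    # Randomly permute the data points within each repeated x value, then run the usual test.
    rng = np.random.default_rng(seed)
    resid = y - fitted
    order = []
    for xv in np.unique(x):
        order.extend(rng.permutation(np.flatnonzero(x == xv)))
    signs = np.sign(resid[np.array(order)])
    return runs_test_pvalue(signs[signs != 0])

rng = np.random.default_rng(7)
x = np.repeat(np.linspace(0, 1, 20), 5)                 # 5 repeated measurements per x value
y = 2 * x**2 + rng.normal(0, 0.05, x.size)
fitted = np.polyval(np.polyfit(x, y, 1), x)             # deliberately misspecified linear fit
print(extended_runs_test(x, y, fitted))                 # small p-value flags the wrong curve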


[19] 2510.05902

A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial consideration is how to select an informative subset based on the queries posed by the data analyst. A recently proposed method for this purpose involves randomly selecting samples from the large dataset based on subsampling probabilities. However, a major drawback of this approach is that the derived subsampling probabilities are typically based on an assumed statistical model which may be difficult to correctly specify in practice. To address this limitation, we propose to determine subsampling probabilities based on a statistical model that we acknowledge may be misspecified. To do so, we propose to evaluate the subsampling probabilities based on the Mean Squared Error (MSE) of the predictions from a model that is not assumed to completely describe the large dataset. We apply our subsampling approach in a simulation study and for the analysis of two real-world large datasets, where its performance is benchmarked against existing subsampling techniques. The findings suggest that there is value in adopting our approach over current practice.


[20] 2510.05960

Copula-Based Clustering of Financial Time Series via Evidence Accumulation

Understanding the dependence structure of asset returns is fundamental in risk assessment and is particularly relevant in a portfolio diversification strategy. We propose a clustering approach in which evidence is accumulated over a multiplicity of classifications obtained using classical hierarchical procedures and multiple copula-based dissimilarity measures. Assets that are grouped in the same cluster exhibit similar stochastic behavior during risky scenarios, and risk-averse investors could exploit this information to build a risk-diversified portfolio. An empirical demonstration of such a strategy is presented using data from the EURO STOXX 50 index.


[21] 2510.06034

Measures of Dependence based on Wasserstein distances

Measuring dependence between random variables is a fundamental problem in Statistics, with applications across diverse fields. While classical measures such as Pearson's correlation have been widely used for over a century, they have notable limitations, particularly in capturing nonlinear relationships and extending to general metric spaces. In recent years, the theory of Optimal Transport and Wasserstein distances has provided new tools to define measures of dependence that generalize beyond Euclidean settings. This survey explores recent proposals, outlining two main approaches: one based on the distance between the joint distribution and the product of marginals, and another leveraging conditional distributions. We discuss key properties, including characterization of independence, normalization, invariances, robustness, sample, and computational complexity. Additionally, we propose an alternative perspective that measures deviation from maximal dependence rather than independence, leading to new insights and potential extensions. Our work highlights recent advances in the field and suggests directions for further research in the measurement of dependence using Optimal Transport.


[22] 2510.06051

Automated Gating for Flow Cytometry Data Using a Kernel-Smoothed EM Algorithm

Phytoplankton are microscopic algae responsible for roughly half of the world's photosynthesis that play a critical role in global carbon cycles and oxygen production, and measuring the abundance of their subtypes across a wide range of spatiotemporal scales is of great relevance to oceanography. High-frequency flow cytometry is a powerful technique in which oceanographers at sea can rapidly record the optical properties of tens of thousands of individual phytoplankton cells every few minutes. Identifying distinct subpopulations within these vast datasets (a process known as "gating") remains a major challenge and has largely been performed manually so far. In this paper, we introduce a fast, automated gating method, which accurately identifies phytoplankton populations by fitting a time-evolving mixture of Gaussians model using an expectation-maximization-like algorithm with kernel smoothing. We use simulated data to demonstrate the validity and robustness of this approach, and use oceanographic cruise data to highlight the method's ability to not only replicate but surpass expert manual gating. Finally, we provide the flowkernel R package, written in literate programming, that implements the algorithm efficiently.


[23] 2510.06055

Construction of optimal tests for symmetry on the torus and their quantitative error bounds

In this paper, we develop optimal tests for symmetry on the hyper-dimensional torus, leveraging Le Cam's methodology. We address both scenarios where the center of symmetry is known and where it is unknown. These tests are not only valid under a given parametric hypothesis but also under a very broad class of symmetric distributions. The asymptotic behavior of the proposed tests is studied both under the null hypothesis and local alternatives, and we derive quantitative bounds on the distributional distance between the exact (unknown) distribution of the test statistic and its asymptotic counterpart using Stein's method. The finite-sample performance of the tests is evaluated through simulation studies, and their practical utility is demonstrated via an application to protein folding data. Additionally, we establish a broadly applicable result on the quadratic mean differentiability of functions, a key property underpinning the use of Le Cam's approach.


[24] 2510.06121

Measuring Data Quality for Project Lighthouse

In this paper, we first situate the challenges for measuring data quality under Project Lighthouse in the broader academic context. We then discuss in detail the three core data quality metrics we use for measurement--two of which extend prior academic work. Using those data quality metrics as examples, we propose a framework, based on machine learning classification, for empirically justifying the choice of data quality metrics and their associated minimum thresholds. Finally we outline how these methods enable us to rigorously meet the principle of data minimization when analyzing potential experience gaps under Project Lighthouse, which we term quantitative data minimization.


[25] 2510.06132

Optimal sub-Gaussian variance proxy for 3-mass distributions

We investigate the problem of characterizing the optimal variance proxy for sub-Gaussian random variables, whose moment-generating function exhibits bounded growth at infinity. We apply a general characterization method to discrete random variables with equally spaced atoms. We thoroughly study 3-mass distributions, thereby generalizing the well-studied Bernoulli case. We also prove that the discrete uniform distribution over $N$ points is strictly sub-Gaussian. Finally, we provide an open-source Python package that combines analytical and numerical approaches to compute optimal sub-Gaussian variance proxies across a wide range of distributions.
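
By definition, the optimal variance proxy is $\sigma^2_{\mathrm{opt}} = \sup_{s \neq 0} 2 \log \mathbb{E}\,e^{s(X - \mathbb{E}X)} / s^2$, which for a discrete distribution can be approximated on a grid. A brute-force sketch of that definition (the paper's package combines this with analytical characterizations; the grid limits below are arbitrary):

import numpy as np

def optimal_variance_proxy(atoms, probs, s_max=60.0, num=24001):
    # Smallest sigma^2 with E exp(s(X - EX)) <= exp(s^2 sigma^2 / 2) for all s,
    # approximated as the maximum of 2 * log MGF_centered(s) / s^2 over a grid of s.
    atoms, probs = np.asarray(atoms, float), np.asarray(probs, float)
    centered = atoms - probs @ atoms
    s = np.linspace(-s_max, s_max, num)
    s = s[s != 0]
    log_mgf = np.log(np.exp(np.outer(s, centered)) @ probs)
    return np.max(2 * log_mgf / s**2)

print(optimal_variance_proxy([-1.0, 0.0, 1.0], [0.2, 0.5, 0.3]))   # a 3-mass distribution
print(optimal_variance_proxy([0.0, 1.0], [0.7, 0.3]))              # the Bernoulli special case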


[26] 2510.06136

Geometric Model Selection for Latent Space Network Models: Hypothesis Testing via Multidimensional Scaling and Resampling Techniques

Latent space models assume that network ties are more likely between nodes that are closer together in an underlying latent space. Euclidean space is a popular choice for the underlying geometry, but hyperbolic geometry can mimic more realistic patterns of ties in complex networks. To identify the underlying geometry, past research has applied non-Euclidean extensions of multidimensional scaling (MDS) to the observed geodesic distances: the shortest path lengths between nodes. The difference in stress, a standard goodness-of-fit metric for MDS, across the geometries is then used to select a latent geometry with superior model fit (lower stress). The effectiveness of this method is assessed through simulations of latent space networks in Euclidean and hyperbolic geometries. To better account for uncertainty, we extend permutation-based hypothesis tests for MDS to the latent network setting. However, these tests do not incorporate any network structure. We propose a parametric bootstrap distribution of networks, conditioned on observed geodesic distances and the Gaussian Latent Position Model (GLPM). Our method extends the Davidson-MacKinnon J-test to latent space network models with differing latent geometries. We pay particular attention to large and sparse networks, and both the permutation test and the bootstrapping methods show an improvement in detecting the underlying geometry.


[27] 2510.06149

Implicit Updates for Average-Reward Temporal Difference Learning

Temporal difference (TD) learning is a cornerstone of reinforcement learning. In the average-reward setting, standard TD($\lambda$) is highly sensitive to the choice of step-size and thus requires careful tuning to maintain numerical stability. We introduce average-reward implicit TD($\lambda$), which employs an implicit fixed-point update to provide data-adaptive stabilization while preserving the per-iteration computational complexity of standard average-reward TD($\lambda$). In contrast to prior finite-time analyses of average-reward TD($\lambda$), which impose restrictive step-size conditions, we establish finite-time error bounds for the implicit variant under substantially weaker step-size requirements. Empirically, average-reward implicit TD($\lambda$) operates reliably over a much broader range of step-sizes and exhibits markedly improved numerical stability. This enables more efficient policy evaluation and policy learning, highlighting its effectiveness as a robust alternative to average-reward TD($\lambda$).
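
The abstract does not give the recursion, but in the implicit TD literature the implicit fixed-point step for linear features has a closed form with a data-adaptive shrinkage factor. The sketch below is an assumed average-reward variant in that spirit (the step sizes, the reward-rate update, and the toy environment are all illustrative, not the paper's specification):

import numpy as np

def implicit_td_lambda_step(theta, z, rho_bar, phi_s, phi_next, reward,
                            alpha=0.5, beta=0.01, lam=0.8):
    # Treating the current-state value term implicitly yields the closed-form step
    # theta += alpha * delta * z / (1 + alpha * z'phi_s), which stays stable for large alpha.
    z = lam * z + phi_s                                               # eligibility trace
    delta = reward - rho_bar + theta @ phi_next - theta @ phi_s       # average-reward TD error
    theta = theta + alpha * delta * z / (1.0 + alpha * z @ phi_s)
    rho_bar = rho_bar + beta * delta                                  # reward-rate estimate
    return theta, z, rho_bar

# Tiny cyclic random walk with one-hot features; reward 1 on reaching state 0.
rng = np.random.default_rng(8)
d, theta, z, rho_bar, s = 5, np.zeros(5), np.zeros(5), 0.0, 2
for _ in range(5000):
    s_next = (s + rng.choice([-1, 1])) % d
    reward = 1.0 if s_next == 0 else 0.0
    theta, z, rho_bar = implicit_td_lambda_step(theta, z, rho_bar,
                                                np.eye(d)[s], np.eye(d)[s_next], reward)
    s = s_next
print(rho_bar, theta)                                                 # rho_bar approaches 0.2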


[28] 2510.06157

A GNAR-Based Framework for Spectral Estimation of Network Time Series: Application to Global Bank Network Connectedness

Patterns of dependence in financial networks, such as global bank connectedness, evolve over time and across frequencies. Analysing these systems requires statistical tools that jointly capture temporal dynamics and the underlying network topology. This work develops a novel spectral analysis framework for Generalized Network Autoregressive (GNAR) processes, modeling dependencies beyond direct neighbours by incorporating r-stage neighbourhood effects, unlike existing methods that at best rely solely on adjacency-based interactions. We define the GNAR spectral density and related quantities, such as coherence and partial coherence, for which we propose both parametric and network-penalized nonparametric estimators. Extensive simulations demonstrate the strong performance of the parametric spectral estimator, as also backed up by theoretical arguments. The proposed framework has wide applications, and here we focus on the analysis of global bank network connectedness. The findings illustrate how the GNAR spectral quantities effectively capture the frequency-specific cross-nodal dependencies, thus yielding estimates consistent with established measures, while also uncovering richer temporal and structural patterns of volatility transmission.


[29] 2510.06177

Power-divergence copulas: A new class of Archimedean copulas, with an insurance application

This paper demonstrates that, under a particular convention, the convex functions that characterise the phi-divergences also generate Archimedean copulas in at least two dimensions. As a special case, we develop the family of Archimedean copulas associated with the important family of power divergences, which we call the power-divergence copulas. The properties of the family are extensively studied, including the subfamilies that are absolutely continuous or have a singular component, the ordering of the family, limiting cases (i.e., the Frechet-Hoeffding lower and upper bounds), Kendall's tau and the tail-dependence coefficients, and cases that extend to three or more dimensions. In an illustrative application, the power-divergence copulas are used to model a Danish fire insurance dataset. It is shown that the power-divergence copulas provide an adequate fit to the bivariate distribution of two kinds of fire-related losses claimed by businesses, while several benchmarks (a suite of well-known Archimedean, extreme-value, and elliptical copulas) do not.


[30] 2510.06191

Rapid calibration of atrial electrophysiology models using Gaussian process emulators in the ensemble Kalman filter

Atrial fibrillation (AF) is a common cardiac arrhythmia characterised by disordered electrical activity in the atria. The standard treatment is catheter ablation, which is invasive and irreversible. Recent advances in computational electrophysiology offer the potential for patient-specific models, often referred to as digital twins, that can be used to guide clinical decisions. To be of practical value, we must be able to rapidly calibrate physics-based models using routine clinical measurements. We pose this calibration task as a static inverse problem, where the goal is to infer tissue-level electrophysiological parameters from the available observations. To make this tractable, we replace the expensive forward model with Gaussian process emulators (GPEs), and propose a novel adaptation of the ensemble Kalman filter (EnKF) for static non-linear inverse problems. The approach yields parameter samples that can be interpreted as coming from the best Gaussian approximation of the posterior distribution. We compare our results with those obtained using Markov chain Monte Carlo (MCMC) sampling and demonstrate the potential of the approach to enable near-real-time patient-specific calibration, a key step towards predicting outcomes of AF treatment within clinical timescales. The approach is readily applicable to a wide range of static inverse problems in science and engineering.


[31] 2510.06210

Geographical inequalities in mortality by age and gender in Italy, 2002-2019: insights from a spatial extension of the Lee-Carter model

Italy reports some of the lowest levels of mortality in the developed world. Recent evidence, however, suggests that even in low-mortality countries improvements may be slowing and regional inequalities widening. This study contributes new empirical evidence to the debate by analysing mortality data by single year of age for males and females across 107 provinces in Italy from 2002 to 2019. We extend the widely used Lee-Carter model to include spatially varying age-specific effects, and further specify it to capture space-age-time interactions. The model is estimated in a Bayesian framework using the inlabru package, which builds on INLA (Integrated Nested Laplace Approximation) for non-linear models and facilitates the use of smoothing priors. This approach borrows strength across provinces and years, mitigating random fluctuations in small-area death counts. Results demonstrate the value of such a granular approach, highlighting the existence of an uneven geography of mortality despite overall national improvements. Mortality disadvantage is concentrated in parts of the Centre-South and North-West, while the Centre-North and North-East fare relatively better. These geographical differences have widened since 2010, with clear age- and gender-specific patterns, being more pronounced at younger adult ages for men and at older adult ages for women. Future work may involve refining the analysis to mortality by cause of death or socioeconomic status, informing more targeted public health policies to address mortality disparities across Italy's provinces.


[32] 2510.06211

Tensor time series change-point detection in cryptocurrency network data

Financial fraud has been growing exponentially in recent years. The rise of cryptocurrencies as an investment asset has simultaneously seen a parallel growth in cryptocurrency scams. To detect possible cryptocurrency fraud, and in particular market manipulation, previous research focused on the detection of changes in the network of trades; however, market manipulators are now trading across multiple cryptocurrency platforms, making their detection more difficult. Hence, it is important to consider the identification of changes across several trading networks, or a `network of networks', over time. To this end, in this article, we propose a new change-point detection method in the network structure of tensor-variate data. This new method, labeled TenSeg, first employs a tensor decomposition, and second detects multiple change-points in the second-order (cross-covariance or network) structure of the decomposed data. It allows for change-point detection in the presence of frequent changes of possibly small magnitudes and is computationally fast. We apply our method to several simulated datasets and to a cryptocurrency dataset, which consists of network tensor-variate data from the Ethereum blockchain. We demonstrate that our approach substantially outperforms other state-of-the-art change-point techniques, and that the detected change-points in the Ethereum dataset coincide with changes across several trading networks, or a `network of networks', over time. Finally, all the relevant \textsf{R} code implementing the method in the article is available at this https URL.


[33] 2510.05147

Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing

Ensuring reliability in modern software systems requires rigorous pre-production testing across highly heterogeneous and evolving environments. Because exhaustive evaluation is infeasible, practitioners must decide how to allocate limited testing resources across configurations where failure probabilities may drift over time. Existing combinatorial optimization approaches are static, ad hoc, and poorly suited to such non-stationary settings. We introduce a novel reinforcement learning (RL) framework that recasts configuration allocation as a sequential decision-making problem. Our method is the first to integrate Q-learning with a hybrid reward design that fuses simulated outcomes and real-time feedback, enabling both sample efficiency and robustness. In addition, we develop an adaptive online-offline training scheme that allows the agent to quickly track abrupt probability shifts while maintaining long-run stability. Extensive simulation studies demonstrate that our approach consistently outperforms static and optimization-based baselines, approaching oracle performance. This work establishes RL as a powerful new paradigm for adaptive configuration allocation, advancing beyond traditional methods and offering broad applicability to dynamic testing and resource scheduling domains.


[34] 2510.05183

Aneurysm Growth Time Series Reconstruction Using Physics-informed Autoencoder

Arterial aneurysm is a bulb-shaped local expansion of human arteries, the rupture of which is a leading cause of morbidity and mortality in the US. Therefore, the prediction of arterial aneurysm rupture is of great significance for aneurysm management and treatment selection. The prediction of aneurysm rupture depends on the analysis of the time series of aneurysm growth history. However, due to the long time scale of aneurysm growth, the time series of aneurysm growth is not always accessible. We here propose a method to reconstruct the aneurysm growth time series directly from patient parameters. The prediction is based on data pairs of [patient parameters, patient aneurysm growth time history]. To obtain the mapping from patient parameters to patient aneurysm growth time history, we first apply an autoencoder to obtain a compact representation of the time series for each patient. Then a mapping is learned from patient parameters to the corresponding compact representation of the time series via a five-layer neural network. A moving average and a convolutional output layer are implemented to explicitly take into account the time dependency of the time series. Apart from that, we also propose to use prior knowledge about the mechanism of aneurysm growth to improve the time series reconstruction results. The prior physics-based knowledge is incorporated as constraints for the optimization problem associated with the autoencoder. The model can handle both algebraic and differential constraints. Our results show that including physical model information about the data will not significantly improve the time series reconstruction results if the training data is error-free. However, in the case of training data with noise and bias error, incorporating physical model constraints can significantly improve the predicted time series.


[35] 2510.05197

Efficient Prediction of Pass@k Scaling in Large Language Models

Assessing the capabilities and risks of frontier AI systems is a critical area of research, and recent work has shown that repeated sampling from models can dramatically increase both. For instance, repeated sampling has been shown to increase their capabilities, such as solving difficult math and coding problems, but it has also been shown to increase their potential for harm, such as being jailbroken. Such results raise a crucial question for both capability and safety forecasting: how can one accurately predict a model's behavior when scaled to a massive number of attempts, given a vastly smaller sampling budget? This question is directly relevant to model providers, who serve hundreds of millions of users daily, and to governmental regulators, who seek to prevent harms. To answer this question, we make three contributions. First, we find that standard methods for fitting these laws suffer from statistical shortcomings that hinder predictive accuracy, especially in data-limited scenarios. Second, we remedy these shortcomings by introducing a robust estimation framework, which uses a beta-binomial distribution to generate more accurate predictions from limited data. Third, we propose a dynamic sampling strategy that allocates a greater budget to harder problems. Combined, these innovations enable more reliable prediction of rare risks and capabilities at a fraction of the computational cost.
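
One way to make the beta-binomial connection concrete: if each problem has success probability $p \sim \mathrm{Beta}(a, b)$, the predicted pass@k is $1 - B(a, b+k)/B(a, b)$. The sketch below fits $(a, b)$ by maximum likelihood from a small per-problem budget and extrapolates to large $k$; it illustrates the modeling idea only, while the paper's estimator and dynamic sampling strategy are richer.

import numpy as np
from scipy import stats, optimize, special

def fit_beta_binomial(successes, n):
    # MLE of a Beta(a, b) distribution over per-problem success rates,
    # given `successes` out of n attempts for each problem.
    def nll(log_ab):
        a, b = np.exp(log_ab)
        return -stats.betabinom.logpmf(successes, n, a, b).sum()
    return np.exp(optimize.minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead").x)

def predicted_pass_at_k(a, b, k):
    # E_p[1 - (1 - p)^k] with p ~ Beta(a, b) equals 1 - B(a, b + k) / B(a, b).
    return 1.0 - np.exp(special.betaln(a, b + k) - special.betaln(a, b))

rng = np.random.default_rng(9)
true_p = rng.beta(0.3, 3.0, size=200)                  # heterogeneous problem difficulties
successes = rng.binomial(10, true_p)                   # small budget: 10 attempts per problem
a, b = fit_beta_binomial(successes, n=10)
print(predicted_pass_at_k(a, b, k=1000))               # extrapolated pass@1000
print(np.mean(1 - (1 - true_p) ** 1000))               # ground truth in this simulation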


[36] 2510.05231

Hadamard ranks of algebraic varieties

Motivated by the study of decompositions of tensors as Hadamard products (i.e., coefficient-wise products) of low-rank tensors, we introduce the notion of Hadamard rank of a given point with respect to a projective variety: if it exists, it is the smallest number of points in the variety such that the given point is equal to their Hadamard product. We prove that if the variety $X$ is not contained in a coordinate hyperplane or a binomial hypersurface, then the generic point has a finite $X$-Hadamard-rank. Although the Hadamard rank might not be well defined for special points, we prove that the general Hadamard rank with respect to secant varieties of toric varieties is finite and the maximum Hadamard rank for points with no coordinates equal to zero is at most twice the generic rank. In particular, we focus on Hadamard ranks with respect to secant varieties of toric varieties since they provide a geometric framework in which Hadamard decompositions of tensors can be interpreted. Finally, we give a lower bound to the dimension of Hadamard products of secant varieties of toric varieties: this allows us to deduce the general Hadamard rank with respect to secant varieties of several Segre-Veronese varieties.


[37] 2510.05286

Computing frustration and near-monotonicity in deep neural networks

For the signed graph associated to a deep neural network, one can compute the frustration level, i.e., test how close or distant the graph is to structural balance. For all the pretrained deep convolutional neural networks we consider, we find that the frustration is always less than expected from null models. From a statistical physics point of view, and in particular in reference to an Ising spin glass model, the reduced frustration indicates that the amount of disorder encoded in the network is less than in the null models. From a functional point of view, low frustration (i.e., proximity to structural balance) means that the function representing the network behaves near-monotonically, i.e., more similarly to a monotone function than in the null models. Evidence of near-monotonic behavior along the partial order determined by frustration is observed for all networks we consider. This confirms that the class of deep convolutional neural networks tends to have a more ordered behavior than expected from null models, and suggests a novel form of implicit regularization.


[38] 2510.05329

Tensor-on-tensor Regression Neural Networks for Process Modeling with High-dimensional Data

Modern sensing and metrology systems now stream terabytes of heterogeneous, high-dimensional (HD) data, such as profiles, images, and dense point clouds, whose natural representation is multi-way tensors. Understanding such data requires regression models that preserve tensor geometry, yet remain expressive enough to capture the pronounced nonlinear interactions that dominate many industrial and mechanical processes. Existing tensor-based regressors meet the first requirement but remain essentially linear. Conversely, conventional neural networks offer nonlinearity only after flattening, thereby discarding spatial structure and incurring prohibitive parameter counts. This paper introduces a Tensor-on-Tensor Regression Neural Network (TRNN) that unifies these two paradigms.


[39] 2510.05338

Integrating Bayesian methods with neural network--based model predictive control: a review

In this review, we assess the use of Bayesian methods in model predictive control (MPC), focusing on neural-network-based modeling, control design, and uncertainty quantification. We systematically analyze individual studies and how they are implemented in practice. While Bayesian approaches are increasingly adopted to capture and propagate uncertainty in MPC, reported gains in performance and robustness remain fragmented, with inconsistent baselines and limited reliability analyses. We therefore argue for standardized benchmarks, ablation studies, and transparent reporting to rigorously determine the effectiveness of Bayesian techniques for MPC.


[40] 2510.05446

Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs

We study meta-reinforcement learning in finite-horizon MDPs where related tasks share similar structures in their optimal action-value functions. Specifically, we posit a linear representation $Q^*_h(s,a)=\Phi_h(s,a)\,\theta^{(k)}_h$ and place a Gaussian meta-prior $\mathcal{N}(\theta^*_h,\Sigma^*_h)$ over the task-specific parameters $\theta^{(k)}_h$. Building on randomized value functions, we propose two Thompson-style algorithms: (i) MTSRL, which learns only the prior mean and performs posterior sampling with the learned mean and known covariance; and (ii) $\text{MTSRL}^{+}$, which additionally estimates the covariance and employs prior widening to control finite-sample estimation error. Further, we develop a prior-alignment technique that couples the posterior under the learned prior with a meta-oracle that knows the true prior, yielding meta-regret guarantees: we match prior-independent Thompson sampling in the small-task regime and strictly improve with more tasks once the prior is learned. Concretely, for known covariance we obtain $\tilde{O}(H^{4}S^{3/2}\sqrt{ANK})$ meta-regret, and with learned covariance $\tilde{O}(H^{4}S^{3/2}\sqrt{AN^3K})$; both recover better behavior than the prior-independent approach after $K \gtrsim \tilde{O}(H^2)$ and $K \gtrsim \tilde{O}(N^2H^2)$, respectively. Simulations on a stateful recommendation environment (with feature and prior misspecification) show that, after brief exploration, MTSRL/$\text{MTSRL}^{+}$ track the meta-oracle and substantially outperform prior-independent RL and bandit-only meta-baselines. Our results give the first meta-regret guarantees for Thompson-style RL with learned Q-priors, and provide practical recipes (warm-start via RLSVI, OLS aggregation, covariance widening) for experiment-rich settings.


[41] 2510.05454

Estimating Treatment Effects Under Bounded Heterogeneity

Researchers often use specifications that correctly estimate the average treatment effect under the assumption of constant effects. When treatment effects are heterogeneous, however, such specifications generally fail to recover this average effect. Augmenting these specifications with interaction terms between demeaned covariates and treatment eliminates this bias, but often leads to imprecise estimates and becomes infeasible under limited overlap. We propose a generalized ridge regression estimator, $\texttt{regulaTE}$, that penalizes the coefficients on the interaction terms to achieve an optimal trade-off between worst-case bias and variance in estimating the average effect under limited treatment effect heterogeneity. Building on this estimator, we construct confidence intervals that remain valid under limited overlap and can also be used to assess sensitivity to violations of the constant effects assumption. We illustrate the method in empirical applications under unconfoundedness and staggered adoption, providing a practical approach to inference under limited overlap.
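
A minimal sketch of the penalized interacted specification described above: ridge regularization is applied only to the treatment-by-demeaned-covariate interaction coefficients, and the coefficient on treatment is read off as the average-effect estimate. The actual $\texttt{regulaTE}$ estimator chooses the penalty to optimize a worst-case bias-variance trade-off; the fixed penalty and simulated data below are placeholders.

import numpy as np

def ridge_interacted_ate(y, d, X, lam):
    # Regress y on [1, d, X_centered, d * X_centered], penalizing only the interaction block.
    Xc = X - X.mean(axis=0)
    Z = np.column_stack([np.ones(len(y)), d, Xc, d[:, None] * Xc])
    P = np.zeros(Z.shape[1])
    P[2 + X.shape[1]:] = lam                           # ridge penalty on interactions only
    beta = np.linalg.solve(Z.T @ Z + np.diag(P), Z.T @ y)
    return beta[1]                                     # coefficient on the treatment indicator

rng = np.random.default_rng(10)
n = 400
X = rng.standard_normal((n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-2 * X[:, 0])))    # propensity depends on X; overlap is limited in the tails
tau = 1.0 + 0.5 * X[:, 1]                              # heterogeneous treatment effect, average 1.0
y = X @ np.array([1.0, -1.0, 0.5]) + d * tau + rng.standard_normal(n)
print(ridge_interacted_ate(y, d, X, lam=10.0))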


[42] 2510.05487

Smart Contract Adoption under Discrete Overdispersed Demand: A Negative Binomial Optimization Perspective

Effective supply chain management under high-variance demand requires models that jointly address demand uncertainty and digital contracting adoption. Existing research often simplifies demand variability or treats adoption as an exogenous decision, limiting relevance in e-commerce and humanitarian logistics. This study develops an optimization framework combining dynamic Negative Binomial (NB) demand modeling with endogenous smart contract adoption. The NB process incorporates autoregressive dynamics in success probability to capture overdispersion and temporal correlation. Simulation experiments using four real-world datasets, including Delhivery Logistics and the SCMS Global Health Delivery system, apply maximum likelihood estimation and grid search to optimize adoption intensity and order quantity. Across all datasets, the NB specification outperforms Poisson and Gaussian benchmarks, with overdispersion indices exceeding 1.5. Forecasting comparisons show that while ARIMA and Exponential Smoothing achieve similar point accuracy, the NB model provides superior stability under high variance. Scenario analysis reveals that when dispersion exceeds a critical threshold (r > 6), increasing smart contract adoption above 70% significantly enhances profitability and service levels. This framework offers actionable guidance for balancing inventory costs, service levels, and implementation expenses, highlighting the importance of aligning digital adoption strategies with empirically observed demand volatility.


[43] 2510.05527

Transfer Learning on Edge Connecting Probability Estimation under Graphon Model

Graphon models provide a flexible nonparametric framework for estimating latent connectivity probabilities in networks, enabling a range of downstream applications such as link prediction and data augmentation. However, accurate graphon estimation typically requires a large graph, whereas in practice, one often only observes a small-sized network. One approach to addressing this issue is to adopt a transfer learning framework, which aims to improve estimation in a small target graph by leveraging structural information from a larger, related source graph. In this paper, we propose a novel method, namely GTRANS, a transfer learning framework that integrates neighborhood smoothing and Gromov-Wasserstein optimal transport to align and transfer structural patterns between graphs. To prevent negative transfer, GTRANS includes an adaptive debiasing mechanism that identifies and corrects for target-specific deviations via residual smoothing. We provide theoretical guarantees on the stability of the estimated alignment matrix and demonstrate the effectiveness of GTRANS in improving the accuracy of target graph estimation through extensive synthetic and real data experiments. These improvements translate directly to enhanced performance in downstream applications, such as the graph classification task and the link prediction task.


[44] 2510.05548

Decade-long Emission Forecasting with an Ensemble Model in Taiwan

Taiwan's high population and heavy dependence on fossil fuels have led to severe air pollution, with the most prevalent greenhouse gas being carbon dioxide (CO2). Therefore, this study presents a reproducible and comprehensive case study comparing 21 of the most commonly employed time series models in forecasting emissions, analyzing both univariate and multivariate approaches. Among these, Feedforward Neural Network (FFNN), Support Vector Machine (SVM), and Random Forest Regressor (RFR) achieved the best performances. To further enhance robustness, the top performers were integrated with Linear Regression through a custom stacked generalization ensemble technique. Our proposed ensemble model achieved an SMAPE of 1.407 with no signs of overfitting. Finally, this research provides an accurate decade-long emission projection that will assist policymakers in making more data-driven decisions.
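
A generic version of the final step can be assembled from scikit-learn parts: the three best base learners feed a linear meta-learner via stacked generalization, and forecasts are scored with SMAPE. This sketch uses toy lagged data and default-ish settings; it does not reproduce the authors' custom ensemble, features, or tuning.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def smape(y_true, y_pred):
    # Symmetric mean absolute percentage error, in percent.
    return 100 * np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

stack = StackingRegressor(
    estimators=[("ffnn", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
                ("svm", SVR(C=10.0)),
                ("rfr", RandomForestRegressor(n_estimators=200, random_state=0))],
    final_estimator=LinearRegression())

# Toy emissions-like series: three lagged values predict the next observation.
rng = np.random.default_rng(11)
series = np.cumsum(rng.normal(1.0, 0.3, 60)) + 250.0
X = np.column_stack([series[:-3], series[1:-2], series[2:-1]])
y = series[3:]
stack.fit(X[:-10], y[:-10])                            # train on the earlier years
print(smape(y[-10:], stack.predict(X[-10:])))          # evaluate on the held-out final years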


[45] 2510.05620

Monte Carlo-Type Neural Operator for Differential Equations

The Monte Carlo-type Neural Operator (MCNO) introduces a framework for learning solution operators of one-dimensional partial differential equations (PDEs) by directly learning the kernel function and approximating the associated integral operator using a Monte Carlo-type approach. Unlike Fourier Neural Operators (FNOs), which rely on spectral representations and assume translation-invariant kernels, MCNO makes no such assumptions. The kernel is represented as a learnable tensor over sampled input-output pairs, and sampling is performed once, uniformly at random from a discretized grid. This design enables generalization across multiple grid resolutions without relying on fixed global basis functions or repeated sampling during training, while an interpolation step maps between arbitrary input and output grids to further enhance flexibility. Experiments on standard 1D PDE benchmarks show that MCNO achieves competitive accuracy with efficient computational cost. We also provide a theoretical analysis proving that the Monte Carlo estimator yields a bounded bias and variance under mild regularity assumptions. This result holds in any spatial dimension, suggesting that MCNO may extend naturally beyond one-dimensional problems. More broadly, this work explores how Monte Carlo-type integration can be incorporated into neural operator frameworks for continuous-domain PDEs, providing a theoretically supported alternative to spectral methods (such as FNO) and to graph-based Monte Carlo approaches (such as the Graph Kernel Neural Operator, GNO).


[46] 2510.05716

A Note on "Quasi-Maximum-Likelihood Estimation in Conditionally Heteroscedastic Time Series: A Stochastic Recurrence Equations Approach"

Bougerol (1993) and Straumann and Mikosch (2006) gave conditions under which there exists a unique stationary and ergodic solution to the stochastic difference equation $Y_t \overset{a.s.}{=} \Phi_t (Y_{t-1}), t \in \mathbb{Z}$ where $(\Phi_t)_{t \in \mathbb{Z}}$ is a sequence of stationary and ergodic random Lipschitz continuous functions from $(Y,|| \cdot ||)$ to $(Y,|| \cdot ||)$ where $(Y,|| \cdot ||)$ is a complete subspace of a real or complex separable Banach space. In the case where $(Y,|| \cdot ||)$ is a real or complex separable Banach space, Straumann and Mikosch (2006) also gave conditions under which any solution to the stochastic difference equation $\hat{Y}_t \overset{a.s.}{=} \hat{\Phi}_t (\hat{Y}_{t-1}), t \in \mathbb{N}$ with $\hat{Y}_0$ given where $(\hat{\Phi}_t)_{t \in \mathbb{N}}$ is only a sequence of random Lipschitz continuous functions from $(Y,|| \cdot ||)$ to $(Y,|| \cdot ||)$ satisfies $\gamma^t || \hat{Y}_t - Y_t || \overset{a.s.}{\rightarrow} 0$ as $t \rightarrow \infty$ for some $\gamma > 1$. In this note, we give slightly different conditions under which this continues to hold in the case where $(Y,|| \cdot ||)$ is only a complete subspace of a real or complex separable Banach space by using close to identical arguments as Straumann and Mikosch (2006).


[47] 2510.05739

A Universal Moments-Only Bound for Cumulants

We establish a simple, universal inequality that bounds the $n$-th cumulant of a real-valued random variable using only its $n$-th (absolute or central) moment. Specifically, for any integer $n \ge 1$, the $n$-th cumulant $\kappa_n(X)$ satisfies \[ \lvert \kappa_n(X) \rvert \;\le\; C_n\, \mathbb{E}\lvert X-\mathbb{E}X\rvert^{\,n}, \] with an alternative bound in terms of $\mathbb{E}\lvert X\rvert^{\,n}$ in the uncentered form. The coefficient $C_n$ is derived from the combinatorial structure of the moment--cumulant formula and exhibits the asymptotic behavior $C_n \sim (n-1)!/\rho^{\,n}$, giving an exponential improvement over classical bounds that grow on the order of $n^n$. In full generality, the bound involves the ordered Bell numbers, corresponding to a rate parameter $\rho=\ln 2\approx 0.693$. For $n\ge 2$, shift-invariance of cumulants yields a universal centered refinement with parameter $\rho_0\approx 1.146$, determined by $e^{\rho_0}=2+\rho_0$. For symmetric random variables, the bound sharpens further to $\rho_{\mathrm{sym}}=\operatorname{arcosh}2\approx 1.317$. These results extend naturally to the multivariate setting, providing uniform control of joint cumulants under the same minimal moment assumptions.
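
For readers who want to see the combinatorics, the short check below computes the ordered Bell (Fubini) numbers referenced above and compares them with their classical asymptotic $n!/(2(\ln 2)^{n+1})$, which is where the rate parameter $\rho = \ln 2$ originates; the exact normalization of $C_n$ in the paper may differ from this raw sequence.

```python
# Quick numerical check (illustrative only) of the ordered Bell (Fubini)
# numbers entering the coefficient C_n above, together with their classical
# asymptotic a(n) ~ n! / (2 * (ln 2)^(n+1)), the source of rho = ln 2.
from math import comb, factorial, log

def ordered_bell(n_max):
    """Ordered Bell numbers a(0..n_max) via a(n) = sum_k C(n, k) * a(n - k)."""
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum(comb(n, k) * a[n - k] for k in range(1, n + 1)))
    return a

a = ordered_bell(12)
for n in range(2, 13):
    approx = factorial(n) / (2 * log(2) ** (n + 1))
    print(n, a[n], round(a[n] / approx, 4))   # ratio tends to 1 as n grows
```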


[48] 2510.05809

Coherent estimation of risk measures

We develop a statistical framework for risk estimation, inspired by the axiomatic theory of risk measures. Coherent risk estimators -- functionals of P&L samples inheriting the economic properties of risk measures -- are defined and characterized through robust representations linked to $L$-estimators. The framework provides a canonical methodology for constructing estimators with sound financial and statistical properties, unifying risk measure theory, principles for capital adequacy, and practical statistical challenges in market risk. A numerical study illustrates the approach, focusing on expected shortfall estimation under both i.i.d. and overlapping samples relevant for regulatory FRTB model applications.


[49] 2510.05825

Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling

Inference-Time Scaling (ITS) improves language models by allocating more computation at generation time. Particle Filtering (PF) has emerged as a strong ITS method for complex mathematical reasoning tasks, but it is vulnerable when guided by process reward models, which often assign overconfident scores early in the reasoning process. This causes PF to suffer from premature exploitation: it myopically commits to locally promising trajectories, prunes potentially correct hypotheses, and converges to suboptimal solutions. This failure mode, known as particle impoverishment, is especially severe under constrained computational budgets. To address this, we analyze the problem and identify two root causes: a lack of diversity in the particle set due to overconfident resampling and consequent inability to assess the potential of a reasoning path. We introduce Entropic Particle Filtering (ePF), an algorithm that integrates two new techniques to solve these issues. The first technique, Entropic Annealing (EA), directly mitigates particle impoverishment by monitoring search diversity via entropy; when diversity drops, it intervenes by dynamically annealing the resampling distribution to preserve exploration. The second, an enhancement called Look-ahead Modulation (LaM), adds a predictive guide to evaluate a state's potential based on its successors. On several challenging math benchmarks, ePF significantly outperforms strong baselines and achieves up to a 50 % relative improvement in task reward. Together, these methods improve PF's resilience by balancing the exploration of diverse solution spaces with the exploitation of high-reward regions, ultimately leading to higher-quality solutions.
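
A hedged sketch of the entropy-monitored resampling idea (Entropic Annealing) follows; the entropy floor, tempering rule, and toy reward scores are illustrative assumptions, not the paper's exact schedule.

```python
# Hedged sketch of entropy-monitored resampling in the spirit of Entropic
# Annealing: when the normalized entropy of the particle weights collapses,
# the resampling distribution is flattened to preserve exploration.
import numpy as np

rng = np.random.default_rng(0)

def entropic_resample(log_scores, entropy_floor=0.5):
    """Resample particle indices, tempering weights when diversity collapses."""
    n = len(log_scores)
    w = np.exp(log_scores - np.max(log_scores))
    w /= w.sum()
    # Normalized entropy in [0, 1]: 1 = uniform weights, 0 = all mass on one particle.
    h = -(w * np.log(w + 1e-12)).sum() / np.log(n)
    if h < entropy_floor:
        # Anneal: raise weights to a power < 1, flattening the distribution.
        temperature = entropy_floor / max(h, 1e-12)
        w = w ** (1.0 / temperature)
        w /= w.sum()
    return rng.choice(n, size=n, p=w)

log_scores = np.array([0.1, 0.2, 5.0, 0.3])   # one overconfident reward score
print(np.bincount(entropic_resample(log_scores), minlength=4))
```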


[50] 2510.05849

ESS-Flow: Training-free guidance of flow-based models as inference in source space

Guiding pretrained flow-based generative models for conditional generation or to produce samples with desired target properties enables solving diverse tasks without retraining on paired data. We present ESS-Flow, a gradient-free method that leverages the typically Gaussian prior of the source distribution in flow-based models to perform Bayesian inference directly in the source space using Elliptical Slice Sampling. ESS-Flow only requires forward passes through the generative model and observation process, no gradient or Jacobian computations, and is applicable even when gradients are unreliable or unavailable, such as with simulation-based observations or quantization in the generation or observation process. We demonstrate its effectiveness on designing materials with desired target properties and predicting protein structures from sparse inter-residue distance measurements.
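
Since ESS-Flow builds on elliptical slice sampling under a standard Gaussian source prior, the sketch below shows a generic ESS update in source space; `generate` and `log_obs_likelihood` are hypothetical stand-ins for the frozen flow model and the observation process, and only forward evaluations are used.

```python
# Generic elliptical slice sampling (ESS) step targeting
# p(z) proportional to N(z; 0, I) * exp(log_lik(z)); gradient-free.
import numpy as np

rng = np.random.default_rng(0)

def ess_step(z, log_lik, rng):
    """One ESS update; returns a new state on the ellipse through z and a prior draw."""
    nu = rng.standard_normal(z.shape)            # auxiliary draw from the Gaussian prior
    log_y = log_lik(z) + np.log(rng.uniform())   # slice level
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        z_prop = z * np.cos(theta) + nu * np.sin(theta)
        if log_lik(z_prop) > log_y:
            return z_prop
        # Shrink the angular bracket towards the current state and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy stand-ins: identity "generator" and a Gaussian observation of its output.
generate = lambda z: z
log_obs_likelihood = lambda z: -0.5 * np.sum((generate(z) - 1.0) ** 2)

z = rng.standard_normal(4)
for _ in range(100):
    z = ess_step(z, log_obs_likelihood, rng)
print(z)
```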


[51] 2510.05944

Minimal Unimodal Decomposition is NP-Hard on Graphs

A function on a topological space is called unimodal if all of its super-level sets are contractible. A minimal unimodal decomposition of a function $f$ is the smallest number of unimodal functions that sum up to $f$. The problem of decomposing a given density function into its minimal unimodal components is fundamental in topological statistics. We show that finding a minimal unimodal decomposition of an edge-linear function on a graph is NP-hard. Given any $k \geq 2$, we establish the NP-hardness of finding a unimodal decomposition consisting of $k$ unimodal functions. We also extend the NP-hardness result to related variants of the problem, including restriction to planar graphs, inapproximability results, and generalizations to higher dimensions.


[52] 2510.05949

Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density

Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representation. While (ii) is often considered as an obvious remedy to representation collapse, we uncover that JEPAs' anti-collapse term does much more--it provably estimates the data density. In short, any successfully trained JEPA can be used to get sample probabilities, e.g., for data curation, outlier detection, or simply for density estimation. Our theoretical finding is agnostic of the dataset and architecture used--in any case one can compute the learned probabilities of sample $x$ efficiently and in closed-form using the model's Jacobian matrix at $x$. Our findings are empirically validated across datasets (synthetic, controlled, and Imagenet) and across different Self Supervised Learning methods falling under the JEPA family (I-JEPA and DINOv2) and on multimodal models, such as MetaCLIP. We denote the method extracting the JEPA learned density as {\bf JEPA-SCORE}.


[53] 2510.05991

Robust Inference for Convex Pairwise Difference Estimators

This paper develops distribution theory and bootstrap-based inference methods for a broad class of convex pairwise difference estimators. These estimators minimize a kernel-weighted convex-in-parameter function over observation pairs that are similar in terms of certain covariates, where the similarity is governed by a localization (bandwidth) parameter. While classical results establish asymptotic normality under restrictive bandwidth conditions, we show that valid Gaussian and bootstrap-based inference remains possible under substantially weaker assumptions. First, we extend the theory of small bandwidth asymptotics to convex pairwise estimation settings, deriving robust Gaussian approximations even when a smaller than standard bandwidth is used. Second, we employ a debiasing procedure based on generalized jackknifing to enable inference with larger bandwidths, while preserving convexity of the objective function. Third, we construct a novel bootstrap method that adjusts for bandwidth-induced variance distortions, yielding valid inference across a wide range of bandwidth choices. Our proposed inference method is demonstrably more robust, while retaining the practical appeal of convex pairwise difference estimators.


[54] 2510.06025

Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers

Out-of-Distribution (OOD) detection is critical to AI reliability and safety, yet in many practical settings, only a limited amount of training data is available. Bayesian Neural Networks (BNNs) are a promising class of model on which to base OOD detection, because they explicitly represent epistemic (i.e. model) uncertainty. In the small training data regime, BNNs are especially valuable because they can incorporate prior model information. We introduce a new family of Bayesian posthoc OOD scores based on expected logit vectors, and compare 5 Bayesian and 4 deterministic posthoc OOD scores. Experiments on MNIST and CIFAR-10 In-Distributions, with 5000 training samples or less, show that the Bayesian methods outperform corresponding deterministic methods.


[55] 2510.06028

Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime

The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The bounds are stable under approximation with Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on true labeled data and correctly upper bound the test error for random labels. Our method indicates that generalization in the low-temperature, interpolation regime is already signaled by small training errors in the more classical high temperature regime.


[56] 2510.06091

Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method

Mixtures of linear dynamical systems (MoLDS) provide a path to model time-series data that exhibit diverse temporal dynamics across trajectories. However, its application remains challenging in complex and noisy settings, limiting its effectiveness for neural data analysis. Tensor-based moment methods can provide global identifiability guarantees for MoLDS, but their performance degrades under noise and complexity. Commonly used expectation-maximization (EM) methods offer flexibility in fitting latent models but are highly sensitive to initialization and prone to poor local minima. Here, we propose a tensor-based method that provides identifiability guarantees for learning MoLDS, which is followed by EM updates to combine the strengths of both approaches. The novelty in our approach lies in the construction of moment tensors using the input-output data to recover globally consistent estimates of mixture weights and system parameters. These estimates can then be refined through a Kalman EM algorithm, with closed-form updates for all LDS parameters. We validate our framework on synthetic benchmarks and real-world datasets. On synthetic data, the proposed Tensor-EM method achieves more reliable recovery and improved robustness compared to either pure tensor or randomly initialized EM methods. We then analyze neural recordings from the primate somatosensory cortex while a non-human primate performs reaches in different directions. Our method successfully models and clusters different conditions as separate subsystems, consistent with supervised single-LDS fits for each condition. Finally, we apply this approach to another neural dataset where monkeys perform a sequential reaching task. These results demonstrate that MoLDS provides an effective framework for modeling complex neural data, and that Tensor-EM is a reliable approach to MoLDS learning for these applications.


[57] 2510.06106

The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning

Deep neural networks have achieved remarkable success, yet our understanding of how they learn remains limited. These models can learn high-dimensional tasks, which is generally statistically intractable due to the curse of dimensionality. This apparent paradox suggests that learnable data must have an underlying latent structure. What is the nature of this structure? How do neural networks encode and exploit it, and how does it quantitatively impact performance - for instance, how does generalization improve with the number of training examples? This thesis addresses these questions by studying the roles of locality and compositionality in data, tasks, and deep learning representations.


[58] 2510.06122

PolyGraph Discrepancy: a classifier-based metric for graph generation

Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon distance of graph distributions by fitting binary classifiers to distinguish between real and generated graphs, featurized by these descriptors. The data log-likelihood of these classifiers approximates a variational lower bound on the JS distance between the two distributions. Resulting metrics are constrained to the unit interval [0,1] and are comparable across different graph descriptors. We further derive a theoretically grounded summary metric that combines these individual metrics to provide a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation compared to MMD metrics. The PolyGraph framework for benchmarking graph generative models is made publicly available at this https URL.
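
A rough sketch of the classifier-based construction appears below, under the assumption that graph descriptors are plain feature vectors and that a logistic-regression classifier is acceptable; the held-out split and the normalization by ln 2 are illustrative choices that may not match the paper's exact protocol.

```python
# Hedged sketch of a PGD-style score: fit a probabilistic classifier on
# descriptors of real vs. generated graphs and turn its held-out
# log-likelihood into a variational lower bound on the Jensen-Shannon
# divergence, here normalized by ln 2 to lie in [0, 1].
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))    # placeholder descriptor vectors of real graphs
fake = rng.normal(0.5, 1.0, size=(500, 8))    # placeholder descriptors of generated graphs

X = np.vstack([real, fake])
y = np.r_[np.ones(len(real)), np.zeros(len(fake))]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = np.clip(clf.predict_proba(X_te)[:, 1], 1e-12, 1 - 1e-12)

# JS(P, Q) >= ln 2 + 0.5 * (E_P[ln D] + E_Q[ln(1 - D)]) for any classifier D.
lb = np.log(2) + 0.5 * (np.mean(np.log(p[y_te == 1])) + np.mean(np.log(1 - p[y_te == 0])))
print(max(lb, 0.0) / np.log(2))               # normalized score in [0, 1]
```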


[59] 2510.06165

Higher-Order Feature Attribution: Bridging Statistics, Explainable AI, and Topological Signal Processing

Feature attributions are post-training analysis methods that assess how various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but becomes less direct when the predictive model involves interactions such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate our theory on a few examples.


[60] 2510.06181

Conformalized Gaussian processes for online uncertainty quantification over graphs

Uncertainty quantification (UQ) over graphs arises in a number of safety-critical applications in network science. The Gaussian process (GP), as a classical Bayesian framework for UQ, has been developed to handle graph-structured data by devising topology-aware kernel functions. However, such GP-based approaches are limited not only by the prohibitive computational complexity, but also by the strict modeling assumptions that might yield poor coverage, especially with labels arriving on the fly. To effect scalability, we devise a novel graph-aware parametric GP model by leveraging the random feature (RF)-based kernel approximation, which is amenable to efficient recursive Bayesian model updates. To further allow for adaptivity, an ensemble of graph-aware RF-based scalable GPs is leveraged, with per-GP weights adapted to data arriving incrementally. To ensure valid coverage with robustness to model mis-specification, we wed the GP-based set predictors with the online conformal prediction (CP) framework, which post-processes the prediction sets using adaptive thresholds. Experimental results demonstrate that the proposed method yields improved coverage and more efficient prediction sets than existing baselines by adaptively ensembling the GP models and setting the key threshold parameters in CP.
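
The adaptive-threshold post-processing can be illustrated on a toy stream; in the sketch below the graph-aware RF-based GP ensemble is replaced by a trivial running-mean predictor purely for illustration, and the adaptation rate and target level are assumptions of this sketch, not values from the paper.

```python
# Hedged sketch of online conformal post-processing: an adaptive threshold is
# updated on the fly so that long-run empirical coverage tracks 1 - alpha,
# regardless of how crude the underlying point predictor is.
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 0.1, 0.05          # miscoverage target and adaptation rate (illustrative)
q = 1.0                           # initial conformal threshold (radius of the set)
mean = 0.0                        # toy "model": running mean of past labels
covered = []

for t in range(1, 2001):
    y = rng.normal(loc=np.sin(t / 100.0), scale=0.3)    # streaming label
    err = float(abs(y - mean) > q)                      # 1 if the set missed y
    covered.append(1 - err)
    q = max(q + gamma * (err - alpha), 1e-3)            # adaptive threshold update
    mean += (y - mean) / t                              # model update

print(round(np.mean(covered), 3))                       # long-run coverage tracks 0.9
```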


[61] 2106.02270

Estimating on-street parking occupancy using smart meter data

The excessive search for parking, known as cruising, generates pollution and congestion. Cities are looking for approaches that will reduce the negative impact associated with searching for parking. However, adequately measuring the number of vehicles in search of parking is difficult and requires sensing technologies. In this paper, we develop an approach that eliminates the need for sensing technology by using parking meter payment transactions to estimate parking occupancy and the number of cars searching for parking. The estimation scheme is based on Particle Markov Chain Monte Carlo. We validate the performance of the Particle Markov Chain Monte Carlo approach using data simulated from a GI/GI/s queue. We show that the approach generates asymptotically unbiased Bayesian estimates of the parking occupancy and underlying model parameters such as arrival rates, average parking time, and the payment compliance rate. Finally, we estimate parking occupancy and cruising using parking meter data from SFpark, a large scale parking experiment and subsequently, compare the Particle Markov Chain Monte Carlo parking occupancy estimates against the ground truth data from the parking sensors. Our approach is easily replicated and scalable given that it only requires using data that cities already possess, namely historical parking payment transactions.


[62] 2307.12472

Model-free generalized fiducial inference

Conformal prediction (CP) was developed to provide finite-sample probabilistic prediction guarantees. While CP algorithms are a relatively general-purpose approach to uncertainty quantification, with finite-sample guarantees, they lack versatility. Namely, the CP approach does not {\em prescribe} how to quantify the degree to which a data set provides evidence in support of (or against) an arbitrary event from a general class of events. In this paper, tools are offered from imprecise probability theory to build a formal connection between CP and generalized fiducial (GF) inference. These new insights establish a more general inferential lens from which CP can be understood, and demonstrate the pragmatism of fiducial ideas. The formal connection establishes a context in which epistemically-derived GF probability matches aleatoric/frequentist probability. Beyond this fact, it is illustrated how tools from imprecise probability theory, namely lower and upper probability functions, can be applied in the context of the imprecise GF distribution to provide posterior-like, prescriptive inference that is not possible within the CP framework alone. In addition to the primary CP generalization that is contributed, fundamental connections are synthesized between this new model-free GF and three other areas of contemporary research: nonparametric predictive inference (NPI), conformal predictive systems/distributions, and inferential models (IMs).


[63] 2311.03497

Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and by Sector

To assess the impact of climate change on the Canadian economy, we investigate the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We also provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that rising Winter temperature anomalies have a notable adverse impact on Canadian economic growth. Across provinces, Quebec, Manitoba, and Ontario are anticipated to experience larger negative impacts, whereas British Columbia is less vulnerable. Across industries, Finance and Real Estate, Science and Technology, and Information, Culture and Recreation are consistently projected to see mild benefits, while adverse effects are predicted for Manufacturing, Agriculture, and Mining. The disparities in climate change effects across provinces and industries highlight the need for governments to tailor their policies accordingly, and to offer targeted assistance to regions and industries that are particularly vulnerable in the face of climate change. Targeted approaches to climate change mitigation are likely to be more effective than one-size-fits-all policies for the whole economy.


[64] 2406.11584

The analysis of paired comparison data in the presence of cyclicality and intransitivity

A principled approach to cyclicality and intransitivity in paired comparison data is developed. The proposed methodology enables more precise estimation of the underlying preference profile and facilitates the identification of all cyclic patterns and potential intransitivities. Consequently, it improves upon existing methods for ranking and prediction, including enhanced performance in betting and wagering systems. Fundamental to our development is a detailed understanding and study of the parameter space that accommodates cyclicality and intransitivity. It is shown that identifying cyclicality and intransitivity reduces to a model selection problem, and a new method for model selection employing geometrical insights, unique to the problem at hand, is proposed. The large sample properties of the estimators and guarantees on the selected model are provided. Thus, it is shown that in large samples all cyclical relations and consequent intransitivities can be identified. The method is exemplified using simulations and analysis of an illustrative example.


[65] 2406.16523

YEAST: Yet Another Sequential Test

Online evaluation of machine learning models is typically conducted through A/B experiments. Sequential statistical tests are valuable tools for analysing these experiments, as they enable researchers to stop data collection early without increasing the risk of false discoveries. However, existing sequential tests either limit the number of interim analyses or suffer from low statistical power. In this paper, we introduce a novel sequential test designed for continuous monitoring of A/B experiments. We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. Our method is derived using a new technique that inverts a bound on the probability of threshold crossing, based on a classical maximal inequality.


[66] 2407.03389

A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data

In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The proposed approach extends the Information Bottleneck principle to heterogeneous data through generalised product kernels, integrating continuous, nominal, and ordinal variables within a unified optimization framework. We address the following challenges: developing a systematic bandwidth selection strategy that equalises contributions across variable types, and proposing an adaptive hyperparameter updating scheme that ensures a valid solution into a predetermined number of potentially imbalanced clusters. Through simulations on 28,800 synthetic data sets and ten publicly available benchmarks, we demonstrate that the proposed method, named DIBmix, achieves superior performance compared to four established methods (KAMILA, K-Prototypes, FAMD with K-Means, and PAM with Gower's dissimilarity). Results show DIBmix particularly excels when clusters exhibit size imbalances, data contain low or moderate cluster overlap, and categorical and continuous variables are equally represented. The method presents a significant advantage over traditional centroid-based algorithms, establishing DIBmix as a competitive and theoretically grounded alternative for mixed-type data clustering.


[67] 2410.01194

Maximum Ideal Likelihood Estimation: A Unified Inference Framework for Latent Variable Models

This paper develops a unified estimation framework, the Maximum Ideal Likelihood Estimation (MILE), for general parametric models with latent variables. Unlike traditional approaches relying on the marginal likelihood of the observed data, MILE directly exploits the joint distribution of the complete data by treating the latent variables as parameters (the ideal likelihood). Borrowing strength from optimisation techniques and algorithms, MILE is a broadly applicable framework in cases where traditional methods fail, such as when the marginal likelihood has non-finite expectations. MILE offers a flexible and robust alternative to established techniques, including the Expectation-Maximisation algorithm and Markov chain Monte Carlo. We establish statistical properties of MILE, including consistency, asymptotic distribution, and equivalence to Maximum Likelihood Estimation, under mild conditions. Extensive simulations and illustrative real-data applications demonstrate the empirical advantages of MILE, which outperforms existing methods in computational feasibility and scalability.


[68] 2411.04729

Conjugate gradient methods for high-dimensional GLMMs

Generalized linear mixed models (GLMMs) are a widely used tool in statistical analysis. The main bottleneck of many computational approaches lies in the inversion of the high-dimensional precision matrices associated with the random effects. Such matrices are typically sparse; however, the sparsity pattern resembles a multipartite random graph, which does not lend itself well to default sparse linear algebra techniques. Notably, we show that, for typical GLMMs, the Cholesky factor is dense even when the original precision is sparse. We thus turn to approximate iterative techniques, in particular to the conjugate gradient (CG) method. We combine a detailed analysis of the spectrum of said precision matrices with results from random graph theory to show that CG-based methods applied to high-dimensional GLMMs typically achieve a fixed approximation error with a total cost that scales linearly with the number of parameters and observations. Numerical illustrations with both real and simulated data confirm the theoretical findings, while at the same time illustrating situations, such as nested structures, where CG-based methods struggle.
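
The core linear-algebra step, solving a sparse symmetric positive-definite system by conjugate gradients, can be illustrated as follows; the matrix below is a generic SPD stand-in, not an actual GLMM precision matrix.

```python
# Illustrative conjugate gradient solve of Q x = b for a sparse SPD matrix,
# the kind of step discussed in the abstract above.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 2000
A = sp.random(n, n, density=0.002, random_state=0)
Q = (A @ A.T + sp.identity(n)).tocsc()            # sparse SPD precision-like matrix
b = rng.normal(size=n)

x, info = cg(Q, b, atol=1e-8, maxiter=5000)       # info == 0 means convergence
print(info, float(np.linalg.norm(Q @ x - b)))     # small residual norm
```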


[69] 2501.19038

Conformal Prediction in Hierarchical Classification with Constrained Representation Complexity

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second one relaxes this restriction. Using the notion of representation complexity, the latter yields smaller set sizes at the cost of a more general and combinatorial inference problem. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.


[70] 2503.21534

Inequality Constrained Minimum Density Power Divergence Estimation in Panel Count Data

The analysis of panel count data has garnered considerable attention in the literature, leading to the development of multiple statistical techniques. In inferential analysis, most works focus on leveraging estimating equation-based techniques or conventional maximum likelihood estimation. However, the robustness of these methods is largely questionable. In this paper, we present a robust density power divergence estimation method for panel count data arising from non-homogeneous Poisson processes correlated through a latent frailty variable. To cope with real-world incidents, it is often desirable to impose certain inequality constraints on the parameter space, leading to the constrained minimum density power divergence estimator. Being incorporated with inequality restrictions, coupled with the inherent complexity of our objective function, standard computational algorithms are inadequate for estimation purposes. To overcome this, we adopt sequential convex programming, which approximates the original problem through a series of subproblems. Further, we study the asymptotic properties of the resultant estimator, making a significant contribution to this work. The proposed method ensures high efficiency in the model estimation while providing reliable inference despite data contamination. Moreover, the density power divergence measure is governed by a tuning parameter $\gamma$, which controls the trade-off between robustness and efficiency. To effectively determine the optimal value of $\gamma$, this study employs a generalized score-matching technique, marking considerable progress in the data analysis. Simulation studies and real data examples are provided to illustrate the performance of the estimator and to substantiate the theory developed.


[71] 2503.23629

Bot Identification in Social Media

The escalating proliferation of inorganic accounts, commonly known as bots, within the digital ecosystem represents an ongoing and multifaceted challenge to online security, trustworthiness, and user experience. These bots, often employed for the dissemination of malicious propaganda and manipulation of public opinion, wield significant influence in social media spheres with far-reaching implications for electoral processes, political campaigns, and international conflicts. Swift and accurate identification of inorganic accounts is of paramount importance in mitigating their detrimental effects. This research paper focuses on the identification of such accounts and explores various effective methods for their detection through machine learning techniques. In response to the pervasive presence of bots in the contemporary digital landscape, this study extracts temporal and semantic features from tweet behaviors and proposes a bot detection algorithm utilizing fundamental machine learning approaches, including Support Vector Machines (SVM) and k-means clustering. Furthermore, the research ranks the importance of these extracted features for each detection technique and also provides uncertainty quantification using a distribution-free method, called conformal prediction, thereby contributing to the development of effective strategies for combating the prevalence of inorganic accounts on social media platforms.


[72] 2504.14898

Expected Free Energy-based Planning as Variational Inference

We address the problem of planning under uncertainty, where an agent must choose actions that not only achieve desired outcomes but also reduce uncertainty. Traditional methods often treat exploration and exploitation as separate objectives, lacking a unified inferential foundation. Active inference, grounded in the Free Energy Principle, provides such a foundation by minimizing Expected Free Energy (EFE), a cost function that combines utility with epistemic drives, such as ambiguity resolution and novelty seeking. However, the computational burden of EFE minimization has remained a significant obstacle to its scalability. In this paper, we show that EFE-based planning arises naturally from minimizing a variational free energy functional on a generative model augmented with preference and epistemic priors. This result reinforces theoretical consistency with the Free Energy Principle by casting planning under uncertainty itself as a form of variational inference. Our formulation yields policies that jointly support goal achievement and information gain, while incorporating a complexity term that accounts for bounded computational resources. This unifying framework connects and extends existing methods, enabling scalable, resource-aware implementations of active inference agents.


[73] 2505.00292

Offline changepoint localization using a matrix of conformal p-values

Changepoint localization is the problem of estimating the index at which a change occurred in the data generating distribution of an ordered list of data, or declaring that no change occurred. We present the broadly applicable MCP algorithm, which uses a matrix of conformal p-values to produce a confidence interval for a (single) changepoint under the mild assumption that the pre-change and post-change distributions are each exchangeable. We prove a novel conformal Neyman-Pearson lemma, motivating practical classifier-based choices for our conformal score function. Finally, we exemplify the MCP algorithm on a variety of synthetic and real-world datasets, including using black-box pre-trained classifiers to detect changes in sequences of images, text, and accelerometer data.
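
A building block of such a method is the (smoothed) conformal p-value of a single test score against a calibration set; the sketch below shows only that ingredient, not the MCP matrix construction or the interval inversion, and the scores are placeholders.

```python
# Generic smoothed conformal p-value: super-uniform under exchangeability of
# the calibration scores and the test score. MCP builds a matrix of such
# p-values across candidate pre-/post-change splits of the sequence.
import numpy as np

def conformal_p_value(cal_scores, test_score, rng):
    """Rank-based p-value with random tie-breaking."""
    cal_scores = np.asarray(cal_scores)
    greater = np.sum(cal_scores > test_score)
    ties = np.sum(cal_scores == test_score)
    return (greater + rng.uniform() * (ties + 1)) / (len(cal_scores) + 1)

rng = np.random.default_rng(0)
calibration = rng.normal(size=100)                           # scores from the pre-change regime
print(round(conformal_p_value(calibration, 3.5, rng), 3))    # small p-value: likely a change
print(round(conformal_p_value(calibration, 0.1, rng), 3))    # unremarkable score
```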


[74] 2505.01318

Modeling Large Nonstationary Spatial Data with the Full-Scale Basis Graphical Lasso

We propose a new approach for modeling large datasets of nonstationary spatial processes that combines a latent low rank process and a sparse covariance model. The low rank component coefficients are endowed with a flexible graphical Gaussian Markov random field model. The utilization of a low rank and compactly-supported covariance structure combines the full-scale approximation and the basis graphical lasso; we term this new approach the full-scale basis graphical lasso (FSBGL). Estimation employs a graphical lasso-penalized likelihood, which is optimized using a difference-of-convex scheme. We illustrate the proposed approach on synthetic fields as well as with a challenging high-resolution simulation dataset of the thermosphere. In a comparison against state-of-the-art spatial models, the FSBGL performs better at capturing salient features of the thermospheric temperature fields, even with limited available training data.


[75] 2506.05308

Estimation of Treatment Effects Under Nonstationarity via the Truncated Policy Gradient Estimator

Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy-gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central limit theorem for the TPG estimator and develop a consistent variance estimator that remains valid under nonstationarity with single-trajectory data. We validate our theory with two real-world case studies. The results show that a well-calibrated TPG estimator attains low bias and variance in practical nonstationary settings.


[76] 2506.07844

Conditional Local Independence Testing for Itô processes with Applications to Dynamic Causal Discovery

Inferring causal relationships from dynamical systems is the central interest of many scientific inquiries. Conditional local independence, which describes whether the evolution of one process is influenced by another process given additional processes, is important for causal learning in such systems. In this paper, we propose a hypothesis test for conditional local independence in Itô processes. Our test is grounded in the semimartingale decomposition of the Itô process, with which we introduce a stochastic integral process that is a martingale under the null hypothesis. We then apply a test for the martingale property, quantifying potential deviation from local independence. The test statistic is estimated using the optimal filtering equation. We show the consistency of this estimator, thereby establishing the level and power of our test. Numerical verification and a real-world application to causal discovery in brain resting-state fMRIs are conducted.


[77] 2507.10679

FARS: Factor Augmented Regression Scenarios in R

In the context of macroeconomic/financial time series, the FARS package provides a comprehensive framework in R for the construction of conditional densities of the variable of interest based on the factor-augmented quantile regressions (FA-QRs) methodology, with the factors extracted from multi-level dynamic factor models (ML-DFMs) with potential overlapping group-specific factors. Furthermore, the package also allows the construction of measures of risk as well as modeling and designing economic scenarios based on the conditional densities. In particular, the package enables users to: (i) extract global and group-specific factors using a flexible multi-level factor structure; (ii) compute asymptotically valid confidence regions for the estimated factors, accounting for uncertainty in the factor loadings; (iii) obtain estimates of the parameters of the FA-QRs together with their standard deviations; (iv) recover full predictive conditional densities from estimated quantiles; (v) obtain risk measures based on extreme quantiles of the conditional densities; and (vi) estimate the conditional density and the corresponding extreme quantiles when the factors are stressed.


[78] 2507.12457

Does $K$-fold CV based penalty perform variable selection or does it lead to $n^{1/2}$-consistency in Lasso?

Least absolute shrinkage and selection operator or Lasso, introduced by Tibshirani (1996), is one of the widely used regularization methods in regression. It is observed that the properties of Lasso vary wildly depending on the choice of the penalty parameter. The recent results of Lahiri (2021) suggest that, depending on the nature of the penalty parameter, Lasso can either be variable selection consistent or be $n^{1/2}-$consistent. However, practitioners generally implement Lasso by choosing the penalty parameter in a data-dependent way, the most popular being the $K$-fold cross-validation. In this paper, we explore the variable selection consistency and $n^{1/2}-$consistency of Lasso when the penalty is chosen based on $K$-fold cross-validation with $K$ being fixed. We consider the fixed-dimensional heteroscedastic linear regression model and show that Lasso with $K$-fold cross-validation based penalty is $n^{1/2}-$consistent, but not variable selection consistent. We also establish the $n^{1/2}-$consistency of the $K$-fold cross-validation based penalty as an intermediate result. Additionally, as a consequence of $n^{1/2}-$consistency, we establish the validity of Bootstrap to approximate the distribution of the Lasso estimator based on $K-$fold cross-validation. We validate the Bootstrap approximation in finite samples based on a moderate simulation study. Thus, our results essentially justify the use of $K$-fold cross-validation in practice to draw inferences based on $n^{1/2}-$scaled pivotal quantities in Lasso regression.
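
For concreteness, the sketch below fits Lasso with a K = 5 fold cross-validated penalty and runs a naive residual bootstrap of the root-n-scaled estimator; this is a schematic homoscedastic illustration, not the paper's formal bootstrap procedure or its heteroscedastic setting.

```python
# Schematic check: Lasso with K-fold CV-selected penalty, plus a naive
# residual bootstrap of sqrt(n) * (beta_hat_boot - beta_hat).
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, p = 400, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, 0.0, 0.0, -0.8, 0.0])
y = X @ beta + rng.normal(scale=1.0, size=n)

fit = LassoCV(cv=5).fit(X, y)                      # K = 5 fold CV-selected penalty
resid = y - fit.predict(X)

boot = []
for _ in range(200):
    yb = fit.predict(X) + rng.choice(resid, size=n, replace=True)
    bb = Lasso(alpha=fit.alpha_).fit(X, yb).coef_  # refit at the CV-selected penalty
    boot.append(np.sqrt(n) * (bb - fit.coef_))
print(fit.alpha_, np.std(np.array(boot), axis=0).round(2))
```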


[79] 2508.13969

Towards multi-purpose locally differentially-private synthetic data release via spline wavelet plug-in estimation

We develop plug-in estimators for locally differentially private semi-parametric estimation via spline wavelets. The approach leads to optimal rates of convergence for a large class of estimation problems that are characterized by (differentiable) functionals $\Lambda(f)$ of the true data generating density $f$. The crucial feature of the locally private data $Z_1,\dots, Z_n$ we generate is that it does not depend on the particular functional $\Lambda$ (or the unknown density $f$) the analyst wants to estimate. Hence, the synthetic data can be generated and stored a priori and can subsequently be used by any number of analysts to estimate many vastly different functionals of interest at the provably optimal rate. In principle, this removes a long-standing practical limitation in the statistics of differential privacy, namely, that optimal privacy mechanisms need to be tailored towards the specific estimation problem at hand.


[80] 2509.22916

Structural Nested Mean Models for Modified Treatment Policies

There is a growing literature on estimating effects of treatment strategies based on the natural treatment that would have been received in the absence of intervention, often dubbed `modified treatment policies' (MTPs). MTPs are sometimes of interest because they are more realistic than interventions setting exposure to an ideal level for all members of a population. In the general time-varying setting, Richardson and Robins (2013) provided exchangeability conditions for nonparametric identification of MTP effects that could be deduced from Single World Intervention Graphs (SWIGs). Diaz (2023) provided multiply robust estimators under these identification assumptions that allow for machine learning nuisance regressions. In this paper, we fill a remaining gap by extending Structural Nested Mean Models (SNMMs) to MTP settings, which enables characterization of (time-varying) heterogeneity of MTP effects. We do this both under the exchangeability assumptions of Richardson and Robins (2013) and under parallel trends assumptions, which enables investigation of (time-varying heterogeneous) MTP effects in the presence of some unobserved confounding.


[81] 2510.03616

Identification in source apportionment using geometry

Source apportionment analysis, which aims to quantify the attribution of observed concentrations of multiple air pollutants to specific sources, can be formulated as a non-negative matrix factorization (NMF) problem. However, NMF is non-unique and typically relies on unverifiable assumptions such as sparsity and uninterpretable scalings. In this manuscript, we establish identifiability of the source attribution percentage matrix under much weaker and more realistic conditions. We introduce the population-level estimand for this matrix, and show that it is scale-invariant and identifiable even when the NMF factors are not. Viewing the data as a point cloud in a conical hull, we show that a geometric estimator of the source attribution percentage matrix is consistent without any sparsity or parametric distributional assumptions, and while accommodating spatio-temporal dependence. Numerical experiments corroborate the theory.


[82] 2510.03949

Analysis of kinetic Langevin Monte Carlo under the stochastic exponential Euler discretization from underdamped all the way to overdamped

Simulating the kinetic Langevin dynamics is a popular approach for sampling from distributions, where only their unnormalized densities are available. Various discretizations of the kinetic Langevin dynamics have been considered, where the resulting algorithm is collectively referred to as the kinetic Langevin Monte Carlo (KLMC) or underdamped Langevin Monte Carlo. Specifically, the stochastic exponential Euler discretization, or exponential integrator for short, has previously been studied under strongly log-concave and log-Lipschitz smooth potentials via the synchronous Wasserstein coupling strategy. Existing analyses, however, impose restrictions on the parameters that do not explain the behavior of KLMC under various choices of parameters. In particular, all known results fail to hold in the overdamped regime, suggesting that the exponential integrator degenerates in the overdamped limit. In this work, we revisit the synchronous Wasserstein coupling analysis of KLMC with the exponential integrator. Our refined analysis results in Wasserstein contractions and bounds on the asymptotic bias that hold under weaker restrictions on the parameters, which assert that the exponential integrator is capable of stably simulating the kinetic Langevin dynamics in the overdamped regime, as long as proper time acceleration is applied.


[83] 2510.04582

Constrained Dikin-Langevin diffusion for polyhedra

Interior-point geometry offers a straightforward approach to constrained sampling and optimization on polyhedra, eliminating reflections and ad hoc projections. We exploit the Dikin log-barrier to define a Dikin--Langevin diffusion whose drift and noise are modulated by the inverse barrier Hessian. In continuous time, we establish a boundary no-flux property; trajectories started in the interior remain in $U$ almost surely, so feasibility is maintained by construction. For computation, we adopt a discretize-then-correct design: an Euler--Maruyama proposal with state-dependent covariance, followed by a Metropolis--Hastings correction that targets the exact constrained law and reduces to a Dikin random walk when $f$ is constant. Numerically, the unadjusted diffusion exhibits the expected first-order step size bias, while the MH-adjusted variant delivers strong convergence diagnostics on anisotropic, box-constrained Gaussians (rank-normalized split-$\hat{R}$ concentrated near $1$) and higher inter-well transition counts on a bimodal target, indicating superior cross-well mobility. Taken together, these results demonstrate that coupling calibrated stochasticity with interior-point preconditioning provides a practical, reflection-free approach to sampling and optimization over polyhedral domains, offering clear advantages near faces, corners, and in nonconvex landscapes.
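
In the constant-$f$ special case mentioned above, the MH-adjusted scheme reduces to a Dikin random walk; the sketch below implements that special case on the unit box, where the log-barrier Hessian is diagonal, with an illustrative step size and dimension (the general polyhedral and Langevin-drift cases are omitted).

```python
# Hedged sketch of a Dikin random walk on the box [0, 1]^d (uniform target),
# the constant-f special case of the MH-adjusted scheme above. The log-barrier
# Hessian is diagonal: H_ii(x) = 1/x_i^2 + 1/(1 - x_i)^2. Proposals are
# Gaussian with covariance r^2 * H(x)^{-1}, corrected by Metropolis-Hastings.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_steps = 5, 0.4, 5000

def hess_diag(x):
    return 1.0 / x**2 + 1.0 / (1.0 - x) ** 2

def log_proposal(x_to, x_from):
    # log N(x_to; x_from, r^2 * diag(1/h)) up to an additive constant
    h = hess_diag(x_from)
    return 0.5 * np.sum(np.log(h)) - np.sum(h * (x_to - x_from) ** 2) / (2.0 * r**2)

x = np.full(d, 0.5)
accepted = 0
for _ in range(n_steps):
    prop = x + r * rng.standard_normal(d) / np.sqrt(hess_diag(x))
    if np.all((prop > 0.0) & (prop < 1.0)):
        log_alpha = log_proposal(x, prop) - log_proposal(prop, x)  # uniform target cancels
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = prop, accepted + 1
print(accepted / n_steps, x.round(3))
```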


[84] 2210.17063

Shrinkage Methods for Treatment Choice

This study examines the problem of determining whether to treat individuals based on observed covariates. The most common decision rule is the conditional empirical success (CES) rule proposed by Manski (2004), which assigns individuals to treatments that yield the best experimental outcomes conditional on the observed covariates. Conversely, using shrinkage estimators, which shrink unbiased but noisy preliminary estimates toward the average of these estimates, is a common approach in statistical estimation problems because it is well-known that shrinkage estimators may have smaller mean squared errors than unshrunk estimators. Inspired by this idea, we propose a computationally tractable shrinkage rule that selects the shrinkage factor by minimizing an upper bound of the maximum regret. Then, we compare the maximum regret of the proposed shrinkage rule with those of the CES and pooling rules when the space of conditional average treatment effects (CATEs) is correctly specified or misspecified. Our theoretical results demonstrate that the shrinkage rule performs well in many cases and these findings are further supported by numerical experiments. Specifically, we show that the maximum regret of the shrinkage rule can be strictly smaller than those of the CES and pooling rules in certain cases when the space of CATEs is correctly specified. In addition, we find that the shrinkage rule is robust against misspecification of the space of CATEs. Finally, we apply our method to experimental data from the National Job Training Partnership Act Study.
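
As a schematic of the general idea (shrinking noisy cell-specific estimates toward their average before assigning treatment), the snippet below uses a simple James-Stein-style plug-in shrinkage factor rather than the paper's minimax-regret choice; all numbers are made up for illustration.

```python
# Schematic shrinkage-then-treat rule: shrink estimated cell-level effects
# toward their grand mean, then treat cells with positive shrunk estimates.
# The shrinkage factor here is a plug-in choice, not the paper's rule.
import numpy as np

rng = np.random.default_rng(0)
true_effects = np.array([0.05, 0.00, 0.20, -0.10])     # CATEs across covariate cells
noise_sd = 0.15
est = true_effects + noise_sd * rng.normal(size=true_effects.size)

grand_mean = est.mean()
signal_var = max(np.var(est) - noise_sd**2, 0.0)
w = signal_var / (signal_var + noise_sd**2)            # shrinkage factor in [0, 1]
shrunk = grand_mean + w * (est - grand_mean)

treat_ces = est > 0                                    # conditional empirical success rule
treat_shrunk = shrunk > 0                              # shrinkage rule
print(round(float(w), 3), treat_ces, treat_shrunk)
```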


[85] 2303.16822

An inexact LPA for DC composite optimization and application to matrix completions with outliers

This paper concerns a class of DC composite optimization problems which, as an extension of convex composite optimization problems and DC programs with nonsmooth components, often arises in robust factorization models of low-rank matrix recovery. For this class of nonconvex and nonsmooth problems, we propose an inexact linearized proximal algorithm (iLPA) by computing at each step an inexact minimizer of a strongly convex majorization constructed with a partial linearization of their objective functions at the current iterate. We establish the full convergence of the generated iterate sequence under the Kurdyka-Łojasiewicz (KL) property of a potential function, and employ the composite structure to provide a verifiable condition for the potential function to satisfy the KL property of exponent $1/2$ at the limit point, so for the iterate sequence to have a local R-linear convergence rate. This condition is weaker than the one provided in \cite[Theorem 3.2]{LiPong18} for identifying the KL property of exponent $p\in[0,1)$ for a general composite function. The proposed iLPA is applied to a robust factorization model for matrix completion with outliers and non-uniform sampling, and numerical comparisons with the Polyak subgradient method and a proximal alternating minimization (PAM) method validate its efficiency.


[86] 2306.17470

Oblivious Stochastic Composite Optimization

In stochastic convex optimization problems, most existing adaptive methods rely on prior knowledge about the diameter bound $D$ when the smoothness or the Lipschitz constant is unknown. This often significantly affects performance as only a rough approximation of $D$ is usually known in practice. Here, we bypass this limitation by combining mirror descent with dual averaging techniques and we show that, under oblivious step-sizes regime, our algorithms converge without any prior knowledge on the parameters of the problem. We introduce three oblivious stochastic algorithms to address different settings. The first algorithm is designed for objectives in relative scale, the second one is an accelerated version tailored for smooth objectives, whereas the last one is for relatively-smooth objectives. All three algorithms work without prior knowledge of the diameter of the feasible set, the Lipschitz constant or smoothness of the objective function. We use these results to revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.


[87] 2406.04588

Convergence of the majorized PAM method with subspace correction for low-rank composite factorization model

This paper focuses on the convergence certificates of the majorized proximal alternating minimization (PAM) method with subspace correction, proposed in \cite{TaoQianPan22} for the column $\ell_{2,0}$-norm regularized factorization model and now extended to a class of low-rank composite factorization models from matrix completion. The convergence analysis of this PAM method becomes extremely challenging because a subspace correction step is introduced to every proximal subproblem to ensure a closed-form solution. We establish the full convergence of the iterate sequence and column subspace sequences of factor pairs generated by the PAM, under the KL property of the objective function and a condition that holds automatically for the column $\ell_{2,0}$-norm function. Numerical comparison with the popular proximal alternating linearized minimization (PALM) method is conducted on one-bit matrix completion problems, which indicates that the PAM with subspace correction has an advantage in seeking lower relative error within less time.


[88] 2406.05428

Information-Theoretic Thresholds for the Alignments of Partially Correlated Graphs

This paper studies the problem of recovering the hidden vertex correspondence between two correlated random graphs. We propose the partially correlated Erdős-Rényi graphs model, wherein a pair of induced subgraphs on a certain number of vertices is correlated. We investigate the information-theoretic thresholds for recovering the latent correlated subgraphs and the hidden vertex correspondence. We prove that there exists an optimal rate for partial recovery in terms of the number of correlated nodes, above which one can correctly match a fraction of vertices and below which correctly matching any positive fraction is impossible, and we also derive an optimal rate for exact recovery. In the proof of the possibility results, we propose correlated functional digraphs, which partition the edges of the intersection graph into two types of components, and bound the error probability by lower-order cumulant generating functions. The proof of the impossibility results builds upon a generalized Fano's inequality and the recovery thresholds established for the correlated Erdős-Rényi graphs model.


[89] 2407.11676

SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities

Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many methods have been proposed in the literature, fair and realistic evaluation remains an open question, particularly due to methodological difficulties in selecting hyperparameters in the unsupervised setting. With SKADA-bench, we propose a framework to evaluate DA methods on diverse modalities, beyond the computer vision tasks that have been largely explored in the literature. We present a complete and fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Realistic hyperparameter selection is performed with nested cross-validation and various unsupervised model selection scores, on both simulated datasets with controlled shifts and real-world datasets across diverse modalities, such as images, text, biomedical, and tabular data. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications, with key insights into the choice and impact of model selection approaches. SKADA-bench is open-source, reproducible, and can be easily extended with novel DA methods, datasets, and model selection criteria without requiring re-evaluating competitors. SKADA-bench is available on Github at this https URL.


[90] 2408.06958

AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm

We present AuToMATo, a novel clustering algorithm based on persistent homology. While AuToMATo is not parameter-free per se, we provide default choices for its parameters that make it into an out-of-the-box clustering algorithm that performs well across the board. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. AuToMATo is motivated by applications in topological data analysis, in particular the Mapper algorithm, where it is desirable to work with a clustering algorithm that does not need tuning of its parameters. Indeed, we provide evidence that AuToMATo performs well when used with Mapper. Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture.


[91] 2410.00709

Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches

Protein-ligand binding is the process by which a small molecule (drug or inhibitor) attaches to a target protein. Binding affinity, which characterizes the strength of biomolecular interactions, is essential for tackling diverse challenges in life sciences, including therapeutic design, protein engineering, enzyme optimization, and elucidating biological mechanisms. Much work has been devoted to predicting binding affinity over the past decades. Here, we review recent significant works, with a focus on methods, evaluation strategies, and benchmark datasets. We note growing use of both traditional machine learning and deep learning models for predicting binding affinity, accompanied by an increasing amount of data on proteins and small drug-like molecules. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction; reciprocally, progress in building binding affinity predictors can refine AIVCs. Future efforts in binding affinity prediction and AI-driven in silico models can enhance the simulation of temporal dynamics, cell-type specificity, and multi-omics integration to support more accurate and personalized outcomes.


[92] 2410.02086

Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations

A unified representation space in multi-modal learning is essential for effectively integrating diverse data sources, such as text, images, and audio, to enhance efficiency and performance across various downstream tasks. Recent binding methods, such as ImageBind, typically rely on a single, fixed anchor modality for aligning multi-modal data. We mathematically analyze these fixed-anchor binding methods and uncover significant limitations: (1) over-reliance on the choice of the anchor modality, (2) inadequate capture of intra-modal information, and (3) failure to account for cross-modal correlation among non-anchored modalities. To address these issues, we argue for adaptive anchor binding methods, exemplified by our framework CentroBind. The proposed method uses adaptively adjustable centroid-based anchors generated from all available modalities, leading to a balanced and rich representation space. We theoretically demonstrate that our approach captures three critical properties of multi-modal learning, namely intra-modal learning, inter-modal learning, and multi-modal alignment, while constructing a unified representation that spans all modalities. Experiments on both synthetic and real-world datasets show that adaptive anchor methods such as CentroBind consistently outperform fixed-anchor binding methods, verifying our analysis.
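
The centroid-anchor idea can be sketched in a few lines: each sample's anchor is the re-normalised mean of its modality embeddings, and every modality is aligned to that anchor with an InfoNCE loss. This is only a minimal illustration under assumed shapes and a plain mean centroid; the actual CentroBind objective and its adaptive weighting may differ.

```python
import torch
import torch.nn.functional as F

def centroid_anchor_infonce(embeddings, temperature=0.1):
    """Centroid-anchored alignment sketch. `embeddings` is a list of
    (batch, dim) tensors, one per modality; the anchor of sample i is the
    re-normalised mean of its modality embeddings, and each modality is
    pulled toward its own centroid with an InfoNCE loss."""
    zs = [F.normalize(z, dim=-1) for z in embeddings]
    anchor = F.normalize(torch.stack(zs, dim=0).mean(dim=0), dim=-1)  # (batch, dim)
    labels = torch.arange(anchor.shape[0], device=anchor.device)
    loss = 0.0
    for z in zs:
        logits = z @ anchor.t() / temperature          # (batch, batch) similarities
        loss = loss + F.cross_entropy(logits, labels)  # match sample i to centroid i
    return loss / len(zs)

# Toy usage with three random "modalities".
torch.manual_seed(0)
mods = [torch.randn(32, 128) for _ in range(3)]
print(centroid_anchor_infonce(mods).item())
```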


[93] 2412.06438

Can foundation models actively gather information in interactive environments to test hypotheses?

Foundation models excel at single-turn reasoning but struggle with multi-turn exploration in dynamic environments, a requirement for many real-world challenges. We evaluated these models on their ability to learn from experience, adapt, and gather information. First, in "Feature World," a simple setting for testing information gathering, models performed near-optimally. However, to test more complex, multi-trial learning, we implemented a text-based version of the "Alchemy" environment, a benchmark for meta-learning. Here, agents must deduce a latent causal structure by integrating information across many trials. In this setting, recent foundation models initially failed to improve their performance over time. Crucially, we found that prompting the models to summarize their observations at regular intervals enabled an emergent meta-learning process. This allowed them to improve across trials and even adaptively re-learn when the environment's rules changed unexpectedly. While most models handled the simple task, Alchemy revealed stark differences in robustness: Gemini 2.5 performed best, followed by Claude 3.7, while ChatGPT-4o and o4-mini struggled. This underscores Alchemy's value as a benchmark. Our findings demonstrate that the biggest challenge for foundation models is not selecting informative actions in the moment, but integrating knowledge through adaptive strategies over time. Encouragingly, there appears to be no intrinsic barrier to future models mastering these abilities.


[94] 2506.10159

Probabilistic Variational Contrastive Learning

Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer to a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z|x)$ as a projected normal distribution, enabling the sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z|x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class labels, and matches or outperforms deterministic baselines in classification accuracy, all the while providing meaningful uncertainty estimates through the posterior model. VCL thus equips contrastive learning with a probabilistic foundation, serving as a new basis for contrastive approaches.
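
A minimal sketch of such a loss, under assumptions: the projected-normal sample is obtained by normalising a reparameterised Gaussian draw, rows i and i+B are treated as the two positive views (SimCLR-style), and the KL term is replaced by the closed-form Gaussian KL to a standard normal as a simple stand-in for the paper's normalized KL to the uniform hyperspherical prior.

```python
import torch
import torch.nn.functional as F

def vcl_style_loss(mu, log_var, temperature=0.5, beta=1e-3):
    """VCL-style objective sketch. The encoder outputs a Gaussian (mu, log_var)
    per input; a projected-normal embedding is obtained by normalising a
    reparameterised Gaussian sample. InfoNCE over two augmented views acts as
    the surrogate reconstruction term; the Gaussian KL below is only a stand-in
    regularizer. mu, log_var: (2B, d), rows i and i+B are two views of sample i."""
    eps = torch.randn_like(mu)
    z = F.normalize(mu + torch.exp(0.5 * log_var) * eps, dim=-1)   # projected-normal sample

    n = z.shape[0]
    b = n // 2
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(-1e9)                                       # mask self-similarity
    targets = torch.cat([torch.arange(b, n), torch.arange(0, b)])  # positive = the other view
    info_nce = F.cross_entropy(sim, targets.to(z.device))

    kl = 0.5 * (torch.exp(log_var) + mu.pow(2) - 1.0 - log_var).sum(dim=-1).mean()
    return info_nce + beta * kl

# Toy usage with random encoder outputs for a batch of 16 samples (32 views).
torch.manual_seed(0)
mu, log_var = torch.randn(32, 64), torch.zeros(32, 64)
print(vcl_style_loss(mu, log_var).item())
```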


[95] 2508.00542

Assessing (im)balance in signed brain networks

Many complex systems, be they financial, natural or social, are composed of units, such as stocks, neurons or agents, whose joint activity can be represented as a multivariate time series. An issue of both practical and theoretical importance concerns the possibility of inferring the presence of a static relationship between any two units solely from their dynamic state. The present contribution tackles this issue within the framework of traditional hypothesis testing: briefly speaking, our suggestion is to link any two units if they behave in a sufficiently similar way. To achieve this goal, we project a multivariate time series onto a signed graph by i) comparing the empirical properties of the former with those expected under a suitable benchmark and ii) linking any two units with a positive (negative) edge if the corresponding series share a significantly large number of concordant (discordant) values. To define our benchmarks, we adopt an information-theoretic approach rooted in the constrained maximisation of Shannon entropy, a procedure inducing an ensemble of multivariate time series that preserves some of the empirical properties on average while randomising everything else. We showcase the possible applications of our method by addressing one of the most timely issues in the domain of neuroscience, i.e. determining whether brain networks are frustrated and, if so, to what extent. As our results suggest, this is indeed the case, with the structure of the negative subgraph being more prone to inter-subject variability than the complementary, positive subgraph. At the mesoscopic level, instead, the minimisation of the Bayesian Information Criterion instantiated with the Signed Stochastic Block Model reveals that brain areas gather into modules aligning with the statistical variant of the Relaxed Balance Theory.
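
The projection step can be illustrated with a deliberately simplified null model: count the time steps on which two demeaned series agree in sign and compare the count with a fair-coin binomial benchmark, drawing a positive (negative) edge when concordance is significantly high (low). The maximum-entropy benchmarks used in the paper are richer than this independent-coin stand-in.

```python
import numpy as np
from scipy.stats import binom

def signed_graph_from_series(X, alpha=0.01):
    """Project a (T x N) multivariate time series onto a signed graph: for each
    pair of units, count the time steps on which their demeaned signs agree,
    compare the count with a Binomial(T, 1/2) null, and draw a +1 (-1) edge if
    concordance is significantly high (low); otherwise leave the pair unlinked."""
    T, N = X.shape
    S = np.sign(X - X.mean(axis=0, keepdims=True))        # +1 / -1 (0 on ties)
    W = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            concordant = int(np.sum(S[:, i] * S[:, j] > 0))
            p_high = binom.sf(concordant - 1, T, 0.5)      # P(count >= observed)
            p_low = binom.cdf(concordant, T, 0.5)          # P(count <= observed)
            if p_high < alpha / 2:
                W[i, j] = W[j, i] = 1                      # significantly concordant
            elif p_low < alpha / 2:
                W[i, j] = W[j, i] = -1                     # significantly discordant
    return W

W = signed_graph_from_series(np.random.default_rng(0).normal(size=(500, 20)))
```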


[96] 2508.13703

Minimizing the Weighted Number of Tardy Jobs: Data-Driven Heuristic for Single-Machine Scheduling

Existing research on single-machine scheduling is largely focused on exact algorithms, which perform well on typical instances but can significantly deteriorate on certain regions of the problem space. In contrast, data-driven approaches provide strong and scalable performance when tailored to the structure of specific datasets. Leveraging this idea, we focus on a single-machine scheduling problem where each job is defined by its weight, duration, due date, and deadline, aiming to minimize the total weight of tardy jobs. We introduce a novel data-driven scheduling heuristic that combines machine learning with problem-specific characteristics, ensuring feasible solutions, which is a common challenge for ML-based algorithms. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art in terms of optimality gap, number of optimal solutions, and adaptability across varied data scenarios, highlighting its flexibility for practical applications. In addition, we conduct a systematic exploration of ML models, addressing a common gap in similar studies by offering a detailed model selection process and providing insights into why the chosen model is the best fit.
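
For reference, the objective being minimised is easy to state in code: given a permutation of the jobs, accumulate completion times, add a job's weight whenever it finishes after its due date, and flag infeasibility when a deadline is missed. The sketch below shows only this evaluation, with hypothetical field names; the paper's learned heuristic for choosing the permutation is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Job:
    weight: float
    duration: float
    due_date: float
    deadline: float

def weighted_tardy_and_feasible(jobs, order):
    """Evaluate a single-machine schedule given as a permutation `order`:
    return the total weight of tardy jobs (completion after the due date)
    and whether every hard deadline is met."""
    t, tardy_weight, feasible = 0.0, 0.0, True
    for k in order:
        job = jobs[k]
        t += job.duration                    # completion time of job k
        if t > job.deadline:
            feasible = False                 # deadline violated: infeasible schedule
        if t > job.due_date:
            tardy_weight += job.weight       # job is tardy, pay its weight
    return tardy_weight, feasible

jobs = [Job(3, 2, 4, 8), Job(1, 3, 3, 9), Job(5, 1, 2, 5)]
print(weighted_tardy_and_feasible(jobs, order=[2, 0, 1]))   # -> (1.0, True)
```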


[97] 2509.11007

Gradient Methods with Online Scaling Part II. Practical Aspects

Part I of this work [Gao25] establishes online scaled gradient methods (OSGM), a framework that utilizes online convex optimization to adapt stepsizes in gradient methods. This paper focuses on the practical aspects of OSGM. We leverage the OSGM framework to design new adaptive first-order methods and provide insights into their empirical behavior. The resulting method, OSGM-Best, matches the performance of quasi-Newton variants while requiring less memory and cheaper iterations. We also extend OSGM to nonconvex optimization and outline directions that connect OSGM to existing branches of optimization theory and practice.
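
The core idea, learning the stepsize online while the optimization runs, can be sketched as follows. This toy uses a per-coordinate scaling adapted multiplicatively from the sign of the product of consecutive gradients, a hypergradient-style feedback; OSGM-Best's actual feedback, update rule, and guarantees differ and are developed in the paper.

```python
import numpy as np

def online_scaled_gradient(grad, x0, iters=300, eta=0.05):
    """Sketch of online stepsize scaling: the iterate is updated with a
    per-coordinate stepsize vector p, and p itself is adapted online from the
    sign of the product of consecutive gradients (grow p when they agree in
    sign, shrink it otherwise)."""
    x = np.asarray(x0, dtype=float).copy()
    p = np.full_like(x, 1e-3)                      # per-coordinate stepsizes
    g_prev = grad(x)
    for _ in range(iters):
        x = x - p * g_prev                         # scaled gradient step
        g = grad(x)
        p = p * np.exp(eta * np.sign(g * g_prev))  # online update of the scaling
        g_prev = g
    return x

# Toy usage: ill-conditioned quadratic f(x) = 0.5 * sum_i D_i x_i^2, grad = D * x.
D = np.array([1.0, 10.0, 100.0])
print(np.round(online_scaled_gradient(lambda x: D * x, x0=np.ones(3)), 4))
```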


[98] 2509.14186

Quickest Change Detection with Cost-Constrained Experiment Design

In the classical quickest change detection problem, an observer performs a single experiment to monitor a stochastic process. The goal in the classical problem is to detect a change in the statistical properties of the process, with the minimum possible delay, subject to a constraint on the rate of false alarms. This paper considers the case where, at each observation time, the decision-maker must choose between multiple experiments with varying information qualities and costs. The change can be detected using any of the experiments. The goal here is to detect the change with the minimum delay, subject to constraints on the rate of false alarms and the fraction of time each experiment is performed before the time of change. The constraint on the fraction of time can be used to control the overall cost of using the system of experiments. An algorithm called the two-experiment cumulative sum (2E-CUSUM) algorithm is first proposed to solve the problem when there are only two experiments. The algorithm for the case of multiple experiments, starting with three experiments, is then designed iteratively using the 2E-CUSUM algorithm. Two key ideas used in the design are the scaling of undershoots and the truncation of tests. The multiple-experiment algorithm can be designed to satisfy the constraints and can achieve the delay performance of the experiment with the highest quality within a constant. The important concept of data efficiency, where the observer has the choice of not performing any experiment, is explored as well.
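
For context, the single-experiment building block is the classical CUSUM recursion, sketched below for known pre- and post-change models; the 2E-CUSUM coupling across experiments, the scaling of undershoots, and the truncation of tests are not reproduced here.

```python
import numpy as np

def cusum_stopping_time(samples, log_lr, threshold):
    """Classical CUSUM for quickest change detection with a single experiment:
    W_n = max(0, W_{n-1} + log-likelihood ratio of the n-th observation), and a
    change is declared at the first time W_n crosses the threshold."""
    w = 0.0
    for n, x in enumerate(samples, start=1):
        w = max(0.0, w + log_lr(x))
        if w >= threshold:
            return n                      # alarm time (first threshold crossing)
    return None                           # no alarm within the observed samples

# Toy usage: Gaussian mean shift from 0 to 1 (unit variance) at time 300.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 300), rng.normal(1, 1, 200)])
log_lr = lambda x: x - 0.5                # log [N(1,1)/N(0,1)] density ratio
print(cusum_stopping_time(xs, log_lr, threshold=8.0))
```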


[99] 2510.00048

Deep Learning Approaches with Explainable AI for Differentiating Alzheimer Disease and Mild Cognitive Impairment

Early and accurate diagnosis of Alzheimer Disease is critical for effective clinical intervention, particularly in distinguishing it from Mild Cognitive Impairment, a prodromal stage marked by subtle structural changes. In this study, we propose a hybrid deep learning ensemble framework for Alzheimer Disease classification using structural magnetic resonance imaging. Gray and white matter slices are used as inputs to three pretrained convolutional neural networks, ResNet50, NASNet, and MobileNet, each fine-tuned end to end. To further enhance performance, we incorporate a stacked ensemble learning strategy with a meta-learner and weighted averaging to optimally combine the base models. Evaluated on the Alzheimer Disease Neuroimaging Initiative dataset, the proposed method achieves state-of-the-art accuracy of 99.21% for Alzheimer Disease vs. Mild Cognitive Impairment and 91.0% for Mild Cognitive Impairment vs. Normal Controls, outperforming conventional transfer learning and baseline ensemble methods. To improve interpretability in image-based diagnostics, we integrate Explainable AI techniques via Gradient-weighted Class Activation Mapping (Grad-CAM), which generates heatmaps and attribution maps that highlight critical regions in gray and white matter slices, revealing structural biomarkers that influence model decisions. These results highlight the framework's potential for robust and scalable clinical decision support in neurodegenerative disease diagnostics.
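
The ensembling step can be illustrated independently of the imaging pipeline. In the sketch below, two generic classifiers stand in for the fine-tuned CNNs: their out-of-fold class probabilities become meta-features for a logistic-regression meta-learner, and a weighted average of the same probabilities gives the simpler combination rule. Data, models, and weights are placeholders that show the mechanics only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Stand-in "base models" on random features; in the paper these are fine-tuned
# CNNs producing class probabilities from gray/white matter MRI slices.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

bases = [RandomForestClassifier(random_state=0),
         GradientBoostingClassifier(random_state=0)]

# Out-of-fold probabilities become meta-features, so the meta-learner never
# sees predictions made on a base model's own training folds.
meta_X = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1] for m in bases
])
meta_learner = LogisticRegression().fit(meta_X, y)

# Simpler alternative: weighted averaging of the base probabilities.
weights = np.array([0.6, 0.4])
averaged = meta_X @ weights
print("stacked acc:", (meta_learner.predict(meta_X) == y).mean(),
      "averaged acc:", ((averaged > 0.5).astype(int) == y).mean())
```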