New articles on Statistics


[1] 2407.17534

Extension of W-method and A-learner for multiple binary outcomes

This study concerns two-arm trials in which subjects are assigned to either the treatment or the control group. In such trials, if the efficacy of the treatment cannot be demonstrated in a population that meets the eligibility criteria, it is desirable to identify the subgroups for which the treatment is effective. Such subgroups can be identified by estimating heterogeneous treatment effects (HTE). In recent years, methods for estimating HTE have increasingly relied on complex models. Although these models improve estimation accuracy, they often sacrifice interpretability. Despite significant advancements in methods for continuous or univariate binary outcomes, methods for multiple binary outcomes are less prevalent, and existing interpretable methods such as the W-method and the A-learner, while capable of estimating HTE for a single binary outcome, fail to capture the correlation structure when applied to multiple binary outcomes. We thus propose two methods for estimating HTE for multiple binary outcomes: one based on the W-method and the other based on the A-learner. We also demonstrate that the conventional A-learner introduces bias in the estimation of the treatment effect. The proposed method employs a framework based on reduced-rank regression to capture the correlation structure among multiple binary outcomes. We correct for the bias inherent in the A-learner estimates and investigate the impact of this bias through numerical simulations. Finally, we demonstrate the effectiveness of the proposed method in a real data application.


[2] 2407.17592

Robust Maximum $L_q$-Likelihood Covariance Estimation for Replicated Spatial Data

Parameter estimation with the maximum $L_q$-likelihood estimator (ML$q$E) is an alternative to the maximum likelihood estimator (MLE) that considers the $q$-th power of the likelihood values for some $q<1$. In this method, extreme values are down-weighted because of their lower likelihood values, which yields robust estimates. In this work, we study the properties of the ML$q$E for spatial data with replicates. We investigate the asymptotic properties of the ML$q$E for Gaussian random fields with a Mat\'ern covariance function, and carry out simulation studies to investigate the numerical performance of the ML$q$E. We show that it can provide more robust and stable estimation results when some of the replicates in the spatial data contain outliers. In addition, we develop a mechanism to find the optimal choice of the hyper-parameter $q$ for the ML$q$E. The robustness of our approach is further verified on a United States precipitation dataset. Compared with other robust methods for spatial data, our proposal is more intuitive and easier to understand, yet it performs well when dealing with datasets containing outliers.
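For reference, a standard formulation of the ML$q$E objective (following Ferrari and Yang, 2010; the paper's exact notation may differ) replaces the log-likelihood of each observation with its $L_q$-distorted version,
\[
\hat{\theta}_{\mathrm{ML}q\mathrm{E}} = \arg\max_{\theta} \sum_{i=1}^{n} L_q\{f(x_i;\theta)\}, \qquad L_q(u) = \frac{u^{1-q}-1}{1-q} \quad (q \neq 1),
\]
where $f(x_i;\theta)$ is the likelihood contribution of observation $i$. Since $L_q(u) \to \log u$ as $q \to 1$, the MLE is recovered in the limit, while for $q<1$ the estimating equations weight each observation by $f(x_i;\theta)^{1-q}$, so low-likelihood (outlying) replicates are down-weighted.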


[3] 2407.17658

Semiparametric Piecewise Accelerated Failure Time Model for the Analysis of Immune-Oncology Clinical Trials

The effectiveness of immune-oncology chemotherapies has been demonstrated in recent clinical trials. The Kaplan-Meier estimates of the survival functions of the immune therapy and the control often suggested the presence of a lag-time until the immune therapy began to act. This implies that the hazard ratio under the proportional hazards assumption would not be an appealing summary, and many alternatives have been investigated, such as the restricted mean survival time. In addition to such an overall summary of the treatment contrast, the lag-time is itself an important feature of the treatment effect. Identical survival functions up to the lag-time imply that patients who are likely to die before the lag-time would not benefit from the treatment, so identifying such patients is very important. We propose the semiparametric piecewise accelerated failure time model and its inference procedure based on the semiparametric maximum likelihood method. It provides not only an overall treatment summary, but also a framework to identify, in a unified way, patients who benefit less from the immune therapy. Numerical experiments confirm that each parameter can be estimated with minimal bias. Through a real data analysis, we illustrate the evaluation of the effect of immune-oncology therapy and the characterization, in terms of covariates, of patients who are unlikely to benefit from the treatment.


[4] 2407.17666

Causal estimands and identification of time-varying effects in non-stationary time series from N-of-1 mobile device data

Mobile technology (mobile phones and wearable devices) generates continuous data streams encompassing outcomes, exposures and covariates, presented as intensive longitudinal or multivariate time series data. The high frequency of measurements enables granular and dynamic evaluation of treatment effects, revealing their persistence and accumulation over time. Existing methods predominantly focus on contemporaneous, temporal-average, or population-average effects, assuming stationarity or invariance of treatment effects over time, which are inadequate both conceptually and statistically to capture dynamic treatment effects in personalized mobile health data. We here propose new causal estimands for multivariate time series in N-of-1 studies. These estimands summarize how time-varying exposures impact outcomes in both the short and long term. We propose identifiability assumptions and a g-formula estimator that accounts for exposure-outcome and outcome-covariate feedback. The g-formula employs a state space model framework innovatively to accommodate the time-varying behavior of treatment effects in non-stationary time series. We apply the proposed method to a multi-year smartphone observational study of bipolar patients and estimate the dynamic effect of phone-based communication on the mood of patients with bipolar disorder in an N-of-1 setting. Our approach reveals substantial heterogeneity in treatment effects over time and across individuals. A simulation-based strategy is also proposed for the development of short-term, dynamic, and personalized treatment recommendations based on a patient's past information, in combination with a novel positivity diagnostics plot for validating proper causal inference in time series data.


[5] 2407.17682

Constructing Markov chains with given dependence and marginal stationary distributions

A method of constructing Markov chains on finite state spaces is provided. The chain is specified by three constraints: stationarity, dependence and marginal distributions. The generalized Pythagorean theorem in information geometry plays a central role in the construction. An algorithm for obtaining the desired Markov chain is described. Integer-valued autoregressive processes are considered for illustration.


[6] 2407.17694

Doubly Robust Conditional Independence Testing with Generative Neural Networks

This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$, a problem that plays an important role in statistical and machine learning applications. We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions but instead requires sampling from the two marginal conditional distributions of $X$ given $Z$ and of $Y$ given $Z$. We further propose using a generative neural network (GNN) framework to sample from these approximated marginal conditional distributions, which tends to mitigate the curse of dimensionality due to its adaptivity to any low-dimensional structures and smoothness underlying the data. Theoretically, our test statistic is shown to enjoy a doubly robust property against GNN approximation errors, meaning that the test statistic retains all desirable properties of the oracle test statistic utilizing the true marginal conditional distributions, as long as the product of the two approximation errors decays to zero faster than the parametric rate. Asymptotic properties of our statistic and the consistency of a bootstrap procedure are derived under both the null hypothesis and local alternatives. Extensive numerical experiments and real data analysis illustrate the effectiveness and broad applicability of our proposed test.


[7] 2407.17718

Comparison of global sensitivity analysis methods for a fire spread model with a segmented characteristic

Global sensitivity analysis (GSA) can provide rich information for controlling output uncertainty. In practical applications, segmented models are commonly used to describe an abrupt model change. For segmented models, the complicated uncertainty propagation during the transition region may lead to different importance rankings from different GSA methods. If an unsuitable GSA method is applied, misleading results will be obtained, resulting in suboptimal or even wrong decisions. In this paper, four GSA indices, i.e., the Sobol index, mutual information, the delta index and the PAWN index, are applied to a segmented fire spread model (Dry Eucalypt). The results show that the four GSA indices give different importance rankings during the transition region since segmented characteristics affect different GSA indices in different ways. We suggest that analysts should rely on the results of different GSA indices according to their practical purpose, especially when making decisions for segmented models during the transition region.


[8] 2407.17719

A new moment-independent uncertainty importance measure based on cumulative residual entropy for developing uncertainty reduction strategies

Uncertainty reduction is vital for improving system reliability and reducing risks. To identify the best target for uncertainty reduction, uncertainty importance measures are commonly used to prioritize the significance of input variable uncertainties. Designers will then take steps to reduce the uncertainties of variables with high importance. However, for variables with minimal uncertainty, the cost of controlling their uncertainties can be unacceptable. Therefore, uncertainty magnitude should also be considered in developing uncertainty reduction strategies. Although variance-based methods have been developed for this purpose, they depend on statistical moments and have limitations when dealing with the highly skewed distributions that are commonly encountered in practical applications. Motivated by this problem, we propose a new uncertainty importance measure based on cumulative residual entropy. The proposed measure is moment-independent, being based on the cumulative distribution function, and can therefore handle highly skewed distributions properly. Numerical implementations for estimating the proposed measure are devised and verified. A real-world engineering case involving highly skewed distributions is introduced to show the procedure of developing uncertainty reduction strategies that account for uncertainty magnitude and the corresponding cost. The results demonstrate that the proposed measure can give a different uncertainty reduction recommendation than the variance-based approach because of its moment-independent characteristic.
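For context, the proposed measure builds on the cumulative residual entropy of a non-negative random variable $X$ with survival function $\bar{F}(x)=P(X>x)$, commonly defined as
\[
\mathcal{E}(X) = -\int_{0}^{\infty} \bar{F}(x)\,\log \bar{F}(x)\,\mathrm{d}x,
\]
a quantity that depends on $X$ only through its distribution function rather than its moments; the precise importance measure constructed from it is given in the paper.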


[9] 2407.17720

Multi-physics Simulation Guided Generative Diffusion Models with Applications in Fluid and Heat Dynamics

In this paper, we present a generic physics-informed generative model called MPDM that integrates multi-fidelity physics simulations with denoising diffusion models (DDMs). MPDM categorizes multi-fidelity physics simulations into inexpensive and expensive simulations, depending on computational costs. The inexpensive simulations, which can be obtained with low latency, directly inject contextual information into the DDMs. Furthermore, when results from expensive simulations are available, MPDM refines the quality of generated samples via a guided diffusion process. This design separates the training of a denoising diffusion model from physics-informed conditional probability models, thus lending flexibility to practitioners. MPDM builds on Bayesian probabilistic models and is equipped with a theoretical guarantee that provides upper bounds on the Wasserstein distance between the sample distribution and the underlying true distribution. The probabilistic nature of MPDM also provides a convenient approach for uncertainty quantification in prediction. Our models excel in cases where physics simulations are imperfect and sometimes inaccessible. We use a numerical simulation in fluid dynamics and a case study in heat dynamics within laser-based metal powder deposition additive manufacturing to demonstrate how MPDM seamlessly integrates multi-fidelity physics simulations and observations to obtain surrogates with superior predictive performance.


[10] 2407.17804

Bayesian Spatiotemporal Wombling

Stochastic process models for spatiotemporal data underlying random fields find substantial utility in a range of scientific disciplines. Subsequent to predictive inference on the values of the random field (or spatial surface indexed continuously over time) at arbitrary space-time coordinates, scientific interest often turns to gleaning information regarding zones of rapid spatial-temporal change. We develop Bayesian modeling and inference for directional rates of change along a given surface. These surfaces, which demarcate regions of rapid change, are referred to as ``wombling'' surface boundaries. Existing methods for studying such changes have often been associated with curves and are not easily extendable to surfaces resulting from curves evolving over time. Our current contribution devises a fully model-based inferential framework for analyzing differential behavior in spatiotemporal responses by formalizing the notion of a ``wombling'' surface boundary using conventional multi-linear vector analytic frameworks and geometry followed by posterior predictive computations using triangulated surface approximations. We illustrate our methodology with comprehensive simulation experiments followed by multiple applications in environmental and climate science; pollutant analysis in environmental health; and brain imaging.


[11] 2407.17832

Regularized Adjusted Plus-Minus Models for Evaluating and Scouting Football (Soccer) Players using Possession Sequences

This paper presents a novel framework for evaluating players in association football (soccer). Our method uses possession sequences, i.e. sequences of consecutive on-ball actions, for deriving estimates for player strengths. On the surface, the methodology is similar to classical adjusted plus-minus rating models using mainly regularized regression techniques. However, by analyzing possessions, our framework is able to distinguish on-ball and off-ball contributions of players to the game. From a methodological viewpoint, the framework explores four different penalization schemes, which exploit football-specific structures such as the grouping of players into position groups as well as into common strength groups. These four models lead to four ways to rate players by considering the respective estimate of each model corresponding to the player. The ratings are used to analyze the 2017/18 season of the Spanish La Liga. We compare similarities as well as particular use cases of each of the penalized models and provide guidance for practitioners when using the individual model specifications. Finally, we conclude our analysis by providing a domain-specific statistical evaluation framework, which highlights the potential of the penalized regression approaches for evaluating players.


[12] 2407.17848

Bayesian Benchmarking Small Area Estimation via Entropic Tilting

Benchmarking estimation and its risk evaluation are practically important issues in small area estimation. While hierarchical Bayesian methods have been widely adopted in small area estimation, a unified Bayesian approach to benchmarking estimation has not been fully discussed. This work employs an entropic tilting method to modify the posterior distribution of the small area parameters to meet the benchmarking constraint, which enables us to obtain benchmarked point estimation as well as reasonable uncertainty quantification. Using conditionally independent structures of the posterior, we first introduce general Monte Carlo methods for obtaining a benchmarked posterior and then show that the benchmarked posterior can be obtained in an analytical form for some representative small area models. We demonstrate the usefulness of the proposed method through simulation and empirical studies.
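As a generic sketch of the entropic tilting step assumed here (the benchmarking constraints and small area models treated in the paper are more specific), the tilted posterior $\tilde{\pi}$ is the distribution closest to the original posterior $\pi$ in Kullback-Leibler divergence among those satisfying the constraint,
\[
\tilde{\pi} = \arg\min_{\pi'} \mathrm{KL}(\pi' \,\|\, \pi) \quad \text{subject to} \quad \mathrm{E}_{\pi'}\{g(\theta)\} = b,
\]
whose solution is an exponentially tilted posterior $\tilde{\pi}(\theta) \propto \pi(\theta)\exp\{\lambda^{\top} g(\theta)\}$, with $\lambda$ chosen so that the constraint holds.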


[13] 2407.17851

Bad local minima exist in the stochastic block model

We study the disassortative stochastic block model with three communities, a well-studied model of graph partitioning and Bayesian inference for which detailed predictions based on the cavity method exist [Decelle et al. (2011)]. We provide strong evidence that for a part of the phase where efficient algorithms exist that approximately reconstruct the communities, inference based on maximum a posteriori (MAP) fails. In other words, we show that there exist modes of the posterior distribution that have a vanishing agreement with the ground truth. The proof is based on the analysis of a graph colouring algorithm from [Achlioptas and Moore (2003)].


[14] 2407.17910

Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These traditional assumptions frequently prove inadequate in real-world settings, thereby restricting the capability of current OPE methods to effectively address complex interference effects. In response, we advocate for the implementation of the permutation invariance (PI) assumption. This innovative approach enables the data-driven, adaptive learning of the mean-field function, offering a more flexible estimation method beyond conventional averaging. Furthermore, we present novel algorithms that incorporate the PI assumption into OPE and thoroughly examine their theoretical foundations. Our numerical analyses demonstrate that this novel approach yields significantly more precise estimations than existing baseline algorithms, thereby substantially improving the practical applicability and effectiveness of OPE methodologies. A Python implementation of our proposed method is available at https://github.com/BIG-S2/Causal-Deepsets.
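For intuition only, the permutation invariance assumption points to DeepSets-style aggregation, in which a unit's neighbourhood enters through a symmetric pooling of per-neighbour embeddings rather than a plain average; the sketch below illustrates the generic form with arbitrary fixed maps, not the learned networks or the estimator of the paper.

```python
import numpy as np

# Generic permutation-invariant (DeepSets-style) aggregation:
#   f({a_1, ..., a_m}) = rho( mean_i phi(a_i) )
# phi and rho below are hypothetical fixed maps chosen only for illustration.

def phi(a):
    # per-neighbour embedding
    return np.array([a, a ** 2])

def rho(z):
    # readout applied to the pooled embedding
    return float(z[0] + 0.5 * z[1])

def mean_field(neighbour_treatments):
    pooled = np.mean([phi(a) for a in neighbour_treatments], axis=0)
    return rho(pooled)

treatments = [1.0, 0.0, 1.0, 1.0]
print(mean_field(treatments))                  # some value
print(mean_field(list(reversed(treatments))))  # identical: order does not matter
```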


[15] 2407.17920

Tobit Exponential Smoothing, towards an enhanced demand planning in the presence of censored data

ExponenTial Smoothing (ETS) is a widely adopted forecasting technique in both research and practical applications. One critical development in ETS was the establishment of a robust statistical foundation based on state space models with a single source of error. However, an important challenge in ETS that remains unsolved is censored data estimation. This issue is critical in supply chain management, in particular when companies have to deal with stockouts. This work solves that problem by proposing the Tobit ETS, which extends the use of ETS models to handle censored data efficiently. This advancement builds upon the linear models taxonomy and extends it to encompass censored data scenarios. The results show that the Tobit ETS considerably reduces the forecast bias. Real and simulated data from the airline and supply chain industries are used to corroborate the findings.
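To make the censoring idea concrete, here is a minimal sketch of a Tobit-style Gaussian log-likelihood for right-censored demand observations (e.g., sales cut off at the available stock); it only illustrates the censored-likelihood principle, not the authors' Tobit ETS state space recursions.

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(y, censored, mu, sigma):
    """Gaussian Tobit log-likelihood sketch.

    y        : observed values (for censored entries, y is the censoring
               point, e.g. the stock level at which sales were cut off)
    censored : boolean array, True where the observation is right-censored
    mu, sigma: mean forecasts and standard deviation of the latent demand
    """
    y, censored, mu = map(np.asarray, (y, censored, mu))
    ll = np.where(
        censored,
        norm.logsf(y, loc=mu, scale=sigma),   # P(latent demand > stock level)
        norm.logpdf(y, loc=mu, scale=sigma),  # density for fully observed sales
    )
    return ll.sum()

# toy usage: the last observation hit a hypothetical stock limit of 10 units
y        = np.array([7.0, 9.0, 10.0])
censored = np.array([False, False, True])
print(tobit_loglik(y, censored, mu=np.array([8.0, 8.5, 9.0]), sigma=2.0))
```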


[16] 2407.17949

Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

By utilizing recently developed tools for constructing gradient flows on Wasserstein spaces, we extend an analysis technique commonly employed to understand alternating minimization algorithms on Euclidean space to the Expectation Maximization (EM) algorithm via its representation as coordinate-wise minimization on the product of a Euclidean space and a space of probability distributions due to Neal and Hinton (1998). In so doing we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of a log-Sobolev inequality. We further demonstrate that the analysis technique is sufficiently flexible to allow also the analysis of several variants of the EM algorithm.
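In the Neal and Hinton (1998) representation referred to above, EM is coordinate-wise minimization of a free-energy functional over the pair of a parameter and a distribution over the latent variables; schematically (our notation),
\[
F(\theta, q) = -\,\mathrm{E}_{q}\!\left[\log p(x, z \mid \theta)\right] - H(q),
\qquad
\text{E-step: } q \leftarrow p(z \mid x, \theta),
\qquad
\text{M-step: } \theta \leftarrow \arg\min_{\theta}\, -\,\mathrm{E}_{q}\!\left[\log p(x, z \mid \theta)\right],
\]
so each EM iteration decreases $F$, and $\min_{q} F(\theta, q) = -\log p(x \mid \theta)$ recovers the negative marginal log-likelihood being minimized.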


[17] 2407.17986

Preventive Replacement Policies of Parallel/Series Systems with Dependent Components under Deviation Costs

This manuscript studies the preventive replacement policy for a series or parallel system consisting of $n$ independent or dependent heterogeneous components. Firstly, for the age replacement policy, some sufficient conditions for the existence and uniqueness of the optimal replacement time for both the series and parallel systems are provided. By introducing deviation costs, the expected cost rate of the system is optimized, and the optimal replacement time of the system is extended. Secondly, the periodic replacement policy for series and parallel systems is considered in the dependent case, and a sufficient condition for the existence and uniqueness of the optimal number of periods is provided. Some numerical examples are given to illustrate and discuss the above preventive replacement policies.
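For orientation, the classical single-unit age replacement criterion underlying such policies (the system-level versions with dependent components and deviation costs are developed in the paper) chooses $T$ to minimize the long-run expected cost rate
\[
C(T) = \frac{c_f F(T) + c_p \bar{F}(T)}{\int_0^T \bar{F}(t)\,\mathrm{d}t},
\]
where $F$ is the lifetime distribution, $\bar{F} = 1 - F$, $c_f$ is the cost of a failure replacement and $c_p < c_f$ the cost of a preventive replacement at age $T$; existence and uniqueness conditions for the minimizer typically involve monotonicity of the failure rate.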


[18] 2407.18077

An Alternating Direction Method of Multipliers Algorithm for the Weighted Fused LASSO Signal Approximator

We present an Alternating Direction Method of Multipliers (ADMM) algorithm designed to solve the Weighted Generalized Fused LASSO Signal Approximator (wFLSA). First, we show that wFLSAs can always be reformulated as a Generalized LASSO problem. With the availability of algorithms tailored to the Generalized LASSO, the issue appears to be, in principle, resolved. However, the computational complexity of these algorithms is high, with a time complexity of $O(p^4)$ for a single iteration, where $p$ represents the number of coefficients. To overcome this limitation, we propose an ADMM algorithm specifically tailored for wFLSA-equivalent problems, significantly reducing the complexity to $O(p^2)$. Our algorithm is publicly accessible through the R package wflsa.
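As a rough illustration of the generalized LASSO reformulation mentioned above (not the authors' $O(p^2)$ algorithm and not the interface of the wflsa package), a textbook ADMM for $\min_{\beta} \tfrac{1}{2}\|y-\beta\|_2^2 + \lambda \|D\beta\|_1$ introduces an auxiliary variable $z = D\beta$ and alternates the updates below; weighted sparsity and fusion terms can be absorbed into the rows of $D$.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_generalized_lasso(y, D, lam, rho=1.0, n_iter=200):
    """Textbook ADMM sketch for 0.5*||y - beta||^2 + lam*||D beta||_1.

    For a fused-LASSO-type problem, the rows of D encode (weighted)
    differences beta_i - beta_j. The beta-update solves a p x p linear
    system, which is the kind of generic cost that specialised algorithms
    such as the one in this paper are designed to reduce.
    """
    p = len(y)
    beta = np.zeros(p)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])           # scaled dual variable
    A = np.eye(p) + rho * D.T @ D      # constant across iterations
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ beta + u, lam / rho)
        u = u + D @ beta - z
    return beta

# toy usage: noisy piecewise-constant signal, first-difference penalty
rng = np.random.default_rng(0)
y = np.concatenate([np.ones(20), 3 * np.ones(20)]) + 0.3 * rng.standard_normal(40)
D = np.diff(np.eye(40), axis=0)        # (p-1) x p first-difference matrix
print(np.round(admm_generalized_lasso(y, D, lam=5.0), 2))
```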


[19] 2407.18158

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds become vacuous at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality text. Additionally, the tightness of these existing bounds depends on the number of IID documents in a training set rather than the much larger number of non-IID constituent tokens, leaving untapped potential for tighter bounds. In this work, we instead use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Since a dataset contains far more tokens than documents, our generalization bounds not only tolerate but actually benefit from far less restrictive compression schemes. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs as large as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.


[20] 2407.18163

Statistical optimal transport

We present an introduction to the field of statistical optimal transport, based on lectures given at \'Ecole d'\'Et\'e de Probabilit\'es de Saint-Flour XLIX.


[21] 2407.18166

Identification and multiply robust estimation of causal effects via instrumental variables from an auxiliary heterogeneous population

Evaluating causal effects in a primary population of interest with unmeasured confounders is challenging. Although instrumental variables (IVs) are widely used to address unmeasured confounding, they may not always be available in the primary population. Fortunately, IVs might have been used in previous observational studies on similar causal problems, and these auxiliary studies can be useful for inferring causal effects in the primary population, even if they represent different populations. However, existing methods often assume homogeneity or equality of conditional average treatment effects between the primary and auxiliary populations, which may be limited in practice. This paper aims to remove the homogeneity requirement and establish a novel identifiability result allowing for different conditional average treatment effects across populations. We also construct a multiply robust estimator that remains consistent despite partial misspecifications of the observed data model and achieves local efficiency if all nuisance models are correct. The proposed approach is illustrated through simulation studies. We finally apply our approach by leveraging data from lower income individuals, with cigarette price as a valid IV, to evaluate the causal effect of smoking on physical functional status in the higher income group, where strong IVs are not available.


[22] 2407.14335

Quantifying the Blockchain Trilemma: A Comparative Analysis of Algorand, Ethereum 2.0, and Beyond

Blockchain technology is essential for the digital economy and metaverse, supporting applications from decentralized finance to virtual assets. However, its potential is constrained by the "Blockchain Trilemma," which necessitates balancing decentralization, security, and scalability. This study evaluates and compares two leading proof-of-stake (PoS) systems, Algorand and Ethereum 2.0, against these critical metrics. Our research interprets existing indices to measure decentralization, evaluates scalability through transactional data, and assesses security by identifying potential vulnerabilities. Utilizing real-world data, we analyze each platform's strategies in a structured manner to understand their effectiveness in addressing trilemma challenges. The findings highlight each platform's strengths and propose general methodologies for evaluating key blockchain characteristics applicable to other systems. This research advances the understanding of blockchain technologies and their implications for the future digital economy. Data and code are available on GitHub as open source.


[23] 2407.16020

Sparks of Quantum Advantage and Rapid Retraining in Machine Learning

The advent of quantum computing holds the potential to revolutionize various fields by solving complex problems more efficiently than classical computers. Despite this promise, practical quantum advantage is hindered by current hardware limitations, notably the small number of qubits and high noise levels. In this study, we leverage adiabatic quantum computers to optimize Kolmogorov-Arnold Networks, a powerful neural network architecture for representing complex functions with minimal parameters. By modifying the network to use Bezier curves as the basis functions and formulating the optimization problem as a Quadratic Unconstrained Binary Optimization problem, we create a fixed-size solution space, independent of the number of training samples. Our approach demonstrates sparks of quantum advantage through faster training times compared to classical optimizers such as Adam, Stochastic Gradient Descent, Adaptive Gradient, and simulated annealing. Additionally, we introduce a novel rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples, thus enhancing learning efficiency in dynamic environments. Experimental results on initial training of classification and regression tasks validate the efficacy of our approach, showcasing significant speedups and comparable performance to classical methods. Experiments on retraining demonstrate a sixty-fold speed-up using adiabatic quantum computing based optimization compared to gradient-descent-based optimizers, and theoretical models suggest that this speed-up could be even larger. Our findings suggest that with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications across various domains, with an initial focus on rapid retraining.


[24] 2407.17518

Driving pattern interpretation based on action phases clustering

Current approaches to identifying driving heterogeneity face challenges in comprehending fundamental patterns from the perspective of underlying driving behavior mechanisms. The concept of Action phases was proposed in our previous work, capturing the diversity of driving characteristics with physical meanings. This study presents a novel framework to further interpret driving patterns by classifying Action phases in an unsupervised manner. In this framework, a Resampling and Downsampling Method (RDM) is first applied to standardize the length of Action phases. Then the clustering calibration procedure including ``Feature Selection'', ``Clustering Analysis'', ``Difference/Similarity Evaluation'', and ``Action phases Re-extraction'' is iteratively applied until all differences among clusters and similarities within clusters reach the pre-determined criteria. Application of the framework using real-world datasets revealed six driving patterns in the I80 dataset, labeled as ``Catch up'', ``Keep away'', and ``Maintain distance'', with both ``Stable'' and ``Unstable'' states. Notably, Unstable patterns are more numerous than Stable ones. ``Maintain distance'' is the most common among Stable patterns. These observations align with the dynamic nature of driving. Two patterns, ``Stable keep away'' and ``Unstable catch up'', are missing in the US101 dataset, which is in line with our expectations as this dataset was previously shown to have less heterogeneity. This demonstrates the potential of driving patterns in describing driving heterogeneity. The proposed framework promises advantages in addressing label scarcity in supervised learning and enhancing tasks such as driving behavior modeling and driving trajectory prediction.


[25] 2407.17522

Mapping the Technological Future: A Topic, Sentiment, and Emotion Analysis in Social Media Discourse

People worldwide are currently confronted with a number of technological challenges, which act as a potent source of uncertainty. The uncertainty arising from the volatility and unpredictability of technology (such as AI) and its potential consequences is widely discussed on social media. This study uses BERTopic modelling along with sentiment and emotion analysis on 1.5 million tweets from 2021 to 2023 to identify anticipated tech-driven futures and capture the emotions communicated by 400 key opinion leaders (KOLs). Findings indicate positive sentiment significantly outweighs negative, with a prevailing dominance of positive anticipatory emotions. Specifically, the 'Hope' score is approximately 10.33\% higher than the median 'Anxiety' score. KOLs emphasize 'Optimism' and benefits over 'Pessimism' and challenges. The study emphasizes the important role KOLs play in shaping future visions through anticipatory discourse and emotional tone during times of technological uncertainty.


[26] 2407.17565

Periodicity significance testing with null-signal templates: reassessment of PTF's SMBH binary candidates

Periodograms are widely employed for identifying periodicity in time series data, yet they often struggle to accurately quantify the statistical significance of detected periodic signals when the data complexity precludes reliable simulations. We develop a data-driven approach to address this challenge by introducing a null-signal template (NST). The NST is created by carefully randomizing the period of each cycle in the periodogram template, rendering it non-periodic. It has the same frequentist properties as a periodic signal template regardless of the noise probability distribution, and we show with simulations that the distribution of false positives is the same as with the original periodic template, regardless of the underlying data. Thus, performing a periodicity search with the NST acts as an effective simulation of the null (no-signal) hypothesis, without having to simulate the noise properties of the data. We apply the NST method to the supermassive black hole binary (SMBHB) search in the Palomar Transient Factory (PTF), where Charisi et al. had previously proposed 33 high signal-to-(white)-noise candidates, utilizing simulations to quantify their significance. Our approach reveals that these simulations do not capture the complexity of the real data. There are no statistically significant periodic signal detections above the non-periodic background. To improve the search sensitivity, we introduce a Gaussian quadrature based algorithm for the Bayes Factor with correlated noise as a test statistic, in contrast to the standard signal-to-white-noise statistic. We show with simulations that this improves sensitivity to true signals by more than an order of magnitude. However, using the Bayes Factor approach also results in no statistically significant detections in the PTF data.


[27] 2407.17654

Generative Learning for Simulation of US Army Vehicle Faults

We develop a novel generative model to simulate vehicle health and forecast faults, conditioned on practical operational considerations. The model, trained on data from the US Army's Predictive Logistics program, aims to support predictive maintenance. It forecasts faults far enough in advance to execute a maintenance intervention before a breakdown occurs. The model incorporates real-world factors that affect vehicle health. It also allows us to understand the vehicle's condition by analyzing operating data, and characterizing each vehicle into discrete states. Importantly, the model predicts the time to first fault with high accuracy. We compare its performance to other models and demonstrate its successful training.


[28] 2407.17686

Transformers on Markov Data: Constant Depth Suffices

Attention-based transformers have been remarkably successful at modeling generative processes across various domains and modalities. In this paper, we study the behavior of transformers on data drawn from $k$-th order Markov processes, where the conditional distribution of the next symbol in a sequence depends on the previous $k$ symbols observed. We observe a surprising phenomenon empirically which contradicts previous findings: when trained for sufficiently long, a transformer with a fixed depth and $1$ head per layer is able to achieve low test loss on sequences drawn from $k$-th order Markov sources, even as $k$ grows. Furthermore, this low test loss is achieved by the transformer's ability to represent and learn the in-context conditional empirical distribution. On the theoretical side, our main result is that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for $k$-th order Markov sources, concurring with our empirical observations. Along the way, we prove that \textit{attention-only} transformers with $O(\log_2(k))$ layers can represent the in-context conditional empirical distribution by composing induction heads to track the previous $k$ symbols in the sequence. These results provide more insight into our current understanding of the mechanisms by which transformers learn to capture context, by understanding their behavior on Markov sources.
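As a concrete reference point, the in-context conditional empirical distribution that the transformer is claimed to represent is simply the empirical distribution of next symbols following earlier occurrences of the current length-$k$ context within the sequence seen so far; a minimal sketch in our own notation is below.

```python
from collections import Counter, defaultdict

def in_context_conditional_empirical(seq, k):
    """Empirical distribution of the next symbol given the last k symbols.

    seq : list of symbols observed so far (the in-context sequence)
    k   : Markov order
    Returns a dict mapping next-symbol -> empirical probability, estimated
    from all earlier occurrences of the current length-k context in seq.
    """
    counts = defaultdict(Counter)
    for t in range(k, len(seq)):
        context = tuple(seq[t - k:t])
        counts[context][seq[t]] += 1
    current = tuple(seq[-k:])
    c = counts[current]
    total = sum(c.values())
    if total == 0:
        return {}  # current context has not been seen before
    return {s: n / total for s, n in c.items()}

# toy usage with a binary sequence and k = 2
seq = [0, 1, 1, 0, 1, 1, 0, 1]
print(in_context_conditional_empirical(seq, k=2))  # distribution after context (0, 1)
```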


[29] 2407.17697

Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks

This study introduces novel superior scoring rules called Penalized Brier Score (PBS) and Penalized Logarithmic Loss (PLL) to improve model evaluation for probabilistic classification. Traditional scoring rules like Brier Score and Logarithmic Loss sometimes assign better scores to misclassifications in comparison with correct classifications. This discrepancy from the actual preference for rewarding correct classifications can lead to suboptimal model selection. By integrating penalties for misclassifications, PBS and PLL modify traditional proper scoring rules to consistently assign better scores to correct predictions. Formal proofs demonstrate that PBS and PLL satisfy strictly proper scoring rule properties while also preferentially rewarding accurate classifications. Experiments showcase the benefits of using PBS and PLL for model selection, model checkpointing, and early stopping. PBS exhibits a higher negative correlation with the F1 score compared to the Brier Score during training. Thus, PBS more effectively identifies optimal checkpoints and early stopping points, leading to improved F1 scores. Comparative analysis verifies models selected by PBS and PLL achieve superior F1 scores. Therefore, PBS and PLL address the gap between uncertainty quantification and accuracy maximization by encapsulating both proper scoring principles and explicit preference for true classifications. The proposed metrics can enhance model evaluation and selection for reliable probabilistic classification.
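To see the discrepancy described above numerically, here is a small self-contained example (ours, not taken from the paper) in which the standard multi-class Brier score assigns a better (lower) value to a misclassified prediction than to a correctly classified one.

```python
import numpy as np

def brier_score(p, y_onehot):
    # standard multi-class Brier score: sum of squared differences
    return float(np.sum((np.asarray(p) - np.asarray(y_onehot)) ** 2))

y = [1, 0, 0]                    # true class is class 0

p_correct = [0.40, 0.30, 0.30]   # argmax = class 0 -> correct classification
p_wrong   = [0.49, 0.51, 0.00]   # argmax = class 1 -> misclassification

print(brier_score(p_correct, y))  # ~0.54
print(brier_score(p_wrong, y))    # ~0.5202, lower ("better") despite being wrong
```

PBS and PLL are designed to remove exactly this kind of reversal by adding penalties for misclassified predictions.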


[30] 2407.17781

Integrating Ensemble Kalman Filter with AI-based Weather Prediction Model ClimaX

Artificial intelligence (AI)-based weather prediction research is growing rapidly and has been shown to be competitive with advanced dynamical numerical weather prediction models. However, research combining AI-based weather prediction models with data assimilation remains limited, partly because long-term sequential data assimilation cycles are required to evaluate data assimilation systems. This study explores integrating the local ensemble transform Kalman filter (LETKF) with the AI-based weather prediction model ClimaX. Our experiments demonstrated that the ensemble data assimilation cycled stably for the AI-based weather prediction model when covariance inflation and localization techniques were used inside the LETKF. While ClimaX showed some limitations in capturing flow-dependent error covariance compared to dynamical models, the AI-based ensemble forecasts provided reasonable and beneficial error covariance in sparsely observed regions. These findings highlight the potential of AI models in weather forecasting and the importance of physical consistency and accurate error growth representation in improving ensemble data assimilation.
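For reference, the ensemble Kalman analysis step underlying the LETKF combines the forecast with observations through the standard Kalman update (written here in its generic form; the LETKF implements it locally in ensemble space, with the inflation and localization mentioned above),
\[
\bar{x}^{a} = \bar{x}^{f} + K\bigl(y - H\bar{x}^{f}\bigr),
\qquad
K = P^{f} H^{\top}\bigl(H P^{f} H^{\top} + R\bigr)^{-1},
\]
where $\bar{x}^{f}$ and $P^{f}$ are the ensemble forecast mean and covariance, $H$ is the observation operator, $R$ the observation error covariance, and $y$ the observations; covariance inflation scales $P^{f}$ up and localization tapers its long-range entries to keep the cycle stable.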


[31] 2407.17888

Enhanced power enhancements for testing many moment equalities: Beyond the $2$- and $\infty$-norm

Tests based on the $2$- and $\infty$-norm have received considerable attention in high-dimensional testing problems, as they are powerful against dense and sparse alternatives, respectively. The power enhancement principle of Fan et al. (2015) combines these two norms to construct tests that are powerful against both types of alternatives. Nevertheless, the $2$- and $\infty$-norm are just two out of the whole spectrum of $p$-norms that one can base a test on. In the context of testing whether a candidate parameter satisfies a large number of moment equalities, we construct a test that harnesses the strength of all $p$-norms with $p\in[2, \infty]$. As a result, this test is consistent against strictly more alternatives than any test based on a single $p$-norm. In particular, our test is consistent against more alternatives than tests based on the $2$- and $\infty$-norm, which is what most implementations of the power enhancement principle target. We illustrate the scope of our general results by using them to construct a test that simultaneously dominates the Anderson-Rubin test (based on $p=2$) and tests based on the $\infty$-norm in terms of consistency in the linear instrumental variable model with many (weak) instruments.


[32] 2407.18176

Euler Stratifications of Hypersurface Families

We stratify families of projective and very affine hypersurfaces according to their topological Euler characteristic. Our new algorithms compute all strata using algebro-geometric techniques. For very affine hypersurfaces, we investigate and exploit the relation to critical point computations. Euler stratifications are relevant in particle physics and algebraic statistics. They fully describe the dependence of the number of master integrals, respectively the maximum likelihood degree, on kinematic or model parameters.