The growing adoption of large language models (LLMs) in finance exposes high-stakes decision-making to subtle, underexamined positional biases. The complexity and opacity of modern model architectures compound this risk. We present the first unified framework and benchmark that not only detects and quantifies positional bias in binary financial decisions but also pinpoints its mechanistic origins within open-source Qwen2.5-instruct models (1.5B--14B). Our empirical analysis draws on a novel, finance-authentic dataset and reveals that positional bias is pervasive, scale-sensitive, and prone to resurfacing under nuanced prompt designs and investment scenarios, with recency and primacy effects exposing new vulnerabilities in risk-laden contexts. Through transparent mechanistic interpretability, we map how and where bias emerges and propagates within the models to deliver actionable, generalizable insights across prompt types and scales. By bridging domain-specific auditing with model interpretability, our work provides a new methodological standard for both rigorous bias diagnosis and practical mitigation, establishing essential guidance for responsible and trustworthy deployment of LLMs in financial systems.
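As an illustration of the simplest form of such an audit, the sketch below estimates positional bias by asking a model the same binary question twice with the option order swapped and counting how often the decision flips; `ask_model` is a hypothetical wrapper around the LLM being audited, not part of the paper's framework.

```python
def positional_bias_rate(pairs, ask_model):
    """Fraction of binary decisions that flip when the two options are swapped.

    pairs     : list of (option_a, option_b) decision problems
    ask_model : hypothetical callable ask_model(first_option, second_option) -> "A" or "B"
    """
    flips = 0
    for a, b in pairs:
        first = ask_model(a, b)
        second = ask_model(b, a)
        # A position-insensitive model picks the same underlying option both times:
        # "A" on the first call corresponds to "B" on the swapped call, and vice versa.
        consistent = (first == "A" and second == "B") or (first == "B" and second == "A")
        flips += 0 if consistent else 1
    return flips / len(pairs)
```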
This paper proposes a novel stock selection strategy framework based on combined machine learning algorithms. Two weighting methods for three representative machine learning algorithms are developed to predict the returns of the stock selection strategy: one is static weighting based on model evaluation metrics, the other is dynamic weighting based on Information Coefficients (IC). Using CSI 300 index data, we empirically evaluate the strategy's backtested performance and model predictive accuracy. The main results are as follows: (1) The strategy based on combined machine learning algorithms significantly outperforms single-model approaches in backtested returns. (2) IC-based weighting (particularly IC_Mean) demonstrates greater competitiveness than evaluation-metric-based weighting in both backtested returns and predictive performance. (3) Factor screening substantially enhances the performance of combined machine learning strategies.
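A minimal sketch of the IC-based dynamic weighting idea, assuming each model produces a panel of predicted next-period returns and IC_Mean is the rolling average of the daily cross-sectional correlation between predictions and realized returns (the variable names and the clipping of negative ICs are illustrative choices, not the paper's exact specification):

```python
import pandas as pd

def ic_series(pred: pd.DataFrame, realized: pd.DataFrame) -> pd.Series:
    """Daily cross-sectional correlation between predicted and realized returns (the IC)."""
    return pred.corrwith(realized, axis=1)

def ic_mean_weights(preds: dict, realized: pd.DataFrame, window: int = 12) -> pd.DataFrame:
    """Dynamic model weights proportional to each model's rolling mean IC."""
    ics = pd.DataFrame({m: ic_series(p, realized).rolling(window).mean()
                        for m, p in preds.items()})
    ics = ics.clip(lower=0.0)                   # models with negative mean IC get zero weight
    return ics.div(ics.sum(axis=1), axis=0)     # normalize so weights sum to one per date

def combined_prediction(preds: dict, weights: pd.DataFrame) -> pd.DataFrame:
    """Weighted average of the individual models' return forecasts."""
    return sum(p.mul(weights[m], axis=0) for m, p in preds.items())
```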
Environmental, Social, and Governance (ESG) factors aim to provide non-financial insights into corporations. In this study, we investigate whether we can extract relevant ESG variables to assess corporate risk, as measured by logarithmic volatility. We propose a novel Hierarchical Variable Selection (HVS) algorithm to identify a parsimonious set of variables from raw data that are most relevant to risk. HVS is specifically designed for ESG datasets characterized by a tree structure with significantly more variables than observations. Our findings demonstrate that HVS achieves significantly higher performance than models using pre-aggregated ESG scores. Furthermore, when compared with traditional variable selection methods, HVS achieves superior explanatory power using a more parsimonious set of ESG variables. We illustrate the methodology using company data from various sectors of the US economy.
The Kelly criterion provides a general framework for optimizing the growth rate of an investment portfolio over time by maximizing the expected logarithmic utility of wealth. However, the optimality condition of the Kelly criterion is highly sensitive to accurate estimates of the probabilities and investment payoffs. Estimation risk can lead to greatly suboptimal portfolios. In a simple binomial model, we show that the introduction of a European option in the Kelly framework can be used to construct a class of growth optimal portfolios that are robust to estimation risk.
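For reference, in the standard binomial setting the abstract refers to (win probability $p$, net payoff $b$ per unit staked, loss of the stake otherwise), the Kelly growth rate and its maximizer are

\[
g(f) \;=\; p\,\ln(1 + f b) + (1 - p)\,\ln(1 - f),
\qquad
f^{*} \;=\; \frac{p(b+1) - 1}{b} \;=\; p - \frac{1-p}{b},
\]

so any misestimation of $p$ or $b$ shifts $f^{*}$ directly; this is the estimation risk that the option-augmented portfolios are designed to dampen.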
We describe a Matlab routine that estimates jumps in financial asset prices using the Threshold (or Truncation) method of Mancini (2009). The routine is designed for application to five-minute log-returns. The underlying assumption is that asset prices evolve in time following an Ito semimartingale with possibly stochastic volatility and jumps. A log-return is likely to contain a jump if its absolute value is larger than a threshold determined by the maximum increment of the Brownian semimartingale part. The latter is particularly sensitive to the magnitude of the volatility coefficient; empirically, volatility levels typically depend on the time of day (TOD), being highest at the beginning and end of the trading day and low in the middle. The first routine presented estimates the TOD effect and implements the method described in Bollerslev and Todorov (2011). Subsequently, the TOD effect for the stock Apple Inc. (AAPL) is visualized. The second routine implements the threshold method for estimating jumps in AAPL prices. The procedure recursively estimates daily volatility and jumps. In each round, the threshold depends on the time of day and is constructed as the estimate of the daily volatility multiplied by the daytime TOD factor and by the continuity modulus of Brownian motion paths. Once the jumps are detected, the daily volatility estimate is updated using only the log-returns that do not contain jumps. Before application to empirical data, the reliability of the procedure was tested separately on simulated asset prices. The results obtained on a record of AAPL stock prices are visualized.
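A minimal Python sketch of the recursion described above (not the authors' Matlab code): the threshold combines a daily volatility estimate, the TOD factor, and the Brownian continuity modulus, and the volatility estimate is refreshed from the non-jump returns at each pass. The multiplier `c` and the number of passes are illustrative choices.

```python
import numpy as np

def detect_jumps(returns, tod, dt=1.0 / 78, n_iter=5, c=3.0):
    """Iterative threshold (truncation) jump detection for one day of intraday log-returns.

    returns : array of five-minute log-returns
    tod     : array of time-of-day volatility factors (same length, mean roughly one)
    dt      : fraction of the trading day covered by one return (78 five-minute bars ~ 6.5 h)
    """
    modulus = np.sqrt(2.0 * dt * np.log(1.0 / dt))   # continuity modulus of Brownian paths
    jumps = np.zeros_like(returns, dtype=bool)
    sigma = np.sqrt(np.sum(returns ** 2))            # initial daily volatility estimate
    for _ in range(n_iter):
        threshold = c * sigma * tod * modulus
        new_jumps = np.abs(returns) > threshold
        if np.array_equal(new_jumps, jumps):
            break
        jumps = new_jumps
        # update daily volatility from the continuous (non-jump) returns only,
        # rescaled for the fraction of intervals that were removed
        kept = max((~jumps).mean(), 1e-12)
        sigma = np.sqrt(np.sum(returns[~jumps] ** 2) / kept)
    return jumps, sigma
```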
This study evaluates deep neural networks for forecasting probability distributions of financial returns. 1D convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) architectures are used to forecast parameters of three probability distributions: Normal, Student's t, and skewed Student's t. Using custom negative log-likelihood loss functions, distribution parameters are optimized directly. The models are tested on six major equity indices (S\&P 500, BOVESPA, DAX, WIG, Nikkei 225, and KOSPI) using probabilistic evaluation metrics including Log Predictive Score (LPS), Continuous Ranked Probability Score (CRPS), and Probability Integral Transform (PIT). Results show that deep learning models provide accurate distributional forecasts and perform competitively with classical GARCH models for Value-at-Risk estimation. The LSTM with skewed Student's t distribution performs best across multiple evaluation criteria, capturing both heavy tails and asymmetry in financial returns. This work shows that deep neural networks are viable alternatives to traditional econometric models for financial risk assessment and portfolio management.
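As an example of the custom negative log-likelihood losses mentioned, the sketch below implements the (symmetric) Student's t case in PyTorch; the network is assumed to emit unconstrained location, log-scale, and log-degrees-of-freedom outputs, and the skewed variant would follow the same pattern with an additional asymmetry parameter.

```python
import math
import torch

def student_t_nll(y, mu, log_sigma, log_nu):
    """Negative log-likelihood of a location-scale Student's t distribution.

    Positivity of the scale and degrees of freedom is enforced inside the loss,
    so the network outputs can remain unconstrained."""
    sigma = torch.exp(log_sigma)
    nu = torch.exp(log_nu) + 2.0                      # keep dof > 2 so the variance exists
    z = (y - mu) / sigma
    log_pdf = (torch.lgamma((nu + 1) / 2) - torch.lgamma(nu / 2)
               - 0.5 * (torch.log(nu) + math.log(math.pi)) - torch.log(sigma)
               - (nu + 1) / 2 * torch.log1p(z ** 2 / nu))
    return -log_pdf.mean()
```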
Almost all countries in the world require banks to report suspicious transactions to national authorities. The reports are known as suspicious transaction or activity reports (we use the former term) and are intended to help authorities detect and prosecute money laundering. In this paper, we investigate the relationship between suspicious transaction reports and convictions for money laundering in the European Union. We use publicly available data from Europol, the World Bank, the International Monetary Fund, and the European Sourcebook of Crime and Criminal Justice Statistics. To analyze the data, we employ a log-transformation and fit pooled (i.e., ordinary least squares) and fixed effects regression models. The fixed effects models, in particular, allow us to control for unobserved country-specific confounders (e.g., different laws regarding when and how reports should be filed). Initial results indicate that the number of suspicious transaction reports and convictions for money laundering in a country follow a sub-linear power law. Thus, while more reports may lead to more convictions, the marginal effect of additional reports decreases as their number grows. The relationship is robust to control variables such as the size of shadow economies and police forces. However, when we include time as a control, the relationship disappears in the fixed effects models. This suggests that the relationship is spurious rather than causal, driven by cross-country differences and a common time trend. In turn, a country cannot, ceteris paribus and with statistical confidence, expect that an increase in suspicious transaction reports will drive an increase in convictions. Our results have important implications for international anti-money laundering efforts and policies. (...)
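A compact sketch of the estimation strategy (column and variable names are illustrative): pooled OLS on the log-log relationship, then country fixed effects, then country and year effects. Under the log-log specification, a slope below one corresponds to the sub-linear power law described above, and its behavior once the year dummies are added mirrors the reported result.

```python
import numpy as np
import statsmodels.formula.api as smf

def fit_models(df):
    """df: country-year panel with columns 'country', 'year', 'reports', 'convictions' (all > 0)."""
    df = df.assign(log_reports=np.log(df["reports"]),
                   log_convictions=np.log(df["convictions"]))
    pooled = smf.ols("log_convictions ~ log_reports", data=df).fit()
    fe_country = smf.ols("log_convictions ~ log_reports + C(country)", data=df).fit()
    fe_full = smf.ols("log_convictions ~ log_reports + C(country) + C(year)", data=df).fit()
    # The coefficient on log_reports is the elasticity of convictions with respect to reports;
    # a value below one indicates a sub-linear power law.
    return pooled, fe_country, fe_full
```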
This study investigates pretrained RNN attention models with mainstream attention mechanisms, including additive attention, Luong's three attention variants, global self-attention (Self-att), and sliding-window sparse attention (Sparse-att), for empirical asset pricing on the top 420 large-cap US stocks. To our knowledge, this is the first paper to apply these state-of-the-art (SOTA) attention mechanisms at scale in the asset pricing context. They overcome limitations of traditional machine learning (ML) based asset pricing, such as failing to capture temporal dependencies and having short memory. Moreover, the causal masks enforced in the attention mechanisms address the future-data leakage issue overlooked by more advanced attention-based models, such as the classic Transformer. The proposed attention models also account for the temporal sparsity of asset pricing data and mitigate potential overfitting by deploying simplified model structures, offering insights for future empirical economic research. All models are examined in three periods covering pre-COVID-19 (mild uptrend), COVID-19 (steep uptrend with a large drawdown), and one year post-COVID-19 (sideways movement with high fluctuations), to test their stability under extreme market conditions. The study finds that in value-weighted portfolio backtesting, Model Self-att and Model Sparse-att exhibit strong capabilities in generating absolute returns and hedging downside risks, achieving annualized Sortino ratios of 2.0 and 1.8, respectively, in the COVID-19 period. Model Sparse-att also performs more stably than Model Self-att in terms of absolute portfolio returns across stocks of different market capitalizations.
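To make the causal-mask point concrete, the sketch below builds the boolean mask used by a sliding-window sparse attention layer: each position can attend only to itself and to the most recent `window` past positions, so no future information leaks into the representation (the sequence length and window size are illustrative parameters).

```python
import torch

def causal_sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean attention mask: position i may attend to j only if j <= i (no future leakage)
    and i - j < window (sparse, local attention). True means the connection is allowed."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

mask = causal_sliding_window_mask(seq_len=60, window=12)
# typical use: scores.masked_fill(~mask, float("-inf")) before the softmax
```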
Leveraging Tennessee's 2005 Medicaid contraction, I study the impact of losing public health insurance on body weight and relevant health behaviors. Using Behavioral Risk Factor Surveillance System (BRFSS) data from 1997 to 2010, I estimate synthetic difference-in-differences models. The estimates suggest that the reform increased Body Mass Index by 0.38 points and the overweight or obesity prevalence (BMI$\geq$25) by $\sim$4\% among Tennessean childless adults. My findings -- a 21\% increase in the share of childless adults reporting ``poor'' health status (the lowest level on the five-point scale), a reduction in Medicaid-reimbursed utilization of pain and anti-inflammatory medications, and a reduction in participation in moderate physical activities -- suggest that worsening unmanaged health conditions may be a key pathway through which coverage loss affected weight gain. Additionally, my analysis offers practical guidance for conducting robust inference in single treated cluster settings with limited pre-treatment data.
Ride-hailing platforms (e.g., Uber, Lyft) have transformed urban mobility by enabling ride-sharing, which holds considerable promise for reducing both travel costs and total vehicle miles traveled (VMT). However, the fragmentation of these platforms impedes system-wide efficiency by restricting ride-matching to intra-platform requests. Cross-platform collaboration could unlock substantial efficiency gains, but its realization hinges on fair and sustainable profit allocation mechanisms that can align the incentives of competing platforms. This study introduces a graph-theoretic framework that embeds profit-aware constraints into network optimization, facilitating equitable and efficient cross-platform ride-sharing. Within this framework, we evaluate three allocation schemes -- equal-profit-based, market-share-based, and Shapley-value-based -- through large-scale simulations. Results show that the Shapley-value-based mechanism consistently outperforms the alternatives across six key metrics. Notably, system efficiency and rider service quality improve with increasing demand, reflecting clear economies of scale. The observed economies of scale, along with their diminishing returns, can be understood through the structural evolution of rider-request graphs, in which super-linear edge growth expands feasible matches while sub-linear degree scaling limits per-rider connectivity.
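As a concrete reference for the Shapley-value-based scheme, the sketch below computes exact Shapley profit shares by enumerating orderings of the platforms, which is tractable for a handful of participants; the two-platform numbers at the end are purely hypothetical.

```python
from itertools import permutations

def shapley_values(players, coalition_profit):
    """Exact Shapley values via enumeration of orderings.

    coalition_profit maps a frozenset of players to the profit that coalition earns."""
    values = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # marginal contribution of p when joining the current coalition
            values[p] += coalition_profit[with_p] - coalition_profit.get(coalition, 0.0)
            coalition = with_p
    return {p: v / len(orders) for p, v in values.items()}

# hypothetical example: collaboration between platforms A and B creates a surplus to split
profit = {frozenset(): 0.0, frozenset({"A"}): 10.0, frozenset({"B"}): 6.0,
          frozenset({"A", "B"}): 20.0}
print(shapley_values(["A", "B"], profit))  # {'A': 12.0, 'B': 8.0}
```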
Large language models (LLMs) are increasingly used to simulate human decision-making, but their intrinsic biases often diverge from real human behavior--limiting their ability to reflect population-level diversity. We address this challenge with a persona-based approach that leverages individual-level behavioral data from behavioral economics to adjust model biases. Applying this method to the ultimatum game--a standard but difficult benchmark for LLMs--we observe improved alignment between simulated and empirical behavior, particularly on the responder side. While further refinement of trait representations is needed, our results demonstrate the promise of persona-conditioned LLMs for simulating human-like decision patterns at scale.
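A minimal sketch of what persona conditioning can look like for the responder side of the ultimatum game; the persona fields and wording are illustrative stand-ins for the individual-level behavioral traits used in the study.

```python
def responder_prompt(persona: dict, offer: int, pie: int = 10) -> str:
    """Build a persona-conditioned ultimatum-game prompt for the responder role.

    The persona keys below are hypothetical examples, not the study's trait schema."""
    return (
        f"You are a {persona['age']}-year-old {persona['occupation']} who is "
        f"{persona['fairness_attitude']} about fairness in economic decisions.\n"
        f"Another player proposes how to split {pie} dollars: you would receive {offer} "
        f"dollars and they would keep {pie - offer}.\n"
        "Do you ACCEPT or REJECT the offer? Answer with a single word."
    )

print(responder_prompt({"age": 34, "occupation": "teacher",
                        "fairness_attitude": "strongly concerned"}, offer=3))
```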
In the financial system, bailout strategies play a pivotal role in mitigating substantial losses resulting from systemic risk. However, the lack of a closed-form objective function for the optimal bailout problem poses significant challenges to its resolution. This paper conceptualizes the optimal bailout (capital injection) problem as a black-box optimization task, where the black box is modeled as a fixed-point system consistent with the Eisenberg-Noe (E-N) framework for measuring systemic risk in the financial system. To address this challenge, we propose a novel framework, "Prediction-Gradient-Optimization" (PGO). Within PGO, the Prediction step employs a neural network to approximate and forecast the objective function implied by the black box, which can be completed offline; for online use, the Gradient step derives gradient information from this approximation, and the Optimization step applies a gradient projection algorithm to solve the problem effectively. Extensive numerical experiments highlight the effectiveness of the proposed approach in managing systemic risk.
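A schematic of the online part of PGO under simplifying assumptions: the surrogate is any differentiable network already trained offline to approximate the black-box objective, and the feasible set is taken to be a total-budget simplex purely for illustration (the paper's actual constraints and projection may differ).

```python
import torch

def project_to_budget(x, budget):
    """Euclidean projection onto {x >= 0, sum(x) = budget} via the standard sorting algorithm."""
    v, _ = torch.sort(x, descending=True)
    css = torch.cumsum(v, dim=0) - budget
    idx = torch.arange(1, len(x) + 1, dtype=x.dtype)
    rho = torch.nonzero(v - css / idx > 0).max()      # last index where the condition holds
    tau = css[rho] / (rho + 1)
    return torch.clamp(x - tau, min=0.0)

def pgo_step(surrogate, x, budget, lr=0.1):
    """One online PGO iteration: gradient of the offline-trained surrogate, then projection."""
    x = x.clone().requires_grad_(True)
    loss = surrogate(x)               # surrogate approximates system-wide loss given injections x
    loss.backward()
    with torch.no_grad():
        return project_to_budget(x - lr * x.grad, budget)
```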
As financial markets grow increasingly complex in the big data era, accurate stock prediction has become more critical. Traditional time series models, such as GRUs, have been widely used but often struggle to capture the intricate nonlinear dynamics of markets, particularly in the flexible selection and effective utilization of key historical information. Recently, methods like Graph Neural Networks and Reinforcement Learning have shown promise in stock prediction but require high data quality and quantity, and they tend to exhibit instability when dealing with data sparsity and noise. Moreover, the training and inference processes for these models are typically complex and computationally expensive, limiting their broad deployment in practical applications. Existing approaches also generally struggle to effectively capture unobservable latent market states, such as market sentiment and expectations, microstructural factors, and participant behavior patterns, leading to an inadequate understanding of market dynamics that subsequently impacts prediction accuracy. To address these challenges, this paper proposes a stock prediction model, MCI-GRU, based on a multi-head cross-attention mechanism and an improved GRU. First, we enhance the GRU model by replacing the reset gate with an attention mechanism, thereby increasing the model's flexibility in selecting and utilizing historical information. Second, we design a multi-head cross-attention mechanism for learning unobservable latent market state representations, which are further enriched through interactions with both temporal and cross-sectional features. Finally, extensive experiments on four major stock markets show that the proposed method outperforms SOTA techniques across multiple metrics. Additionally, its successful application in real-world fund management operations confirms its effectiveness and practicality.
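The sketch below illustrates the core architectural idea of replacing the GRU reset gate with an attention weighting over recent hidden states; it is a simplified single-head toy version, not the exact MCI-GRU layer (which additionally uses multi-head cross-attention over latent market-state representations).

```python
import torch
import torch.nn as nn

class AttentionGRUCell(nn.Module):
    """GRU-style cell whose reset gate is replaced by attention over a window of past hidden states."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.update = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.query = nn.Linear(input_size, hidden_size)

    def forward(self, x, h_history):
        # x: (batch, input_size); h_history: (window, batch, hidden), most recent state last
        q = self.query(x).unsqueeze(0)                                 # (1, batch, hidden)
        scores = (q * h_history).sum(-1) / h_history.size(-1) ** 0.5   # scaled dot-product scores
        attn = torch.softmax(scores, dim=0).unsqueeze(-1)              # weights over the window
        context = (attn * h_history).sum(0)                            # attended summary of history
        h_prev = h_history[-1]
        z = torch.sigmoid(self.update(torch.cat([x, h_prev], dim=-1)))       # update gate
        h_tilde = torch.tanh(self.candidate(torch.cat([x, context], dim=-1)))  # candidate state
        return (1 - z) * h_prev + z * h_tilde
```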
Maximizing revenue for grid-scale battery energy storage systems in continuous intraday electricity markets requires strategies that can seize trading opportunities as soon as new information arrives. This paper introduces and evaluates an automated high-frequency trading strategy for battery energy storage systems trading on the intraday power market, explicitly accounting for the dynamics of the limit order book, market rules, and technical parameters. The standard rolling intrinsic strategy is adapted to continuous intraday electricity markets and solved using a dynamic programming approximation that is two to three orders of magnitude faster than an exact mixed-integer linear programming solution. A detailed backtest over a full year of German order book data demonstrates that the proposed dynamic programming formulation does not reduce trading profits and allows the policy to react to every relevant order book update, making realistic rapid backtesting feasible. Our results show the significant revenue potential of high-frequency trading: our policy earns 58% more than when re-optimizing only once every hour and 14% more than when re-optimizing once per minute, highlighting that profits depend critically on trading speed. Furthermore, we leverage the speed of our algorithm to train a parametric extension of the rolling intrinsic strategy, increasing yearly revenue by 8.4% out of sample.
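A stripped-down illustration of the dynamic programming approximation: with the state of charge discretized, a backward pass over the remaining delivery periods values every state against the current best bid and ask, and the program is re-run whenever the order book updates. The efficiency and power limit are placeholder assumptions, and order book depth, fees, and detailed market rules are ignored here.

```python
import numpy as np

def intrinsic_dp(bids, asks, soc_levels, eff=0.95, power=1.0):
    """Backward dynamic program over discretized state of charge (SoC).

    bids/asks  : best bid/ask price per remaining delivery period (arrays of length T)
    soc_levels : discretized SoC grid in MWh
    Returns the optimal trading value for every starting SoC at the current time."""
    T, n = len(bids), len(soc_levels)
    value = np.zeros(n)                              # terminal value of any leftover SoC set to zero
    for t in range(T - 1, -1, -1):
        new_value = np.full(n, -np.inf)
        for i, soc in enumerate(soc_levels):
            for j, soc_next in enumerate(soc_levels):
                delta = soc_next - soc               # + charge (buy at ask), - discharge (sell at bid)
                if abs(delta) > power:
                    continue
                cash = -asks[t] * delta / eff if delta > 0 else -bids[t] * delta * eff
                new_value[i] = max(new_value[i], cash + value[j])
        value = new_value
    return value
```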
We provide a simple and straightforward approach to a continuous-time version of Cover's universal portfolio strategies within the model-free context of Föllmer's pathwise Itô calculus. We establish the existence of the universal portfolio strategy and prove that its portfolio value process is the average of all values of constant rebalanced strategies. This result relies on a systematic comparison between two alternative descriptions of self-financing trading strategies within pathwise Itô calculus. We moreover provide a comparison result for the performance and the realized volatility and variance of constant rebalanced portfolio strategies.
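In Cover-style notation, the value-averaging property stated above reads (with $\Delta^{d-1}$ the set of portfolio weight vectors, $V_t^{b}$ the value process of the constant rebalanced portfolio with weights $b$, and $\mu$ a normalized mixing measure, e.g. uniform):

\[
V_t^{\mathrm{univ}} \;=\; \int_{\Delta^{d-1}} V_t^{b}\,\mu(\mathrm{d}b),
\qquad
\pi_t^{\mathrm{univ}} \;=\; \frac{\int_{\Delta^{d-1}} b\,V_t^{b}\,\mu(\mathrm{d}b)}{\int_{\Delta^{d-1}} V_t^{b}\,\mu(\mathrm{d}b)},
\]

so the universal portfolio's value is the $\mu$-average of the constant rebalanced values and its weights are the value-weighted average of the constant weights.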