Macro‑Dipole‑Constrained Learning of Atomic Charges
for Accurate Electrostatic Potentials at Electrochemical Interfaces
October 01, 2025
Large thermal fluctuations of the liquid phase obscure the weak macroscopic electric field that drives electrochemical reactions, rendering the extraction of reliable interfacial charge distributions from ab initio molecular dynamics extremely challenging. We introduce SMILE‑CP (Scalar Macro‑dipole Integrated LEarning – Charge Partitioning), a macro‑dipole‑constrained scheme that infers atomic charges using only the instantaneous atomic coordinates and the total dipole moment of the simulation cell — quantities routinely available from standard density‑functional theory calculations. SMILE‑CP preserves both the global electrostatic field and the local potential without invoking any explicit charge‑partitioning scheme. Benchmarks on three representative electrochemical interfaces — nanoconfined water, Mg\(^{2+}\) dissolution in water, and a kinked Mg vicinal surface under anodic bias — show that SMILE‑CP eliminates the qualitative errors observed for unconstrained charge decompositions. The method is computationally inexpensive and data‑efficient, opening the door to charge‑aware machine‑learning potentials capable of bias‑controlled, nanosecond‑scale simulations of realistic electrochemical systems.
Computational electrochemistry plays a crucial role in understanding and designing electrochemical processes at the atomic scale, including reactions at electrode interfaces, ion transport, and charge transfer phenomena [1]–[4]. While density functional theory (DFT) provides an accurate quantum mechanical description, its high computational cost severely limits the accessible length and time scales, impeding simulations of realistic systems and dynamics. On the other hand, classical force fields enable larger-scale simulations but often rely on fixed or simplified electrostatic models, lacking the flexibility to capture complex, environment-dependent charge distributions that are needed for modeling electrochemical interfaces, especially charge transfer reactions.
In recent years, machine learning interatomic potentials (MLIPs) have emerged as promising tools that combine near-DFT accuracy with significantly reduced computational cost [5]–[7]. However, the most prevalent types of MLIPs [8]–[14] determine the energy and forces of a given atom solely from its local atomic environment within a finite cutoff. This inherent short-sightedness of the descriptors prevents the models from capturing long-range interactions accurately, which limits their applicability. Hence, incorporating long-range electrostatic interactions into MLIPs has been a major focus of ongoing research [15]–[21].
A key quantity in electrochemistry is the electrostatic potential \(\phi(z)\), which describes the variation of the electric potential as a function of position perpendicular to the electrode–electrolyte interface. This potential governs the distribution of ions, influences charge transfer processes, and plays a central role in determining interfacial properties such as the double-layer structure and the driving force for electrochemical reactions. An accurate reproduction of \(\phi\) is therefore essential for realistically modeling electrochemical systems at the atomic scale. However, none of the present MLIP approaches takes the electrostatic potential \(\phi\) directly as a target property, due to its inherent incompatibility with standard ML frameworks. Unlike atomic energies and forces, \(\phi\) is a continuous scalar field over 3D space obtained as the solution of Poisson’s equation. Computing gradients of \(\phi\) with respect to model parameters is impractical because of the nonlocality of the Poisson operator, and \(\phi\) is therefore ill-suited for inclusion in the loss function of an ML model. Instead, a commonly explored approach is to train the model on charges from a local charge-partitioning scheme and to compute the long-range electrostatic interactions from the resulting local charge distribution. For example, Hirshfeld charges [22] are used as reference for training fourth-generation high-dimensional neural network potentials [17], [23], [24]. The deep potential (DP) model [18] is trained on the centers of maximally localized Wannier functions (Wannier centers, WCs) [25], [26] to calculate the corresponding long-range interaction. These MLIPs have been used to simulate systems containing charged species and electric fields [27]–[29] and to model the dielectric response of polarizable media [30]–[33].
In this work, we demonstrate that machine learning models trained solely to reproduce local charge decomposition schemes can produce significant errors in the long-range electrostatic potential \(\phi(z)\). This shortcoming limits their application to electrochemical systems, in which charge transfer processes are strongly influenced by the macroscopic electric field. To address this issue, we introduce two innovations: (1) incorporating the macro dipole moment of the system as a surrogate for the electric field in the model’s cost function, to ensure an accurate reconstruction of the long-range electrostatic profile, and (2) explicitly accounting for the electronic polarization of water, which is missed by standard machine learning potentials because its magnitude is small compared to the intrinsic thermal fluctuations. The proposed approach is termed SMILE‑CP (Scalar Macro‑dipole Integrated LEarning – Charge Partitioning).
To analyze and quantify the impact of large local fluctuations in the electrostatic potential on the accurate modeling of macroscopic fields in electrochemical systems, we have constructed a well-defined computational setup. This setup comprises a nanoconfined water slab sandwiched between two oppositely charged neon (Ne) electrodes (Fig. 1a) [34], [35]. It realistically describes local fluctuations at an electrochemical interface while providing full quantitative access to and control over the macroscopic field. Thus, it provides an ideal testbed for evaluating the capacity of local ML models to capture macroscopic electrostatics amid pronounced microscopic disorder, a challenge that is ubiquitous at realistic electrochemical interfaces. The net electrode charge, and thus the applied electric field across the water layer, is controlled by adjusting the Ne core charges while maintaining the overall charge neutrality of the cell. We impose a dipole correction along the surface normal (\(z\)) direction to ensure accurate long-range electrostatics [36]. Ab initio molecular dynamics (AIMD) simulations are performed using the Vienna Ab Initio Simulation Package (VASP) [37], [38]. Charge decompositions are performed on a selected set of MD snapshots using maximally localized Wannier functions (Wannier90 [39]) and Bader analysis [40], [41]; full computational details are provided in the supplemental material (SM) [42].
Fig. 1b shows the electrostatic potential profile \(\phi\) along the \(z\) axis for a representative AIMD snapshot at an applied field of \(E^\mathrm{ext} =\) 0.2 V/\(\mathrm{\AA}\) (black line), compared with the zero-field case (grey dashed line). The nearly linear profile in the water region implies a nearly constant field there. This is confirmed in Fig. 1c, which plots the difference potential \(\Delta\phi\) between the two field conditions. Bader charges fail to reproduce the macro dipole (see SM). Only the WC decomposition, which assigns the WCs to individual water molecules, accurately reproduces the macroscopic field (Fig. 1b, green line); the small local deviations reflect the Gaussian approximation of the electron density.
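The link between a set of point charges and a planar-averaged profile such as \(\phi(z)\) in Fig. 1b is the one-dimensional Poisson equation \(d^2\phi/dz^2 = -\rho(z)/\varepsilon_0\). The sketch below is a minimal illustration, not the workflow used for the figures: it assumes that every charge (ionic core or WC) is smeared into a normalized Gaussian of width `sigma` along \(z\), that the cross-sectional area `area` is known, and that both \(E\) and \(\phi\) vanish at the first grid point (as in the vacuum region of a dipole-corrected slab). The function name, the width, and the boundary choice are our own assumptions.

```python
import numpy as np

# 1/epsilon_0 in V*Å/e (4*pi times the Coulomb constant 14.3996 eV*Å/e^2)
INV_EPS0 = 4.0 * np.pi * 14.3996

def planar_potential(z_grid, z_pos, charges, area, sigma=0.5):
    """Planar-averaged potential phi(z) (V) and field E(z) (V/Å).

    z_grid  : uniform grid along z (Å)
    z_pos   : z positions of the point charges (Å)
    charges : charges in units of e (ionic cores > 0, WCs < 0)
    area    : cross-sectional area of the cell (Å^2)
    sigma   : Gaussian smearing width (Å), an illustrative choice
    Boundary condition: E = 0 and phi = 0 at z_grid[0]; periodic images ignored.
    """
    z_grid = np.asarray(z_grid, dtype=float)
    dz = z_grid[1] - z_grid[0]
    diff = z_grid[:, None] - np.asarray(z_pos, dtype=float)[None, :]
    gauss = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    rho = gauss @ np.asarray(charges, dtype=float) / area  # e/Å^3
    efield = INV_EPS0 * np.cumsum(rho) * dz                # Gauss: dE/dz = rho/eps0
    phi = -np.cumsum(efield) * dz                          # E = -dphi/dz
    return phi, efield

# Parallel-plate check: +1 e at z = 5 Å, -1 e at z = 15 Å over 100 Å^2
z = np.linspace(0.0, 20.0, 2001)
phi, E = planar_potential(z, [5.0, 15.0], [1.0, -1.0], area=100.0)
```

Between the two sheets the field equals \(Q/(\varepsilon_0 A)\), and it vanishes outside them, which is a quick sanity check of the sign conventions.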
In an ML approach, however, the WCs can no longer be obtained from the electronic wave functions; they must be inferred exclusively from the local environment of each atom. We therefore develop and apply an ML model (ML\(^{\mathrm{WC}}\)) based on local atomic cluster expansion (ACE) descriptors [13], [43]–[45] that predicts the WC positions of individual water molecules, in the same spirit as the DP potential with long-range electrostatic interaction [18]. The ML\(^{\mathrm{WC}}\) model achieves state-of-the-art accuracy in WC positions, with a root-mean-square error (RMSE) of 0.004 \(\mathrm{\AA}\) [28], [29]. However, despite this impressive local accuracy of 0.4 pm, the model systematically fails to reproduce the correct macroscopic field response and instead predicts an offset in the potential (Fig. 1b, red line). This failure is generic: large local fluctuations, which are inherent to electrochemical systems, decouple precise local charge prediction from accurate long-range electrostatics. Our finding thus reveals a fundamental limitation of strictly local ML approaches in capturing macroscopic electrostatic behavior in complex interfacial systems.
To elucidate the origin of this failure, we analyze how individual water dipoles respond to the applied electric field. Specifically, we define the electron charge center of each water molecule as the centroid of its four WCs (\(\mathbf{R}^\mathrm{el}\)) and calculate the molecular dipole moment as \(\boldsymbol{\mu}^\mathrm{H_2O} = 8e(\mathbf{R^{\mathrm{el}}} - \mathbf{R^{\mathrm{core}}})\), where \(\mathbf{R^{\mathrm{core}}}\) denotes the core position, i.e., the center of the ionic core charges (see schematic in Fig. 2a). Upon application of an external field, the electron cloud of each water molecule is polarized, leading to a shift in the WC positions and a corresponding change in \(\boldsymbol{\mu}^\mathrm{H_2O}\).
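As an illustration of this definition, the sketch below computes a water dipole from the ionic cores and the four WCs. It assumes valence core charges of \(+6e\) for O and \(+e\) for H (so that the cores carry \(+8e\) in total), \(-2e\) per doubly occupied WC, and a molecule that is already unwrapped across periodic boundaries; the helper name `water_dipole` and the toy geometry are hypothetical. With \(e>0\), the point-charge sum equals \(8e(\mathbf{R}^\mathrm{core}-\mathbf{R}^\mathrm{el})\), i.e., it points from the electron center toward the cores; the ordering in the expression above corresponds to taking \(e\) as the (negative) electron charge.

```python
import numpy as np

# Assumed valence (core) charges in units of e: O carries 6, H carries 1,
# so the three cores of one water molecule carry +8 e in total.
Z_CORE = {"O": 6.0, "H": 1.0}
Q_WC = -2.0  # each doubly occupied Wannier center carries -2 e

def water_dipole(core_pos, core_species, wc_pos):
    """Dipole (e*Å) of one unwrapped water molecule, plus R_core and R_el.

    core_pos     : (3, 3) Cartesian positions of O, H, H (Å)
    core_species : e.g. ["O", "H", "H"]
    wc_pos       : (4, 3) positions of the four Wannier centers (Å)
    """
    core_pos = np.asarray(core_pos, dtype=float)
    wc_pos = np.asarray(wc_pos, dtype=float)
    zc = np.array([Z_CORE[s] for s in core_species])
    r_core = zc @ core_pos / zc.sum()  # charge-weighted core center
    r_el = wc_pos.mean(axis=0)         # centroid of the WCs
    # Point-charge dipole sum_i q_i r_i; equals 8 (R_core - R_el) here
    mu = zc @ core_pos + Q_WC * wc_pos.sum(axis=0)
    return mu, r_core, r_el

# Toy geometry (not a realistic WC arrangement): all WCs 0.1 Å above O
cores = [[0.0, 0.0, 0.0], [0.76, 0.0, 0.59], [-0.76, 0.0, 0.59]]
wcs = [[0.0, 0.0, 0.1]] * 4
mu, r_core, r_el = water_dipole(cores, ["O", "H", "H"], wcs)
```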
By tracking changes in the WC positions under varying external fields, we compute the dipole moment distributions for all water molecules. Fig. 2b displays both the individual dipole moment projections along \(z\) (\(\mu_z^\mathrm{H_2O}\), light blue dots) and their sum (\(\mu_z^\mathrm{H_2O\text{-}tot}\), red dots) as a function of the applied field \(E^\mathrm{ext}\). While the total dipole response exhibits a clear linear dependence on \(E^\mathrm{ext}\)—in agreement with the expected global polarization—the distributions of individual \(\mu_z^\mathrm{H_2O}\) remain essentially unchanged. This is further underscored by the nearly horizontal trend of the per-molecule-averaged \(\mu_z^\mathrm{H_2O}\) (dark blue dots in Fig. 2b).
This seemingly contradictory observation can be explained quantitatively. For a net change of 1 \(e\cdot\mathrm{\AA}\) in the global dipole moment distributed across the 64 water molecules in our system, the average change in the individual molecular dipole \(\mu_z^\mathrm{H_2O}\) is just 0.016 \(e\cdot\mathrm{\AA}\), two orders of magnitude smaller than the thermal fluctuations in \(\mu_z^\mathrm{H_2O}\) present in the system. This disparity poses a fundamental challenge for ML models trained to predict \(\mathbf{R}^\mathrm{el}\): the relevant signal, i.e., the material’s response to an external field, is almost entirely obscured by the intrinsic noise due to thermal motion and orientational disorder of the water molecules. For the WCs, this translates into a shift of only 0.002 \(\mathrm{\AA}\) in \(\mathbf{R}^\mathrm{el}\), below the 0.004 \(\mathrm{\AA}\) RMSE of the ML\(^{\mathrm{WC}}\) model and thus beyond the resolution of local ML models. This limitation persists even at zero external field, due to the long-range field arising from spontaneous dipole fluctuations, and cannot be cured by adjusting model hyperparameters (see SM [42]).
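The estimate above uses only numbers quoted in the text (64 molecules, a 1 \(e\cdot\mathrm{\AA}\) change in the total dipole, the prefactor \(8e\) in the dipole definition, and the 0.004 \(\mathrm{\AA}\) RMSE of ML\(^{\mathrm{WC}}\)); the few lines below simply reproduce the arithmetic.

```python
# Numbers quoted in the text
N_WATER = 64          # water molecules in the nanoconfined slab
DELTA_MU_TOT = 1.0    # change in the total dipole (e*Å)
N_ELECTRONS = 8       # valence electrons per molecule (prefactor 8e)
RMSE_WC = 0.004       # RMSE of ML^WC in WC positions (Å)

dmu_per_molecule = DELTA_MU_TOT / N_WATER  # e*Å per molecule
dR_el = dmu_per_molecule / N_ELECTRONS     # Å, since mu = 8e * (shift of R_el)

print(f"{dmu_per_molecule:.3f} e*Å per molecule, {dR_el:.3f} Å shift of R_el")
print("shift below ML^WC RMSE:", dR_el < RMSE_WC)
# prints:
# 0.016 e*Å per molecule, 0.002 Å shift of R_el
# shift below ML^WC RMSE: True
```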
Since the local descriptor entering the ML\(^{\mathrm{WC}}\) model does not explicitly depend on the macroscopic field, such models can only describe the local ionic screening and fail to capture the electronic polarization. Thus, if an external field is applied to a frozen AIMD snapshot (i.e., in the absence of ionic screening), the model predicts an effective field in the water region of \(E_{\mathrm{water}}\approx E^\mathrm{ext}\), whereas in DFT the field is additionally screened by the electrons. Indeed, as shown in Fig. 1b, the internal electric field predicted by ML\(^{\mathrm{WC}}\) is nearly twice that from DFT, corresponding to the missing electronic dielectric screening factor (\(\varepsilon_{\infty} \approx 2\)). These limitations apply generally to all ML models trained on local charges.
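In terms of the electronic susceptibility introduced below (\(\varepsilon_\infty = 1+\chi\)), the factor of about two follows from a one-line estimate for a frozen snapshot, using the DFT bulk value \(\chi_{\mathrm{DFT}}^{\mathrm{bulk}} = 0.96\) quoted later in the text:
\[
E_\mathrm{water}^\mathrm{DFT} \approx \frac{E^\mathrm{ext}}{\varepsilon_\infty},\qquad
E_\mathrm{water}^{\mathrm{ML^{WC}}} \approx E^\mathrm{ext}
\quad\Longrightarrow\quad
\frac{E_\mathrm{water}^{\mathrm{ML^{WC}}}}{E_\mathrm{water}^\mathrm{DFT}} \approx \varepsilon_\infty = 1+\chi_\mathrm{DFT}^\mathrm{bulk} \approx 1.96 .
\]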
Given the failure of the local ML model to capture the long-range electrostatics, we propose an approach that extracts the local charges directly from the total macroscopic dipole moment \(\mu^\mathrm{tot}_z\). In this approach, the dipole moment \(\boldsymbol{\mu}^i\) of each water molecule \(i\) is a function of its local descriptor \(\mathbf{D}^i\), such that \[\boldsymbol{\mu}^{i} = f(\mathbf{D}^i). \label{eq:model}\tag{1}\] The sum of the predicted molecular dipoles has to match the total dipole along the \(z\) axis: \[\sum_i\mu^i_z = \mu_z^\mathrm{tot}, \label{eq:macro}\tag{2}\] analogous to the procedure used in MLIPs, where atomic energies are learned subject to the constraint \(\sum_i E^i = E^\mathrm{tot}\). We denote the ML model employing Eq. 2 as SMILE0. Because the charge decomposition is obtained by minimizing the error in the macro dipole, this scheme reproduces the macro dipole by construction, which is expected to yield an accurate description of the macroscopic field.
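A minimal sketch of how Eq. 2 can serve as the only training signal, assuming (for illustration only) a linear model \(\mu^i_z = \mathbf{w}\cdot\mathbf{D}^i\) on generic per-molecule descriptor vectors rather than the ACE-based model of the paper. For a linear model the sum over molecules commutes with \(\mathbf{w}\), so each frame contributes a single least-squares row built from the summed descriptors. The function names, the ridge term, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_smile0_linear(descriptors, mu_tot, ridge=1e-8):
    """Fit w in mu_z^i = w . D^i using only per-frame totals (Eq. 2).

    descriptors : list of (n_mol, n_feat) arrays, one per frame
                  (stand-ins for the local descriptors D^i)
    mu_tot      : (n_frames,) total dipole along z for each frame
    """
    # sum_i w . D^i = w . (sum_i D^i): one regression row per frame
    X = np.stack([d.sum(axis=0) for d in descriptors])
    y = np.asarray(mu_tot, dtype=float)
    n_feat = X.shape[1]
    # Ridge-regularized normal equations
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_feat), X.T @ y)

def predict_molecular_dipoles(w, frame_descriptors):
    """Per-molecule mu_z^i for one frame (Eq. 1 with a linear f)."""
    return frame_descriptors @ w

# Synthetic check: per-molecule labels are never shown to the fit
rng = np.random.default_rng(1)
w_true = rng.normal(size=5)
frames = [rng.normal(size=(64, 5)) for _ in range(50)]
mu_tot = np.array([(d @ w_true).sum() for d in frames])
w_fit = fit_smile0_linear(frames, mu_tot)
```

The per-molecule dipoles are recovered here only because the synthetic data are exactly linear in the descriptors; the point of the sketch is that the loss involves frame totals alone, in the same way energy-based MLIPs learn atomic energies from total energies.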
However, when this approach is applied to a more realistic system with a dissolved Mg\(^{2+}\) ion and charged electrodes (Fig. 3a), severe deviations between the electrostatic potential obtained by DFT (Fig. 3b, black line) and by the SMILE0 model (Fig. 3b, orange line) are observed: whereas the DFT potential is nearly flat at long range, reflecting strong screening, the SMILE0 model predicts a pronounced field and a valley-shaped potential, indicative of underestimated screening.
This failure arises from the absence of electronic polarization in typical ML models. Conceptually, the total macro dipole can be separated into a field-induced electronic polarization term \(\mu_z^\mathrm{el\text{-}pol}\) and a field-independent local term \(\mu_z^\mathrm{local}\): \[\mu^\mathrm{tot}_z = \mu^\mathrm{el\text{-}pol}_z + \mu_z^\mathrm{local}. \label{eq:sum}\tag{3}\] In the linear response regime, \(\mu_z^\mathrm{el\text{-}pol} = -\chi \mu_z^\mathrm{tot}\), with \(\chi\) being the electronic susceptibility (\(\chi = \varepsilon_\infty-1\)). The induced electronic polarization counteracts the applied field; therefore, \(\mu_z^\mathrm{el\text{-}pol}\) and \(\mu_z^\mathrm{tot}\) have opposite signs. Thus, \(\mu^\mathrm{tot}_z = \mu^\mathrm{local}_z/(1+\chi)\): the observed macro dipole is only a screened fraction of the static local dipoles. A model constrained by Eq. 2 therefore assigns local dipoles whose sum is reduced by a factor of \(1/(1+\chi)\), which leads to a systematic underestimation of screening. Fig. 3c confirms this: the DFT-derived WC dipole distribution has an average of 0.61 \(e\cdot \mathrm{\AA}\), whereas SMILE0 yields 0.38 \(e\cdot \mathrm{\AA}\).
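Written out step by step, substituting the linear-response expression into Eq. 3 gives the screening factor and shows what a model constrained by Eq. 2 actually learns:
\[
\mu^\mathrm{tot}_z = -\chi\,\mu^\mathrm{tot}_z + \mu^\mathrm{local}_z
\;\Longrightarrow\;
(1+\chi)\,\mu^\mathrm{tot}_z = \mu^\mathrm{local}_z
\;\Longrightarrow\;
\sum_i \mu^i_z \overset{\text{Eq. 2}}{=} \mu^\mathrm{tot}_z = \frac{\mu^\mathrm{local}_z}{1+\chi}.
\]
With the DFT bulk susceptibility \(\chi_{\mathrm{DFT}}^{\mathrm{bulk}} = 0.96\), the summed local dipoles recovered under Eq. 2 would be only about \(1/1.96 \approx 0.51\) of their field-independent value.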
To account for the electronic polarization, we exploit the fact that, in contrast to the ionic dielectric constant, which shows huge spatial fluctuations at the electrochemical interface, the electronic polarization is remarkably homogeneous (see Fig. 1c). Based on this insight, we reformulate the constraint in Eq. 2 as \[\sum_i \mu^{i, \mathrm{local}}_z = (1+\chi)\mu^\mathrm{tot}_z, \label{eq:local}\tag{4}\] and denote this model as SMILE. Scanning \(\chi\), we find that the RMSE between reference and predicted dipoles is minimized at \(\chi \approx 1.2\) (Fig. 3d), a value that is consistent across several water–solid interface systems. This optimal value slightly exceeds the DFT-calculated bulk value (\(\chi_{\mathrm{DFT}}^{\mathrm{bulk}} = 0.96\)), likely because a reduced HOMO–LUMO gap of water in the interface region enhances the dielectric response [46].
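The \(\chi\) scan can be sketched as follows, again assuming a linear per-molecule model on generic descriptors (not the paper's ACE model) and using synthetic data generated with an arbitrarily chosen screening of \(\chi = 1.2\) purely to exercise the code. For each trial \(\chi\), the model is fitted to the target \((1+\chi)\mu^\mathrm{tot}_z\) of Eq. 4, and its per-molecule predictions are compared with reference dipoles (in the paper, the WC-derived ones); `scan_chi` is a hypothetical helper.

```python
import numpy as np

def scan_chi(descriptors, mu_tot, mu_ref, chis):
    """Fit Eq. (4) for each trial chi and return (best_chi, rmse_per_chi).

    descriptors : list of (n_mol, n_feat) per-frame descriptor arrays
    mu_tot      : (n_frames,) total dipole along z
    mu_ref      : list of (n_mol,) reference molecular dipoles (e.g. from WCs)
    chis        : 1D array of trial susceptibilities
    """
    X_sum = np.stack([d.sum(axis=0) for d in descriptors])  # one row per frame
    D_all = np.concatenate(descriptors)                     # all molecules
    ref_all = np.concatenate(mu_ref)
    rmses = np.empty(len(chis))
    for k, chi in enumerate(chis):
        # Linear model mu_z^i = w . D^i fitted to the rescaled target of Eq. (4)
        w, *_ = np.linalg.lstsq(X_sum, (1.0 + chi) * np.asarray(mu_tot), rcond=None)
        rmses[k] = np.sqrt(np.mean((D_all @ w - ref_all) ** 2))
    return chis[np.argmin(rmses)], rmses

# Synthetic data: reference dipoles from a known linear map, and a "measured"
# total dipole screened by an arbitrarily chosen chi = 1.2
rng = np.random.default_rng(0)
w_true = rng.normal(size=6)
frames = [rng.normal(size=(64, 6)) for _ in range(40)]
mu_ref = [d @ w_true for d in frames]
mu_tot = np.array([m.sum() for m in mu_ref]) / (1.0 + 1.2)
chis = np.linspace(0.0, 2.0, 41)
best_chi, rmses = scan_chi(frames, mu_tot, mu_ref, chis)
```

For a strictly linear model the fitted weights simply scale with \(1+\chi\), so the scan could be solved in closed form; the explicit loop is kept because it carries over unchanged to nonlinear models that must be retrained for each \(\chi\).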
With this correction, the SMILE model faithfully reproduces both the local dipole distribution (average 0.61 \(e\cdot \mathrm{\AA}\), Fig. 3c) and the long-range electrostatics (Fig. 3b, green line), achieving an RMSE of 0.046 \(e\cdot \mathrm{\AA}\) for the macro dipole and 0.056 \(e\cdot \mathrm{\AA}\) for individual water dipoles. These results demonstrate the model’s strong predictive power for both local and macroscopic electrostatics, resolving the inherent shortcomings of strictly local ML approaches.
Another advantage of the proposed model is its flexibility: it can learn either local dipoles or local charges. The latter assigns a point charge to each atom in the system. In practice, this is done by rewriting Eqs. 1 and 4 as \[q_i = f(\mathbf{D}^i),\] \[\sum_i q_iz_i = (1+\chi)\mu^\mathrm{tot}_z. \label{eq:q}\tag{5}\] Here \(q_i\) is the charge on atom \(i\) and \(z_i\) is its \(z\) coordinate. The charge-based model is arguably more flexible than the model predicting molecular dipoles: by assigning a dipole moment to each water molecule, the dipole-based model implicitly assumes that every molecule is charge neutral, thereby excluding intermolecular charge transfer. We apply the model employing Eq. 5 to a realistic electrochemical system, a Mg(12\(\bar{3}\)5) vicinal surface under anodic potential (Fig. 4a). The Mg/water interface is of particular interest due to the observed anomalous anodic hydrogen evolution reaction [47], [48], and in recent years a number of computational studies have attempted to reveal the reaction pathway [34], [49]–[51]. In this study, we employ a static potentiostat to apply an anodic bias to the Mg surface, similar to the method in [34]. The evolution of the macro dipole and the electrode charge is shown in Fig. 4b. The first 5 ps of the AIMD run are performed under open-circuit conditions, with no excess charge on the electrode; an anodic bias is then applied and gradually ramped up to 4 V.
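A minimal sketch of the charge-based constraint of Eq. 5, under the same illustrative assumption of a linear model \(q_i = \mathbf{w}\cdot\mathbf{D}^i\) on generic per-atom descriptors, so that each frame contributes one regression row \(\sum_i z_i\mathbf{D}^i\). Note that \(\sum_i q_i z_i\) is independent of the choice of origin only if the predicted charges sum to zero; Eq. 5 does not enforce this by itself, and no neutrality term is added here. Coordinates are assumed to be unwrapped consistently with the dipole-corrected cell; function names and data are hypothetical.

```python
import numpy as np

def fit_charge_model(descriptors, z_coords, mu_tot, chi):
    """Fit w in q_i = w . D^i subject to sum_i q_i z_i = (1 + chi) mu_tot (Eq. 5).

    descriptors : list of (n_atoms, n_feat) arrays, one per frame
    z_coords    : list of (n_atoms,) unwrapped z coordinates (Å), one per frame
    mu_tot      : (n_frames,) total dipole along z (e*Å)
    chi         : electronic susceptibility used to rescale the target
    """
    # sum_i q_i z_i = w . (sum_i z_i D^i): one regression row per frame
    X = np.stack([z @ d for z, d in zip(z_coords, descriptors)])
    y = (1.0 + chi) * np.asarray(mu_tot, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_charges(w, frame_descriptors):
    """Atomic charges q_i (in e) for one frame."""
    return frame_descriptors @ w

# Synthetic example (made-up data, chi = 1.2 chosen arbitrarily)
rng = np.random.default_rng(2)
w_true = rng.normal(size=4)
descs = [rng.normal(size=(30, 4)) for _ in range(20)]
zs = [rng.uniform(0.0, 20.0, size=30) for _ in range(20)]
chi = 1.2
mu_tot = np.array([z @ (d @ w_true) for z, d in zip(zs, descs)]) / (1.0 + chi)
w_fit = fit_charge_model(descs, zs, mu_tot, chi)
```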
The SMILE model consistently reproduces the electrostatic potential both under open-circuit conditions and under anodic bias (Fig. 4c). The long AIMD trajectory allows us to average out the local fluctuations in \(\phi\) and clearly resolve the macroscopic field across the cell. While the \(\phi\) profile is almost flat in bulk water under open-circuit conditions, the field under the 4 V anodic bias averages to 0.15 V/\(\mathrm{\AA}\). Both long-range macroscopic fields are accurately captured by the SMILE model. We note that both curves are obtained with a single model, which highlights its transferability across applied potentials.
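Extracting a macroscopic field from a fluctuating trajectory amounts to averaging \(\phi(z)\) over frames and fitting a straight line in the bulk-water region. The sketch below assumes that all profiles share one \(z\) grid and that the bulk window `[z_min, z_max]` is chosen by hand; it is checked on synthetic noisy profiles with a built-in slope (chosen to mirror the value quoted above), not on the data of Fig. 4c. The field is returned as \(E = -d\phi/dz\), and the helper name is hypothetical.

```python
import numpy as np

def macroscopic_field(z, phi_frames, z_min, z_max):
    """Frame-averaged phi(z) and the field E = -dphi/dz (V/Å) fitted in [z_min, z_max]."""
    z = np.asarray(z, dtype=float)
    phi_avg = np.mean(np.asarray(phi_frames, dtype=float), axis=0)
    mask = (z >= z_min) & (z <= z_max)
    slope, _intercept = np.polyfit(z[mask], phi_avg[mask], 1)
    return -slope, phi_avg

# Synthetic profiles: linear drop of 0.15 V/Å buried in large per-frame noise
rng = np.random.default_rng(3)
z = np.linspace(0.0, 30.0, 301)
phi_frames = [-0.15 * z + rng.normal(scale=0.5, size=z.size) for _ in range(200)]
field, phi_avg = macroscopic_field(z, phi_frames, 5.0, 25.0)
```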
In conclusion, for typical electrochemical interfaces, the polarization induced by an applied electric field is roughly two orders of magnitude smaller than the thermal dipole fluctuations of the liquid. As a result, machine‑learning potentials that are trained only on locally partitioned charges or dipoles tend to fit the noisy local variations and effectively discard the much weaker macroscopic component, which leads to unacceptably large errors in the long‑range electrostatic field. Our macro‑dipole‑constrained SMILE-CP approach restores the correct field while preserving the accuracy of the local charge distribution. The only additional information required is the scalar total dipole, which is readily obtainable from standard AIMD runs. Thus, the method avoids any ambiguous charge‑partitioning step. Because the constraint is introduced as a loss‑function term, it works with any descriptor and any neural‑network architecture. Benchmarks on nanoconfined water, Mg\(^{2+}\) dissolution, and a biased Mg vicinal surface show that the macro‑dipole constraint eliminates the qualitative errors in the electrostatic potential observed for unconstrained MLIPs and reproduces the full Poisson solution from the scalar charges alone. This flexible, easy‑to‑implement, and data‑efficient strategy therefore opens the door to bias‑controlled, nanosecond‑scale simulations of realistic electrochemical interfaces and paves the way for systematic investigations of voltage‑dependent processes in batteries, fuel cells, and electrocatalysis.
We acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through SFB1394, project no. 409476157, and SFB1625, project no. 506711657. J. Y. acknowledges support by the Alexander von Humboldt Foundation.