September 25, 2025
Alessandro Bombini1,2\(\dagger\), Alessandro Rosa1, Clarissa Buti1, Giovanni Passaleva1 and Lucio Anderlini1
1 Istituto Nazionale di Fisica Nucleare, Sezione di Firenze, Via B. Rossi 1, 50019 Sesto Fiorentino (FI), Italy
2 ICSC - Centro Nazionale di Ricerca in High Performance Computing, Big Data & Quantum Computing, Via Magnanelli 2, 40033, Casalecchio di Reno (BO), Italy
\(\dagger\) bombini@fi.infn.it
Designing detectors for next-generation particle physics experiments demands reliable performance under extreme particle fluxes, driven by the pursuit of high instantaneous luminosity. This challenge is central to upgrades for High-Luminosity LHC [1]–[4], LHCb [5], [6], and to proposed detectors for FCC-hh [7] and the Muon Collider [8].
Diamond detectors are promising candidates due to their superior radiation hardness and water-equivalence, making them ideal for dosimetry. Moreover, laser-induced graphitisation enables embedding conductive regions within the diamond bulk, allowing flexible electrode geometries [9]–[12]. In Figure 3 (a) we report the schematic representation of the geometry of 3D diamond sensors, in which the graphite electrodes are orthogonal to the diamond surface. In Figure 3 we present a picture of an actual diamond detector of this kind.
Simulating these devices is challenging: charge transport in the semiconductor and signal propagation through resistive electrodes contribute comparably to time resolution. A time-dependent extension of the Ramo-Shockley theorem captures these effects via dynamic weighting potentials, governed by Maxwell equations in quasi-static approximation [13]–[15].
By using the extended form of the Ramo-Shockley theorem for conductive media, the contribution of the resistive material to the signal formation is included in the time evolution of the weighting potential, which is the solution of the Maxwell equations in the quasi-static limit [16]: \[\begin{align} \epsilon \nabla^2 V(t, \mathbf{x}) &= - \rho (t, \mathbf{x}) \\ \partial_t \rho (t, \mathbf{x}) &= \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla V(t, \mathbf{x}) \right] \,, \end{align}\] where \(V(t, \mathbf{x})\) is the weighting potential field, \(\epsilon = \epsilon_0 \epsilon_r\) is the dielectric constant of the material, which is the same for diamond and graphite and thus constant over the whole geometry, \(\rho (t, \mathbf{x})\) is the charge distribution field, and \(\sigma(\mathbf{x})\) is the function defining the conductivity in the diamond/graphite geometry. These equations can be condensed into a single, third-order partial differential equation \[\label{eq46PDE} \epsilon \partial_t \nabla^2 V(t, \mathbf{x}) + \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla V(t, \mathbf{x}) \right] = 0\,.\tag{1}\]
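Concretely, the residual of Eq. (1) can be evaluated pointwise with automatic differentiation. Below is a minimal PyTorch sketch; `model` and `sigma` are hypothetical stand-ins for the trained network and the conductivity map (in the real device \(\sigma(\mathbf{x})\) is discontinuous across the diamond/graphite interface, which a smooth surrogate can only approximate):

```python
import torch

def pde_residual(model, t, x, eps, sigma):
    """Residual of eps * d/dt(lap V) + div(sigma * grad V) = 0.

    t: (N, 1) and x: (N, 3) tensors with requires_grad=True;
    model(t, x) -> (N, 1) weighting potential; sigma(x) -> (N,) conductivity.
    """
    V = model(t, x)
    # spatial gradient of V, shape (N, 3)
    grad_V = torch.autograd.grad(V.sum(), x, create_graph=True)[0]
    # Laplacian: sum of second derivatives along each coordinate
    lap_V = sum(
        torch.autograd.grad(grad_V[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )
    # time derivative of the Laplacian (the third-order term)
    dt_lap_V = torch.autograd.grad(lap_V.sum(), t, create_graph=True)[0].squeeze(-1)
    # divergence of sigma * grad V
    flux = sigma(x).unsqueeze(-1) * grad_V
    div_flux = sum(
        torch.autograd.grad(flux[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )
    return eps * dt_lap_V + div_flux
```

The repeated `create_graph=True` keeps the computation graph alive so the residual itself remains differentiable with respect to the network parameters during training.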
The full system to describe the behaviour of the weighting field in the geometry is equipped with an initial condition (IC) \[V(t=0, \mathbf{x}) = \begin{cases} + V_0\,, &\mathbf{x} \in \partial C_+ \cap \partial \Omega \,,\\ 0\,, & \text{otherwise} \,, \end{cases}\] and boundary conditions (BC) \[\begin{align} V(t, \mathbf{x}) &= +V_0\,, \quad \mathbf{x} \in \partial C_+ \cap \partial \Omega\,,\\ V(t, \mathbf{x}) &= 0 \,, \,\,\qquad \mathbf{x} \in \partial C_- \cap \partial \Omega\,,\\ \hat{\mathbf{n}} \cdot \nabla V(t, \mathbf{x}) &= 0\,, \qquad \,\, \mathbf{x} \in \partial\Omega \setminus\bigcup_iC_{i} \,, \end{align}\] where \(\Omega\) is the whole diamond+graphite geometry, \(C_\pm\) are the conductive cylinders attached to potential (\(C_+\)) or grounded (\(C_-\)). The last condition applies to all the boundary region of \(\Omega\) where there is no intersection with conductive columns.
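The three boundary conditions translate into pointwise residuals. A minimal sketch, assuming a hypothetical sampler has already split the boundary points into the biased electrode surface (`x_p`), the grounded one (`x_m`), and the remaining free boundary (`x_f`, with outward normals `n_f`):

```python
import torch

def boundary_residuals(model, t_p, x_p, t_m, x_m, t_f, x_f, n_f, V0):
    """Residuals of the three BCs: V = +V0 on the biased columns,
    V = 0 on the grounded ones, zero normal derivative elsewhere.
    x_f must have requires_grad=True for the Neumann term."""
    r_plus = model(t_p, x_p) - V0            # Dirichlet on C+
    r_minus = model(t_m, x_m)                # Dirichlet on C-
    V_f = model(t_f, x_f)
    grad_V = torch.autograd.grad(V_f.sum(), x_f, create_graph=True)[0]
    r_neumann = (n_f * grad_V).sum(dim=1, keepdim=True)   # n . grad V
    return r_plus, r_minus, r_neumann
```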
Notice that this equation accounts for the temporal response of the detector as a function of its design: the manufacturable aspects of the geometry are taken into account through the function \(\sigma(\mathbf{x})\), which is the result of the manufacturing process.
The computations needed to solve the differential equation describing the system, and thereby obtain the time-dependent electric field via the (reformulated) Ramo-Shockley theorem, are highly non-trivial. Standard numerical solvers, such as the COMSOL finite element method [17] on a custom mesh [14] or spectral methods [16], yield mesh-bound results at fixed, grid-spaced time steps.
This motivates the use of Physics-Informed Neural Networks (PINNs) [18], [19], either as a surrogate model or as a hybrid numerical solver. Indeed, we can leverage the ability of neural networks to extract statistical relations from data and, by injecting physical information about the system through the loss function as prescribed by the PINN framework, build a parametric surrogate model to optimize the design of diamond detectors.
Figure 3: A microscope image of a 3D diamond sensor; the specimen was tilted during acquisition to show the graphitised electrodes. From [16]. a — Schematic illustration of a three-dimensional diamond sensor, depicting a segment comprising four by four fundamental units; electrodes connected to the polarization voltage are shown in red, grounded ones in black. From [16].
Physics-Informed Neural Networks are neural networks (NNs) whose training is based on the mathematical model that governs the physical phenomenon under study and, possibly, on available data. More precisely, in our case, rewriting the PDE (1) with its BC and IC in the more compact form \[\left\{ \begin{array}{ll} \mathcal{F}(V,\partial_\bullet V, \partial_\bullet^2 V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \mathbb{R} \times \Omega\,, \\ \mathcal{B}(V,\partial_\bullet V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \mathbb{R} \times \partial\Omega\,,\\ \mathcal{I}(V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \{0\} \times \overline{\Omega}\,, \end{array}\right.\] we optimize the parameters of the NN by minimizing the following loss function: \[\begin{align} \mathcal{L}_{\text{tot}}&=\lambda_{\text{data}} \mathcal{L}_{\text{data}}\big(V_\theta(t,\mathbf{x}),V_{\text{data}}\big)+\lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}}\big(V_\theta(t,\mathbf{x})\big) + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}}\big(V_\theta(t,\mathbf{x})\big)+\lambda_{\text{IC}} \mathcal{L}_{\text{IC}}\big(V_\theta(t,\mathbf{x})\big), \end{align}\] where \[\begin{align} \mathcal{L}_{\text{data}}&=\frac{1}{N_d}\,\sum_{i=1}^{N_d}\left|V_\theta(t^i_d,\mathbf{x}^i_d)-V_{\text{data}}^i\right|^2,\quad \mathcal{L}_{\text{PDE}}=\frac{1}{N_P}\,\sum_{i=1}^{N_P}\left|\mathcal{F}(V_\theta,\partial_\bullet V_\theta, \partial_\bullet^2 V_\theta)(t_P^i,\mathbf{x}^i_P)\right|^2,\\ \mathcal{L}_{\text{BC}}&=\frac{1}{N_B}\,\sum_{i=1}^{N_B}\left|\mathcal{B}(V_\theta,\partial_\bullet V_\theta)(t_B^i,\mathbf{x}^i_B)\right|^2,\quad \mathcal{L}_{\text{IC}}=\frac{1}{N_I}\,\sum_{i=1}^{N_I}\left|\mathcal{I}(V_\theta)(t_I^i,\mathbf{x}^i_I)\right|^2. \end{align}\] Here, the set \(\left\{t_d^i,\mathbf{x}^i_d,V_{\text{data}}^i\right\}_{i=1,\ldots,N_d}\) denotes the available data, while \(\left\{t^i_P,\mathbf{x}_P^i\right\}_{i=1,\ldots,N_P} \in \mathbb{R}\times \Omega\), \(\left\{t^i_B,\mathbf{x}_B^i\right\}_{i=1,\ldots,N_B} \in \mathbb{R}\times \partial \Omega\), and \(\left\{t_I^i,\mathbf{x}_I^i\right\}_{i=1,\ldots,N_I} \in \{0\}\times \overline{\Omega}\) are randomly generated collocation points for the PDE, BC and IC terms, respectively. Moreover, \(\lambda_{\text{data}}, \lambda_{\text{PDE}}, \lambda_{\text{BC}}, \lambda_{\text{IC}}\) are the weights of the respective loss terms. Finally, \(\theta\) denotes the trainable weights of the NN.
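In code, the composite objective is just a weighted sum of mean-squared residuals. A minimal sketch (the residual tensors would come from evaluating \(\mathcal{F}\), \(\mathcal{B}\), \(\mathcal{I}\) and the data mismatch on their respective point sets):

```python
import torch

def pinn_loss(r_pde, r_bc, r_ic, v_pred=None, v_data=None,
              lam_pde=1.0, lam_bc=1.0, lam_ic=1.0, lam_data=1.0):
    """Weighted sum of mean-squared residuals; the data term is
    optional, covering the purely physics-informed case."""
    loss = (lam_pde * r_pde.pow(2).mean()
            + lam_bc * r_bc.pow(2).mean()
            + lam_ic * r_ic.pow(2).mean())
    if v_pred is not None:
        loss = loss + lam_data * (v_pred - v_data).pow(2).mean()
    return loss
```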
To tackle this problem, we designed a Mixture-of-Experts (MoE) model [20] comprising three experts. Two are Multi-Layer Perceptron (MLP) architectures with 6 layers and 256 nodes per layer, with skip connections and adaptive activations [21]: one uses the sigmoid-weighted linear unit (SiLU) [22] as activation function, while the other uses the self-scalable tanh (STAN) [23]. The third expert is a Multi-scale Fourier Network [24], again with 6 layers and 256 nodes per layer, skip connections, adaptive activations and SiLU activation function. The gate network is simply a 2-layer, 128-nodes-per-layer network with SiLU activation function. Its role is to produce, for each input point \((t, \mathbf{x})\), an importance vector \(\mathbf{g} = (g_1, \ldots, g_{N_{\text{experts}}})\) weighting the contribution of each expert \(u_k\), so that the whole solution is \[V_\theta (t, \mathbf{x}) = \sum_{k=1}^{N_{\text{experts}}} g_k (t, \mathbf{x}) \, u_k (t, \mathbf{x}) \,.\]
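The gated combination can be sketched as follows. The expert bodies here are small placeholder MLPs rather than the SiLU/STAN/Fourier architectures described above, and the softmax normalisation of the gate outputs is our assumption:

```python
import torch
import torch.nn as nn

class MoEPINN(nn.Module):
    """Minimal mixture-of-experts: a gate network produces per-point
    weights g_k that blend the scalar outputs u_k of each expert."""

    def __init__(self, n_experts=3, d_in=4, width=32):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, width), nn.SiLU(),
                          nn.Linear(width, 1))
            for _ in range(n_experts)
        )
        self.gate = nn.Sequential(nn.Linear(d_in, width), nn.SiLU(),
                                  nn.Linear(width, n_experts))

    def forward(self, tx):
        g = torch.softmax(self.gate(tx), dim=-1)              # (N, E) weights
        u = torch.cat([e(tx) for e in self.experts], dim=-1)  # (N, E) experts
        return (g * u).sum(dim=-1, keepdim=True)              # (N, 1) solution
```

The input `tx` concatenates \((t, \mathbf{x})\) along the feature dimension, so `d_in=4` for one time plus three spatial coordinates.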
To draw the PDE, BC and IC points we used quasi-random sampling based on Halton sequences [25]. Furthermore, we used an importance measure [26] for the field, with importance function \[i(t, \mathbf{x}) = \sqrt{ \left(\nabla V_\theta (t, \mathbf{x})\right)^2 + 5 \left(\partial_t V_\theta (t, \mathbf{x})\right)^2 } + \left| V_\theta (t, \mathbf{x})\right| + 10 \,,\] so as to focus on regions with high field values and/or rapidly changing values. We also used a temporal loss weighting for the PDE term, \[\lambda_{\text{PDE}} (t) = 1 + c_T \left( 1 - \frac{t}{T} \right) \,,\] to emphasise the initial stages of the dynamics. Finally, to adaptively weight the multi-objective loss, we used the Neural Tangent Kernel approach [27].
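The Halton sampling and the temporal weighting can be sketched as follows, using SciPy's `qmc.Halton` generator; the box bounds in the usage example are illustrative placeholders, not the actual sensor dimensions:

```python
import numpy as np
from scipy.stats import qmc

def sample_collocation(n, t_max, bounds):
    """Quasi-random (t, x, y, z) collocation points from a scrambled
    Halton sequence, rescaled to [0, t_max] x the spatial box."""
    lower = [0.0] + [b[0] for b in bounds]
    upper = [t_max] + [b[1] for b in bounds]
    sampler = qmc.Halton(d=1 + len(bounds), scramble=True)
    return qmc.scale(sampler.random(n), lower, upper)

def lambda_pde(t, T, c_T=1.0):
    """Temporal PDE-loss weight 1 + c_T * (1 - t/T): largest at t = 0,
    decaying linearly to 1 at t = T."""
    return 1.0 + c_T * (1.0 - t / T)
```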
The implementation was done in the Python language, using the PyTorch-based open-source package NVIDIA PhysicsNeMo (formerly Modulus); development, debugging, training and testing were conducted on the HPC cloud-based platform offered by AI_INFN [28].
The results obtained are reported in Figures 4 and 5. In Figure 4 we report the time evolution of \(V\) at a few time steps (not seen during training) at \(y=0\;\mu\text{m}\). The first column represents the "true" field (i.e., the FEM simulation data); the second column, the MoE-PINN prediction; the last column, the absolute error. From this we see that the MoE-PINN is capable of inferring the dynamics of the system.
\[\begin{array}{c|ccc}
\text{Relative Error / Time} & \text{Mean} & \text{Median} & \text{Mode (bin n}^\circ\text{)} \\
\hline
5.0\cdot 10^{-4}\ \text{ns} & 8.4\,\% & 7.5\,\% & 6.4\,\%\;\;(16) \\
1.8\cdot 10^{-3}\ \text{ns} & 4.2\,\% & 3.4\,\% & 1.2\,\%\;\;(3) \\
6.0\cdot 10^{-3}\ \text{ns} & 4.9\,\% & 4.8\,\% & 3.6\,\%\;\;(9) \\
2.0\cdot 10^{-2}\ \text{ns} & 4.4\,\% & 4.2\,\% & 3.6\,\%\;\;(9) \\
6.6\cdot 10^{-2}\ \text{ns} & 5.4\,\% & 4.7\,\% & 4.1\cdot 10^{-5}\,\%\;\;(0) \\
2.2\cdot 10^{-1}\ \text{ns} & 3.0\,\% & 2.3\,\% & 0.1\,\%\;\;(2) \\
7.2\cdot 10^{-1}\ \text{ns} & 2.9\,\% & 2.2\,\% & 4.1\cdot 10^{-5}\,\%\;\;(0) \\
2.4\ \text{ns} & 3.6\,\% & 2.9\,\% & 4.1\cdot 10^{-5}\,\%\;\;(0) \\
7.0\ \text{ns} & 6.0\,\% & 5.2\,\% & 4.1\cdot 10^{-5}\,\%\;\;(0)
\end{array}\]
In Figure 5 we report the histograms of the relative errors, separated by integration time step. We see that the relative error decreases in time, and that the overall mean error is below \(5\%\), the median error is around \(3\%\), while the mode of the error is \(0.00005\%\), i.e. most counts overall fall in the first bin (of a 200-bin histogram).
This work demonstrates the application of Physics-Informed Neural Networks (PINNs) to the design and optimization of diamond detectors. Specifically, we employed a Mixture-of-Experts PINN (MoE-PINN) to perform mesh-free interpolation, in both space and time, of Maxwell’s equations in the quasi-static approximation, capturing the detector’s response to the passage of charged particles via an extended Ramo-Shockley theorem for resistive media.
Despite the complexity of the governing equation, a third-order PDE with an underlying approximate spatial functional symmetry \(V(t, \mathbf{x}) \mapsto V(t, \mathbf{x}) + f(\mathbf{x})\) for any harmonic \(f\) (i.e. \(\nabla^2 f(\mathbf{x}) = 0\)), we successfully trained the MoE-PINN as a physics-informed, mesh-free surrogate for the numerical solver. The model achieved a median error of approximately 3%, with the most frequent error (the mode) being even lower at almost every time step.
This work is partly supported by ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by European Union – NextGenerationEU.
The work of AB and AR was funded by Progetto ICSC - Spoke 2 - Codice CN00000013 - CUP I53C21000340006 - Missione 4 Istruzione e ricerca - Componente 2 Dalla ricerca all’impresa – Investimento 1.4.