Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders


Alessandro Bombini1,2\(\dagger\), Alessandro Rosa1, Clarissa Buti1, Giovanni Passaleva1 and Lucio Anderlini1

1 Istituto Nazionale di Fisica Nucleare, Sezione di Firenze, Via B. Rossi 1, 50019 Sesto Fiorentino (FI), Italy
2 ICSC - Centro Nazionale di Ricerca in High Performance Computing, Big Data & Quantum Computing, Via Magnanelli 2, 40033, Casalecchio di Reno (BO), Italy
\(\dagger\) bombini @ fi.infn.it


1 Introduction↩︎

Designing detectors for next-generation particle physics experiments demands reliable performance under extreme particle fluxes, driven by the pursuit of high instantaneous luminosity. This challenge is central to the upgrades for the High-Luminosity LHC [1]–[4] and LHCb [5], [6], and to the proposed detectors for FCC-hh [7] and the Muon Collider [8].

Diamond detectors are promising candidates due to their superior radiation hardness and water-equivalence, which also makes them ideal for dosimetry. Moreover, laser-induced graphitisation enables embedding conductive regions within the diamond bulk, allowing flexible electrode geometries [9]–[12]. Figure 3 (a) shows a schematic representation of the geometry of 3D diamond sensors, in which the graphite electrodes are orthogonal to the diamond surface. Figure 3 (b) shows a microscope image of such a diamond detector.

Simulating these devices is challenging: charge transport in the semiconductor and signal propagation through resistive electrodes contribute comparably to time resolution. A time-dependent extension of the Ramo-Shockley theorem captures these effects via dynamic weighting potentials, governed by Maxwell's equations in the quasi-static approximation [13]–[15].

By using the extended form of the Ramo-Shockley theorem for conductive media, the contribution of the resistive material to the signal formation is included in the time evolution of the weighting potential, which is the solution of Maxwell's equations in the quasi-static limit [16]: \[\begin{align} \epsilon \nabla^2 V(t, \mathbf{x}) &= - \rho (t, \mathbf{x}) \\ \partial_t \rho (t, \mathbf{x}) &= \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla V(t, \mathbf{x}) \right] \,, \end{align}\] where \(V(t, \mathbf{x})\) is the weighting potential field, \(\epsilon = \epsilon_0 \epsilon_r\) is the permittivity of the material, which is the same for diamond and graphite and thus constant over the whole geometry, \(\rho (t, \mathbf{x})\) is the charge distribution field, and \(\sigma(\mathbf{x})\) is the function defining the conductivity in the diamond/graphite geometry. These equations can be condensed into a single, third-order partial differential equation \[\label{eq46PDE} \epsilon \partial_t \nabla^2 V(t, \mathbf{x}) + \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla V(t, \mathbf{x}) \right] = 0\,.\tag{1}\]
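
The condensation step can be made explicit as follows (a short worked derivation added for clarity, using only the two equations above). Taking the time derivative of the first equation, with \(\epsilon\) constant, and substituting the second gives \[\begin{align} \epsilon \partial_t \nabla^2 V(t, \mathbf{x}) &= - \partial_t \rho (t, \mathbf{x}) = - \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla V(t, \mathbf{x}) \right] \,, \end{align}\] which is Eq. (1). As an illustrative special case, assuming a uniform conductivity \(\sigma(\mathbf{x}) = \sigma_0\) (a simplification that does not hold for the actual diamond/graphite geometry), Eq. (1) reduces to \[\partial_t \nabla^2 V = - \frac{\sigma_0}{\epsilon} \nabla^2 V \quad \Rightarrow \quad \nabla^2 V(t, \mathbf{x}) = \nabla^2 V(0, \mathbf{x}) \, e^{-t/\tau}\,, \qquad \tau = \frac{\epsilon}{\sigma_0}\,,\] so the induced charge density relaxes exponentially with the dielectric relaxation time \(\tau\).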

The full system to describe the behaviour of the weighting field in the geometry is equipped with an initial condition (IC) \[V(t=0, \mathbf{x}) = \begin{cases} + V_0\,, &\mathbf{x} \in \partial C_+ \cap \partial \Omega \,,\\ 0\,, & \text{otherwise} \,, \end{cases}\] and boundary conditions (BC) \[\begin{align} V(t, \mathbf{x}) &= +V_0\,, \quad \mathbf{x} \in \partial C_+ \cap \partial \Omega\,,\\ V(t, \mathbf{x}) &= 0 \,, \,\,\qquad \mathbf{x} \in \partial C_- \cap \partial \Omega\,,\\ \hat{\mathbf{n}} \cdot \nabla V(t, \mathbf{x}) &= 0\,, \qquad \,\, \mathbf{x} \in \partial\Omega \setminus\bigcup_iC_{i} \,, \end{align}\] where \(\Omega\) is the whole diamond+graphite geometry, \(C_\pm\) are the conductive cylinders attached to potential (\(C_+\)) or grounded (\(C_-\)). The last condition applies to all the boundary region of \(\Omega\) where there is no intersection with conductive columns.

Notice that this equation describes the time response of the detector as a function of its design: the geometry, as it results from the manufacturing process, enters through the conductivity function \(\sigma(\mathbf{x})\).

The computations needed to solve the differential equation describing the system, and thus obtain the time-dependent electric field via the (reformulated) Ramo-Shockley theorem, are highly non-trivial. Standard numerical solvers, such as the COMSOL finite element method [17] on a custom mesh [14] or spectral methods [16], give mesh-based results on a discrete grid of time steps.

This motivates the use of Physics Informed Neural Networks (PINNs) [18], [19], both as a surrogate model and as a mixed numerical solver. We may leverage the ability of neural networks to extract statistical relations from data and, by adding physical information about the system through the loss as prescribed by PINNs, create a parametric surrogate model to optimize the design of diamond detectors.

Figure 3: (a) Diagrammatic illustration of a three-dimensional diamond sensor, depicting a segment comprising four by four fundamental units. The figure shows in red the electrodes connected to the polarization voltage, and in black the grounded ones. From [16]. (b) A microscopic image of a 3D diamond sensor. The specimen was tilted during acquisition to show the graphitised electrodes. From [16].

2 Methods: using physics informed neural networks to solve the governing PDE in a meshless way↩︎

Physics-Informed Neural Networks are neural networks (NNs) whose training is based on the mathematical model that governs the physical phenomena under study and, possibly, on available data. More precisely, in our case, we rewrite the PDE in Eq. (1) with its BC and IC in a more compact form (see footnote 1) as \[\left\{ \begin{array}{ll} \mathcal{F}(V,\partial_\bullet V, \partial_\bullet^2 V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \mathbb{R} \times \Omega, \\ \mathcal{B}(V,\partial_\bullet V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \mathbb{R} \times \partial\Omega,\\ \mathcal{I}(V; t, \mathbf{x})=0,&\quad (t,\mathbf{x})\in \{0\} \times \overline{\Omega}, \end{array}\right.\] and we optimize the parameters of the NN by minimizing the following loss function: \[\begin{align} \mathcal{L}_{\text{tot}}&=\lambda_{\text{data}} \mathcal{L}_{\text{data}}\big(V_\theta(t,\mathbf{x}),V_{\text{data}}\big)+\lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}}\big(V_\theta(t,\mathbf{x})\big) + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}}\big(V_\theta(t,\mathbf{x})\big)+\lambda_{\text{IC}} \mathcal{L}_{\text{IC}}\big(V_\theta(t,\mathbf{x})\big), \end{align}\] where \[\begin{align} \mathcal{L}_{\text{data}}&=\frac{1}{N_d}\,\sum_{i=1}^{N_d}\left|V_\theta(t^i_d,\mathbf{x}^i_d)-V_{\text{data}}^i\right|^2,\quad \mathcal{L}_{\text{PDE}}=\frac{1}{N_P}\,\sum_{i=1}^{N_P}\left|\mathcal{F}(V_\theta,\partial_\bullet V_\theta, \partial_\bullet^2 V_\theta)(t_P^i,\mathbf{x}^i_P)\right|^2,\\ \mathcal{L}_{\text{BC}}&=\frac{1}{N_B}\,\sum_{i=1}^{N_B}\left|\mathcal{B}(V_\theta,\partial_\bullet V_\theta)(t_B^i,\mathbf{x}^i_B)\right|^2,\quad \mathcal{L}_{\text{IC}}=\frac{1}{N_I}\,\sum_{i=1}^{N_I}\left|\mathcal{I}(V_\theta)(t_I^i,\mathbf{x}^i_I)\right|^2.
\end{align}\] Here, the set \(\left\{t_d^i,\mathbf{x}^i_d,V_{\text{data}}^i\right\}_{i=1,\ldots,N_d}\) denotes the available data, and \(\left\{t^i_P,\mathbf{x}_P^i\right\}_{i=1,\ldots,N_P}\) \(\in \mathbb{R}\times \Omega\), \(\left\{t^i_B,\mathbf{x}_B^i\right\}_{i=1,\ldots,N_B}\) \(\in \mathbb{R}\times \partial \Omega\), \(\left\{t_I^i,\mathbf{x}_I^i\right\}_{i=1,\ldots,N_I} \in \{0\}\times \overline{\Omega}\) are randomly generated collocation points at which the PDE, BC and IC residuals of the NN are evaluated, respectively. Moreover, \(\lambda_{\text{data}}, \lambda_{\text{PDE}}, \lambda_{\text{BC}}, \lambda_{\text{IC}}\) are the weights of the respective loss terms. Finally, \(\theta\) denotes the NN trainable weights.
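
To make the structure of the loss concrete, the following is a minimal sketch in plain Python, not the training code used in this work. It assumes a one-dimensional toy reduction of the PDE with uniform conductivity, and it replaces the automatic differentiation of a real PINN with central finite differences. All names (`pde_residual_1d`, `total_loss`, `v_exact`, `v_wrong`) and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def mse(residuals):
    """Mean of squared residuals, the form shared by every loss term."""
    return sum(r * r for r in residuals) / len(residuals)

def pde_residual_1d(V, t, x, eps, sigma, h=1e-3):
    """Residual of eps * d_t d_xx V + sigma * d_xx V (1D, uniform sigma).

    Assumption: derivatives are taken by central finite differences here,
    whereas a PINN would obtain them by automatic differentiation.
    """
    def d_xx(tt):
        return (V(tt, x + h) - 2.0 * V(tt, x) + V(tt, x - h)) / h**2
    dt_dxx = (d_xx(t + h) - d_xx(t - h)) / (2.0 * h)
    return eps * dt_dxx + sigma * d_xx(t)

def total_loss(residuals_by_term, weights):
    """Weighted sum of the mean-squared residuals of each term."""
    return sum(weights[name] * mse(res) for name, res in residuals_by_term.items())

# Toy setup (hypothetical values): unit-length domain, eps = sigma = V0 = 1.
EPS, SIGMA, V0, LENGTH = 1.0, 1.0, 1.0, 1.0

def v_exact(t, x):
    # Static ramp plus one decaying mode: solves the toy PDE with
    # V(t, 0) = V0 and V(t, LENGTH) = 0.
    decay = math.exp(-SIGMA * t / EPS)
    return V0 * (1.0 - x / LENGTH) + 0.3 * decay * math.sin(math.pi * x / LENGTH)

def v_wrong(t, x):
    # Same initial profile but frozen in time: violates the PDE.
    return V0 * (1.0 - x / LENGTH) + 0.3 * math.sin(math.pi * x / LENGTH)

def loss_for(V):
    pde_pts = [(0.1 * i, 0.1 * j) for i in range(1, 6) for j in range(1, 10)]
    times = [0.1 * i for i in range(6)]
    xs = [0.1 * j for j in range(11)]
    terms = {
        # "Data" are sampled from the exact toy solution.
        "data": [V(t, x) - v_exact(t, x) for t, x in pde_pts[::5]],
        "PDE": [pde_residual_1d(V, t, x, EPS, SIGMA) for t, x in pde_pts],
        "BC": [V(t, 0.0) - V0 for t in times] + [V(t, LENGTH) for t in times],
        "IC": [V(0.0, x) - v_exact(0.0, x) for x in xs],
    }
    weights = {"data": 1.0, "PDE": 1.0, "BC": 1.0, "IC": 1.0}
    return total_loss(terms, weights)

loss_exact = loss_for(v_exact)  # close to zero (finite-difference error only)
loss_wrong = loss_for(v_wrong)  # clearly non-zero
```

In this toy case the candidate that satisfies the PDE, BC and IC gives a total loss at the level of the finite-difference error, while the time-frozen candidate is penalised through the PDE and data terms even though it satisfies the BC and IC.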

Figure 4: Results of the trained model. The first row represents the real data, i.e. the data obtained from the FEM simulation. The second row represents the MoE network prediction. The last row is the absolute error.

To tackle this problem, we designed a Mixture-of-Experts (MoE) model [20] comprising three experts. Two are Multi-Layer Perceptron (MLP) architectures with 6 layers and 256 nodes per layer, skip connections and adaptive activations [21]: one uses the sigmoid-weighted linear unit (SiLU) [22] as activation function, while the other uses the self-scalable tanh (STAN) [23]. The third expert is a Multi-scale Fourier Network [24], again with 6 layers and 256 nodes per layer, skip connections, adaptive activations and SiLU activation function. The gate network is a simple network with 2 layers and 128 nodes per layer, skip connections, adaptive activations and SiLU activation function. Its role is to produce, for each input point \((t_i, \mathbf{x}_i)\), an importance vector \(\mathbf{g}\) with components \(g_j\), \(j=1,\ldots,N_{\text{experts}}\), that weights the contribution of each expert \(u_j\), so that the full solution is \[V_\theta (t_i, \mathbf{x}_i) = \sum_{j=1}^{N_{\text{experts}}} g_j (t_i, \mathbf{x}_i) \, u_j (t_i, \mathbf{x}_i) \,.\]
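
The gated combination of experts can be sketched as follows. This is an illustration in plain Python and not the network used in this work: the experts and the gate are simple closed-form stand-ins for the trained networks, the spatial coordinate is a scalar, and the softmax normalisation of the gate output is an assumption, since the text does not specify how the importance vector is normalised.

```python
import math

def softmax(logits):
    """Numerically stable softmax; assumed normalisation of the gate output."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(gate, experts, t, x):
    """Weighted sum of expert outputs, with input-dependent gate weights."""
    g = softmax(gate(t, x))
    return sum(g_k * u_k(t, x) for g_k, u_k in zip(g, experts))

# Hypothetical stand-ins for the three trained experts.
experts = [
    lambda t, x: math.exp(-t) * x,
    lambda t, x: math.tanh(x - t),
    lambda t, x: math.exp(-t) * math.sin(2.0 * math.pi * x),
]

def gate(t, x):
    # Hypothetical gate logits; a trained gate network would replace this.
    return [1.0 - t, t, x]

prediction = moe_predict(gate, experts, 0.2, 0.7)
```

With a softmax gate the weights are positive and sum to one at every input point, so the prediction is a convex combination of the expert outputs and the gate can favour different experts in different regions of space and time.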

To draw the PDE, BC and IC points we used quasi-random sampling based on Halton sequences [25]. Furthermore, we used an importance measure [26] for the field, with importance function \[i(t, \mathbf{x}) = \sqrt{ \left(\nabla V_\theta (t, \mathbf{x})\right)^2 + 5 \left(\partial_t V_\theta (t, \mathbf{x})\right)^2 } + \left| V_\theta (t, \mathbf{x})\right| + 10 \,,\] so as to focus on regions with high field values and/or rapidly changing values. We also used a temporal loss weighting for the PDE, \[\lambda_{\text{PDE}} (t) = 1 + c_T \left( 1 - \frac{t}{T} \right) \,,\] to focus on the initial stages of the dynamics. Finally, to adaptively weight the multi-objective loss, we used the Neural Tangent Kernel approach [27].
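
The sampling and weighting ingredients can be illustrated with a short sketch in plain Python. It is not the implementation used in this work: the Halton generator below is the textbook radical-inverse construction, the choice of bases is hypothetical, and \(T\) and \(c_T\) are treated as free parameters (here assumed to be the final time and a positive constant, which the text does not state explicitly).

```python
import math

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points, bases=(2, 3, 5)):
    """First n_points of a Halton sequence in the unit cube (one base per axis)."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]

def lambda_pde(t, T, c_T):
    """Temporal PDE weight: largest at t = 0, equal to 1 at t = T."""
    return 1.0 + c_T * (1.0 - t / T)

def importance(grad_V, dV_dt, V):
    """Importance function from the text, given the field and its derivatives."""
    grad_sq = sum(g * g for g in grad_V)
    return math.sqrt(grad_sq + 5.0 * dV_dt**2) + abs(V) + 10.0

# Example: four quasi-random points in (t, x, y), rescaled by the caller
# to the physical ranges of interest.
points = halton(4)
```

The additive constant in the importance function keeps every point at a non-zero importance, so regions with small field values are still sampled, only less often than regions where the field is large or changing quickly.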

Figure 5: Histogram of the relative errors, i.e. the L1 distance between the MoE-PINN prediction and the data points divided by the true value of the weighting field. The inset shows a zoom of the histogram for errors below 20%.

3 Results↩︎

The code for this approach was implemented in Python using the PyTorch-based open-source package NVIDIA PhysicsNeMo (formerly Modulus; see footnote 2). Development, debugging, training and testing were carried out on the cloud-based HPC platform offered by AI_INFN [28].

The results are reported in Figures 4 and 5. Figure 4 shows the time evolution of \(V\) at a few time steps (not seen during training) at \(y=0 \;\mu m\). The first column represents the "true" field (i.e., the FEM simulation data), the second column the MoE-PINN prediction, and the last column the absolute error. From this we see that the MoE-PINN is capable of inferring the dynamics of the system.

\begin{tabular}{|c|c|c|c|}
\hline
Time & Mean relative error & Median relative error & Mode of relative error (bin n\(^{\circ}\)) \\
\hline
\(5.0\cdot 10^{-4}\) ns & \(8.4\,\%\) & \(7.5\,\%\) & \(6.4\,\% \;\;(16)\) \\
\(1.8\cdot 10^{-3}\) ns & \(4.2 \,\%\) & \(3.4 \,\%\) & \(1.2\,\% \;\;(3)\) \\
\(6.0\cdot 10^{-3}\) ns & \(4.9\,\%\) & \(4.8\,\%\) & \(3.6\,\% \;\;(9)\) \\
\(2.0\cdot 10^{-2}\) ns & \(4.4 \,\%\) & \(4.2 \,\%\) & \(3.6\,\% \;\;(9)\) \\
\(6.6\cdot 10^{-2}\) ns & \(5.4 \,\%\) & \(4.7\,\%\) & \(4.1\cdot 10^{-5}\,\% \;\;(0)\) \\
\(2.2\cdot 10^{-1}\) ns & \(3.0 \,\%\) & \(2.3\,\%\) & \(0.1\,\% \;\;(2)\) \\
\(7.2\cdot 10^{-1}\) ns & \(2.9 \,\%\) & \(2.2 \,\%\) & \(4.1\cdot 10^{-5}\,\% \;\;(0)\) \\
\(2.4\) ns & \(3.6 \,\%\) & \(2.9 \,\%\) & \(4.1\cdot 10^{-5}\,\% \;\;(0)\) \\
\(7.0\) ns & \(6.0 \,\%\) & \(5.2 \,\%\) & \(4.1\cdot 10^{-5}\,\% \;\;(0)\) \\
\hline
\end{tabular}

Figure 5 shows the histogram of the relative errors, separated by integration time step. The relative error is largest at the earliest time step and lower at later times; the overall mean error is below \(5\%\), the median error is around \(3\%\), while the mode is \(0.00005\%\), i.e. the highest overall count is in the first bin (of a 200-bin histogram).
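
The summary statistics quoted above can be computed along the following lines. This is a sketch in plain Python with made-up numbers, not the analysis code of this work. It assumes equal-width bins spanning the observed range of the relative errors and reports the left edge of the most populated bin as the mode; this binning convention is a guess and is not stated in the text.

```python
import statistics

def relative_errors(pred, true):
    """Pointwise |prediction - truth| / |truth|."""
    return [abs(p - t) / abs(t) for p, t in zip(pred, true)]

def histogram_mode(values, n_bins=200):
    """Index and left edge of the most populated equal-width bin.

    Assumption: bins span [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:
        return 0, lo
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        counts[k] += 1
    k_mode = max(range(n_bins), key=lambda k: counts[k])
    return k_mode, lo + k_mode * width

# Made-up example values, for illustration only.
true = [1.0, 2.0, 4.0, 5.0, 10.0]
pred = [1.15, 2.3, 4.6, 5.0, 15.0]

errors = relative_errors(pred, true)
mean_err = statistics.mean(errors)
median_err = statistics.median(errors)
mode_bin, mode_edge = histogram_mode(errors, n_bins=5)
```

Because the mean is sensitive to a few large outliers while the median and the mode are not, a mean above the median, as in the table, indicates a long tail of points with larger relative error.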

4 Conclusion↩︎

This work demonstrates the application of Physics-Informed Neural Networks (PINNs) to the design and optimization of diamond detectors. Specifically, we employed a Mixture-of-Experts PINN (MoE-PINN) to perform mesh-free interpolation of Maxwell’s equations, under quasi-static approximation, both spatially and temporally, capturing the detector’s response to charged particle passage via an extended Ramo-Shockley theorem for resistive media.

Despite the complexity of the governing third-order PDE, which has an underlying approximate spatial functional symmetry (see footnote 3) \(V(t, \mathbf{x}) \mapsto V(t, \mathbf{x}) + f(\mathbf{x})\) if \(\nabla^2 f(\mathbf{x}) = 0\), we successfully trained the MoE-PINN as a physics-informed, mesh-free surrogate for the numerical solver. The model achieved a median error of approximately 3%, with the most frequent error (the mode) being lower still at almost every time step.
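
The symmetry statement can be checked directly (a short worked step added for clarity, assuming \(f\) does not depend on time). Substituting \(V \mapsto V + f\) in Eq. (1), the extra terms are \[\epsilon \partial_t \nabla^2 f(\mathbf{x}) + \nabla \cdot \left[ \sigma(\mathbf{x}) \nabla f(\mathbf{x}) \right] = \sigma(\mathbf{x}) \nabla^2 f(\mathbf{x}) + \nabla \sigma(\mathbf{x}) \cdot \nabla f(\mathbf{x}) \,,\] since the first term vanishes for time-independent \(f\). For \(\nabla^2 f = 0\) only the term proportional to \(\nabla \sigma\) survives, and it is non-zero only where the conductivity changes, i.e. at the surface of the conductive cylinders; this is why the symmetry is approximate.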

Acknowledgements↩︎

Funding information

This work is partly supported by ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by European Union – NextGenerationEU.

The work of AB and AR was funded by Progetto ICSC - Spoke 2 - Codice CN00000013 - CUP I53C21000340006 - Missione 4 Istruzione e ricerca - Componente 2 Dalla ricerca all’impresa – Investimento 1.4.

References↩︎

[1]
C. Da Via et al., 3D silicon sensors: Design, large area production and quality assurance for the ATLAS IBL pixel detector upgrade, Nucl. Instrum. Meth. A 694, 321 (2012).
[2]
M. Meschini et al., First Results on 3D Pixel Sensors Interconnected to the RD53A Readout Chip after Irradiation to \(1\times 10^{16}\) \(\mathrm{n_{eq}}\) cm\(^{-2}\), JINST 14(06), C06018 (2019).
[3]
G.-F. Dalla Betta, M. Boscardin, G. Darbo, R. Mendicino, M. Meschini, A. Messineo, S. Ronchin, D. M. S. Sultan and N. Zorzi, Development of a new generation of 3D pixel sensors for HL-LHC, Nucl. Instrum. Meth. A 824, 386 (2016).
[4]
G.-F. Dalla Betta et al., The INFN-FBK “Phase-2” R&D program, Nucl. Instrum. Meth. A 824, 388 (2016).
[5]
LHCb collaboration, Framework TDR for the LHCb Upgrade II, Tech. rep., CERN, Geneva, (2021).
[6]
LHCb collaboration, LHCb Upgrade II Scoping Document, Tech. rep., CERN, Geneva, (2024).
[7]
A. Abada et al., FCC-hh: The Hadron Collider: Future Circular Collider Conceptual Design Report Volume 3, Eur. Phys. J. ST 228(4), 755 (2019).
[8]
C. Accettura et al., Interim report for the International Muon Collider Collaboration (IMCC), CERN Yellow Rep. Monogr. 2/2024, 176 (2024).
[9]
F. Oliva, Operation and performance of the active target of PADME, Nucl. Instrum. Meth. A 958, 162354 (2020).
[10]
A. Porter, K. Kanxheri, I. L. Paz, A. Oh, L. Servoli and C. Talamonti, A 3D diamond dosimeter with graphitic surface connections, Diamond and Related Materials 133, 109692 (2023).
[11]
L. Anderlini, M. Bellini, C. Corsi, S. Lagomarsino, C. Lucarelli, G. Passaleva, S. Sciortino and M. Veltri, Fabrication and First Full Characterisation of Timing Properties of 3D Diamond Detectors, Instruments 5(4), 39 (2021).
[12]
F. Bachmair et al., A 3D diamond detector for particle tracking, Nucl. Instrum. Meth. A 786, 97 (2015).
[13]
W. Riegler, Extended theorems for signal induction in particle detectors VCI 2004, Nucl. Instrum. Meth. A 535, 287 (2004).
[14]
D. Janssens, Resistive electrodes and particle detectors: Modelling and measurements of novel detector structures, Ph.D. thesis, Vrije U., Brussels, Presented 26 Feb 2024 (2024).
[15]
D. Janssens et al., Induced signals in particle detectors with resistive elements: Numerically modeling novel structures (VCI 2022), Nucl. Instrum. Meth. A 1040, 167227 (2022).
[16]
L. Anderlini, A. Bombini, C. Buti, D. Janssens, S. Lagomarsino, G. Passaleva and M. Veltri, Optimization of 3d diamond detectors with graphitized electrodes based on an innovative numerical simulation (2025).
[17]
COMSOL AB, COMSOL Multiphysics® Reference Manual, Version 6.2, Stockholm, Sweden (2024).
[18]
M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations, arXiv preprint arXiv:1711.10561 (2017).
[19]
M. Raissi, P. Perdikaris and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational physics 378, 686 (2019).
[20]
R. Bischof and M. Kraus, Mixture-of-experts-ensemble meta-learning for physics-informed neural networks (2022).
[21]
A. D. Jagtap, K. Kawaguchi and G. E. Karniadakis, Adaptive activation functions accelerate convergence in deep and physics-informed neural networks, Journal of Computational Physics 404, 109136 (2020).
[22]
S. Elfwing, E. Uchibe and K. Doya, Sigmoid-weighted linear units for neural network function approximation in reinforcement learning, Neural networks 107, 3 (2018).
[23]
R. Gnanasambandam, B. Shen, J. Chung, X. Yue et al., Self-scalable tanh (stan): Faster convergence and better generalization in physics-informed neural networks, arXiv preprint arXiv:2204.12589 (2022).
[24]
S. Wang, H. Wang and P. Perdikaris, On the eigenvector bias of fourier feature networks: From regression to solving multi-scale pdes with physics-informed neural networks, Computer Methods in Applied Mechanics and Engineering 384, 113938 (2021).
[25]
J. H. Halton, On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, Numerische Mathematik 2(1), 84 (1960).
[26]
M. A. Nabian, R. J. Gladstone and H. Meidani, Efficient training of physics-informed neural networks via importance sampling, Computer-Aided Civil and Infrastructure Engineering 36(8), 962 (2021).
[27]
S. Wang, X. Yu and P. Perdikaris, When and why pinns fail to train: A neural tangent kernel perspective, Journal of Computational Physics 449, 110768 (2022).
[28]
R. Petrini, L. Anderlini, M. Barbetti, G. Bianchini, D. Ciangottini, S. Dal Pra, D. Michelotto and D. Spiga, Developing artificial intelligence in the cloud: The ai infn platform, Computer Science 26(SI) (2025).

  1. With the notation \(\partial_\bullet\) we denote any derivative with respect to either time or space.↩︎

  2. https://github.com/NVIDIA/physicsnemo-sym.↩︎

  3. Which is valid everywhere except where \(\nabla \sigma \cdot \nabla V \neq 0\), i.e., only on the conductive cylinders surface.↩︎