1 Introduction↩︎

Silicon tracking detectors are a cornerstone of modern particle physics experiments [1]–[7]. Those systems consist of finely segmented silicon sensors coupled with readout integrated circuits (ROICs). Over the past decades, silicon trackers have advanced to include millions to billions of individual sensor elements, enabling unprecedented spatial resolution in extreme environments. Their successful operation opens the door to new experimental insights into some of the most pressing questions in fundamental physics. However, a key challenge lies in efficiently utilizing the vast volumes of data they produce, often at rates that exceed the capabilities of current processing and transmission technologies.

Among the most extreme examples of the challenges in operating high-granularity silicon trackers are the CMS and ATLAS experiments at the Large Hadron Collider (LHC) [8]–[13]. These detectors currently feature trackers composed of hundreds of millions of silicon pixels and tens of millions of silicon strips. Upcoming upgrades for the High-Luminosity LHC (HL-LHC) will further increase this complexity, with trackers expected to contain billions of pixel sensors. The data volume produced by the pixel detectors alone, however, already exceeds the available bandwidth for off-detector transmission. To manage the data, the experiments employ a multi-level trigger system designed to select a small subset of interesting events. The first-level trigger systems reduce the data rate from the bunch crossing rate of 40 MHz to approximately 100 kHz by making decisions based on information from sub-detectors other than the tracker. In both CMS and ATLAS, the first-level trigger currently does not make use of pixel data. In the CMS HL-LHC upgrade, the outer silicon strip tracker will be read out at 40 MHz and used in Level-1 triggering. The inner pixel detectors in both experiments will remain unused at this level. This omission limits the information available to make trigger decisions, with a significant impact for events involving displaced vertices, such as those involving low momentum heavy-flavor particles.

We have proposed intelligent on-detector systems capable of analyzing tracker data in real time as one avenue for handling the large data rate from pixel detectors [14]. We developed an neural network (NN) to filter particles based their transverse momentum and demonstrated its feasibility through simulation of a ROIC implementation. In this work, we present the first physical demonstration of our approach. The prototype shown in this paper performs signal processing and machine learning (ML)–based data filtering within a 28nm TSMC ROIC. We characterize the analog circuitry of the ROIC and the performance of the on-chip NN utilizing local charge-injection studies. This demonstration represents significant progress toward deploying large-scale pixel detector arrays at high rates and in extreme radiation environments through the use of ML-based front-end electronics and readout systems. The remainder of this paper discusses the electronics architecture and presents test results for the ROIC’s analog and digital components.

2 SmartPix ASIC v1 overview↩︎

A prototype readout integrated circuit (ROIC) was implemented as a \(1 \times 1.6\) mm\(^2\) ASIC in a 28 nm TSMC CMOS process. The design comprises two \(32 \times 8\) arrays of \(25 \times 25\) μm\(^2\) pixels, referred to as superpixels. Each pixel integrates an analog front-end (AFE) for signal processing together with digital back-end logic for cluster classification. The overall architecture is illustrated in Fig. 1 and described in detail in [15]–[17].

Figure 1: Architecture of a single analog front-end pixel. In the prototype ROIC there are 256 such pixels arranged in a 8x32 grid. The output of each pixel is input to the digital logic surrounding the analog islands. We refer to the threshold on bits 0, 1, and 2 as V_{th0}, V_{th1}, and V_{th2}, respectively. — Figure 1: Architecture of a single analog front-end pixel. In the prototype ROIC there are 256 such pixels arranged in a 8x32 grid. The output of each pixel is input to the digital logic surrounding the analog islands. We refer to the threshold on bits 0, 1, and 2 as V\(_{th0}\), V\(_{th1}\), and V\(_{th2}\), respectively.

The AFE consists of a preamplifier stage that integrates the charge collected from the sensor, followed by a 2-bit flash ADC that digitizes the signal into three thermometrically encoded digital bits at the bunch crossing clock rate. The flash ADC thresholds are provided off-chip via the biases V\(_{th0}\), V\(_{th1}\), and V\(_{th2}\) to control the tripping point of bits 0, 1, and 2 respectively. The ADC operates in two successive phases: first, the auto-zero phase, and second, the sampling of the integrated charge. Two pixel design variants were implemented as shown in Figure 2: superpixel v1 (SP1) and superpixel v2 (SP2) employs a differential ADC architecture and a single-ended ADC structure, respectively. Simulation studies indicate that SP2 achieves superior pixel-level performance in terms of noise and power consumption in simulation. SP1, however, is expected to perform more robustly in large-scale arrays due to its improved common-mode and supply-rejection characteristics, which are particularly advantageous for detector systems comprising thousands of pixels.

Figure 2: ASIC-level layout illustrating the two superpixel variants (left) and a close-up view of a 2\times2 pixel block (right). The central analog island shares a common deep-nwell substrate isolated from the surrounding digital region, which contains the configuration logic and machine-learning circuitry. — Figure 2: ASIC-level layout illustrating the two superpixel variants (left) and a close-up view of a \(2\times2\) pixel block (right). The central analog island shares a common deep-nwell substrate isolated from the surrounding digital region, which contains the configuration logic and machine-learning circuitry.

During normal operation, the digital data from each pixel is latched and encoded into a 2-bit binary format. The outputs are digitally summed along rows while preserving column information, thereby reducing the 2-bit data from 256 pixels into 16 buses of 6-bit values as shown in Figure 3. This effectively projects the raw pixel data along the local \(y\)-axis. The resulting 16 buses are subsequently processed by the digital logic, which implements the classification model. The entire back-end digital chain, from encoding through classification, is realized using combinatorial logic, ensuring zero-latency operation.

In test mode, pixel data remains in thermometric code and can be latched and loaded into a scan chain register for sequential readout. An additional clock signal, BxCLK, is supplied to the ASIC to control the scan chain. This provides the flexibility to align the phases of all signals and clocks externally, enabling maximum off-chip control to ensure the proper prototype characterization.

The back-end digital logic includes a compact two-layer NN that classifies charge clusters based on the transverse momentum (\(p_{\mathrm{T}}\)) of the incident charged particles as shown in Figure 3. The model architecture, training procedure, and dataset generation are described in [14]. The NN was trained on charge profiles from simulated pion clusters with CMS-like kinematics [18], achieving over 90% classification accuracy. The model was quantized using the QKeras package to create a compact implementation suitable for on-chip deployment. Using the hls4ml framework [19]–[21] and Siemens Catapult HLS [22], the quantized model was converted into synthesizable Register-Transfer Level (RTL) and integrated with system-level digital logic. The implementation prioritizes full parallelism to minimize inference latency, enabling real-time decision-making at the pixel level.

Due to the limited ASIC area, bump bonding the prototype to a sensor die is not feasible. Instead, a dedicated test input, pixInTest, is provided to emulate charge generation within the ASIC. Aside from power, ground, and current biasing, all necessary digital inputs, outputs, and analog stimuli are supplied off-chip via a data acquisition (DAQ) system. A summary of the key signals and their definitions is provided in Table 1. Those signals are used extensively throughout the testing procedure.

Table 1: Summary of main signals and their functional roles in the ASIC and test setup.
Signal	Description
BxCLK_ANA	Event clock synchronized with the hit rate, used internally by the analog front-end to control the integration and auto-zero phases of the ADC.
pixInTest	Analog input node used to inject charge into all pixels simultaneously through configurable on-chip injection capacitors. It emulates the charge collected by the sensor in normal detector operation. The time of injection Tinj is relative to the event clock BxCLK_ANA and controlled precisely by firmware.
BxCLK	Internal secondary bunch-crossing clock used to capture scan-chain output data. The phase of BxCLK is defined relative to the event clock BxCLK_ANA and controlled precisely by firmware. The firmware parameter controlling this phase is called BxCLK_Delay.
VTH0, VTH1, VTH2	External bias voltages defining the thresholds for the three comparator bits of the 2-bit flash ADC.
i_bias	Source current bias provided to the ASIC for the analog front-end.
DnnOut[1:0]	Two-bit output of the on-chip NN: 00 = high-\(p_{T}\), 01 = low-\(p_{T}\) (negative charge), 10 = low-\(p_{T}\) (positive charge), 11 = invalid.

3 Test setup↩︎

This section describes the experimental setup used to characterize the SmartPix ASIC prototype. The objective of the test environment is to emulate realistic detector operation while providing full control and observability of the analog and digital domains. Figure 4 shows the setup, which leverages programmable firmware, a novel data acquisition system (DAQ) system, the ASIC printed circuit boards, and precision waveform instrumentation.

Section 3.1 details the DAQ system based on the open-source Caribou and Spacely frameworks, which interface the ASIC with a Xilinx ZCU102 SoC FPGA for configuration and readout. Section 3.2 then discusses the timing and calibration procedures required to align injection, sampling, and readout phases across the system, ensuring reproducible and noise-free measurements of the AFE response. While the ASIC is designed to operate at 40 MHz, initial measurements were performed at 10 MHz to simplify the test setup and avoid bandwidth constraints. All results presented in the following sections were obtained under these conditions. Full-rate characterization at 40 MHz is planned once our initial campaign is completed.

3.1 Data acquisition system↩︎

A DAQ system was developed to test the prototype ROIC using the open-source Caribou [23] and Spacely workflows [24]. A linux based PC runs the Spacely software, which drives testing protocols with python routines. The testing protocols are sent from the PC to a Xilinx ZCU102 System-on-Chip (SoC) FPGA. A Peary server [25] is running on the SoC to facilitate communication between the python routines and firmware running on the FPGA of the ZCU102. The firmware is a custom implementation of a finite-state machine that executes the testing protocols at high rates. The ZCU102 is connected with a Control and Readout board (CaR Board), a custom printed circuit board (PCB) designed by CERN/BNL to generate clean digital control signals from the FPGA and supply power, bias voltages, and analog inputs to the device under test (DUT) [23]. A 12 V power supply was used to power both the FPGA and the CaR Board. While the CaR Board can generate analog pulses, a 5 Gbps, 14-bit SDG7102A waveform generator was integrated into the setup to deliver higher-quality pulses when required by the test environment.

The DUT board is a PCB designed at Fermilab that the prototype ROIC is bonded to for bi-directional communication. The system offers greater flexibility at a cheaper cost in comparison to the traditional National Instrument (NI) systems we have utilized in the past. More details about the DAQ system and it’s viability for future testing efforts beyond our work here will be explored in future works.

Figure 4: Test stand for the SmartPixel ASIC. The device under test (DUT), a custom PCB with the bonded ROIC, connects via a SEARAY connector to the CaR Board, which interfaces with a Xilinx ZCU102 SoC FPGA running the Peary server and connected to a workstation executing the python test routines (not shown). External power supplies bias the DUT and CaR Board. An external pulse generator supplies high quality pulses.

3.2 Timing and calibrations↩︎

Reliable operation requires calibration of charge injection, sampling, and readout timing. This subsection describes: (i) charge injection and S-curve methodology, (ii) clock and phase alignment, and (iii) the resulting firmware timing window.

3.2.1 Charge injection and S-curve measurement↩︎

Each pixel includes a programmable injection capacitor network controlled by a 2-bit register, enabling different charge quantities across the matrix and allowing artificial cluster formation to mimic the effect of a real particle track. Charge is injected by applying a voltage step to the bottom plate of the injection capacitor. The step, denoted pixInTest in Fig. 1, is distributed to all pixels via a metal grid and is driven off chip by a pulse generator. The injected charge is \[Q_{\text{IN}} = \frac{V_{\text{pixInTest}} \, C_{\text{TOT}}}{q_{\mathrm{e}}} \quad [\text{electrons}],\] where \(V_{\text{pixInTest}}\in[0,\SI{0.6}{\volt}]\) and \(C_{\text{TOT}}\in[0,\SI{5.55}{\femto\farad}]\) using the capacitors labeled \(C_{0}\) in Fig. 1, yielding approximately \(0\)–\(20\text{k}\, \mathrm{e}^{-}\).

The AFE performance is assessed via repeated S-curve measurements. The S-curve (the AFE CDF) is obtained by sweeping injected charge from low to high values; at each charge, \(\gtrsim 10^{3}\) samples are acquired. A Gaussian fit extracts the mean \(\mu\) and standard deviation \(\sigma\): the distribution of \(\mu\) across pixels characterizes threshold mismatch, while \(\sigma\) per pixel estimates the Equivalent Noise Charge (ENC). As shown in Fig. 5, the measured ENC is \(\sim 55\,\mathrm{e}^{-}\) for a 0.5 ns fall-time pulse and \(\sim 90\,\mathrm{e}^{-}\) for 15 ns. The degradation at longer fall times arises because the AFE integrator’s discharge time constant is shorter than the pulse, so the injected charge is not fully integrated and the S-curve broadens.

These pulse injection measurements and the corresponding S-curve analysis serve as the primary diagnostic metric to verify proper timing alignment across the AFE and data acquisition chain. Unless otherwise specified, all subsequent measurements presented in this work employ the fastest available injection fall time of 0.5 ns, ensuring optimal charge integration and noise performance.

Figure 5: (a) Typical S-curves for the three bits of one pixel (0–5000 e⁻ in 20 e⁻ steps; 1365 samples/step). (b) ENC vs pulse fall time with an upward trend in apparent ENC..

3.2.2 Clock and phase alignment↩︎

The relative phases of the injection time Tinj, and the event clock BxCLK_ANA, and the capture clock BxCLK must satisfy Fig. 6. If Tinj is too early, charges near the rising edge of BxCLK_ANA may not integrate; if too late, charges near/after its falling edge also fail to integrate. BxCLK samples the scan-chain on its rising edge; for margin, this edge should occur slightly before the falling edge of BxCLK_ANA, maximizing the integration window.

Figure 6: Timing diagram of the DAQ signals. Proper phase alignment among Tinj, BxCLK, and BxCLK_ANA ensures correct charge injection, integration, and readout. When pixInTest pulses low, a data sample d0 is generated at the output of the in-pixel ADCs. This sample is captured by BxCLK and serialized through the scanOut pad, transmitting the 768 thermometric bits of the ASIC (3 bits across 256 pixels). In parallel, d0 is projected along the y-axis and compressed into 16 rows of 6 bits each (y-d0), which are then processed by the on-chip ML network to produce a three-class output (DnnOut[1:0]). — Figure 6: Timing diagram of the DAQ signals. Proper phase alignment among `Tinj`, `BxCLK`, and `BxCLK_ANA` ensures correct charge injection, integration, and readout. When `pixInTest` pulses low, a data sample `d0` is generated at the output of the in-pixel ADCs. This sample is captured by `BxCLK` and serialized through the `scanOut` pad, transmitting the 768 thermometric bits of the ASIC (3 bits across 256 pixels). In parallel, `d0` is projected along the y-axis and compressed into 16 rows of 6 bits each (`y-d0`), which are then processed by the on-chip ML network to produce a three-class output (`DnnOut[1:0]`).

Figure 7 (a) shows the valid operating region of \([55,\,90]~\text{ns}\), where the injection delay corresponds to Tinj. If readout triggers too early, data may not be settled; if after the rising edge of BxCLK_ANA, it is lost as the ADC enters auto-zero. The firmware provides 2.5 ns delay resolution between BxCLK and BxCLK_ANA. Figure 7 (b) shows the standard deviation across this range, revealing a valid operating window of \([30,\,50]~\text{ns}\) set by pixel-matrix latency. Although the BxCLK and BxCLK_ANA clock trees were matched (skew \(\leq\SI{40}{\pico\second}\)), the second-to-last viable delay is chosen to provide timing margin under varying test conditions.

Figure 7: S-curve standard deviations for pixel 0 as a function of timing parameters. (a) ENC vs injection arrival time (`Tinj`). (b) ENC vs `BxCLK` sampling delay..

3.2.3 Firmware timing window↩︎

Other firmware settings, such as test delay and test sample, must also be tuned. The former defines a timing reference for initiating the firmware state machine; the latter compensates for the FPGA–ASIC–FPGA loop delay. Figure 8 summarizes the valid operating window; the region shifts vertically with cable length.

Figure 8: Valid region for firmware settings test delay and test sample. The triangular working region shifts with FPGA–ASIC cable length. — Figure 8: Valid region for firmware settings `test delay` and `test sample`. The triangular working region shifts with FPGA–ASIC cable length.

4 Analog front-end characterization↩︎

This section presents the characterization of the AFE. A simplified version of Figure 1 is shown in Figure [fig:AFEoverview] to illustrate the analog amplification chain and to depict how the input charge is generated, integrated, amplified, and digitized. The objectives of this study are threefold:

Characterize the functionality and performance of each analog front-end block individually (i.e., the preamplifier, comparator, and ADC) and collectively as part of the complete AFE chain, by reporting key performance metrics such as linearity, threshold dispersion, noise, and power consumption.
Validate the design and architecture, identify potential bugs and reliability issues, and ensure functional integrity of the circuit.
Demonstrate reliable pulsing of charge profiles into the pixel matrix, thereby confirming that the test stand can be effectively used to evaluate the downstream digital logic of the on-chip filtering neural network.

To carry out these investigations, charge is injected into the pixels, and the S-curve response is measured under various test conditions.

Figure 9: Analog front-end architecture of the prototype SmartPixel ROIC. The top diagram illustrates the preamplifier, which integrates the input charge Q_{\mathrm{in}} from the sensor and converts it to a voltage V_{\mathrm{PREAM}} through the linear charge-to-voltage conversion gain (\mathrm{CvG}). The lower panels depict the comparator operation: on the left, the auto-zero (AZ) phase when BxCLK_ANA = 0, and on the right, the sampling phase when BxCLK_ANA = 1. During the auto-zero phase, offset compensation is performed; during the sampling phase, the integrated charge is digitized. A small residual charge error, Q_{\text{error}}, may accumulate on the feedback capacitor C_{P}. — Figure 9: Analog front-end architecture of the prototype SmartPixel ROIC. The top diagram illustrates the preamplifier, which integrates the input charge \(Q_{\mathrm{in}}\) from the sensor and converts it to a voltage \(V_{\mathrm{PREAM}}\) through the linear charge-to-voltage conversion gain (\(\mathrm{CvG}\)). The lower panels depict the comparator operation: on the left, the auto-zero (AZ) phase when *BxCLK_ANA = 0*, and on the right, the sampling phase when *BxCLK_ANA = 1*. During the auto-zero phase, offset compensation is performed; during the sampling phase, the integrated charge is digitized. A small residual charge error, \(Q_{\text{error}}\), may accumulate on the feedback capacitor \(C_{P}\).

4.1 Analog Power↩︎

The current bias is supplied off chip to the ASIC and routed through a current mirror within the matrix bias block, which distributes the mirrored current to the pixels. Based on simulations, the AFE is expected to operate optimally with a bias between 3 and 5.5 \(\mu\)A/pixel. The plots in Figure 10 show the impact of the pixel bias current on the ENC for the three bits of pixel 0 in the SP1 architecture. The key observations are as follows. First, for bias currents below 2 \(\mu\)A/pixel, the pixel is not properly biased. In this regime, the preamplifier open-loop gain is likely non-nominal, and thermal noise from the common-source transistor becomes the dominant contribution. Second, the optimal operating range lies between 3 and 5.5 \(\mu\)A/pixel, consistent with simulation predictions. The best performance is observed at 3.6 \(\mu\)A/pixel, yielding an ENC of about 52 e\(^{-}\). Third, for bias currents above this range, the ENC increases progressively. This behavior arises because the preamplifier open-loop gain becomes non-optimal again, while the comparators also deviate from their preferred bias point. At 22 \(\mu\)A/pixel, the ENC reaches approximately 70 e\(^{-}\).

Figure 10: Effect of pixel bias current on the ENC of the three bits of pixel 0 in SP1. The ENC is minimized around 3.6 \muA/pixel (\sim 52 e^{-}). — Figure 10: Effect of pixel bias current on the ENC of the three bits of pixel 0 in SP1. The ENC is minimized around 3.6 \(\mu\)A/pixel (\(\sim 52 e^{-}\)).

4.2 Conversion Gain↩︎

The conversion gain (CvG) quantifies the increase in the preamplifier output voltage per electron of input charge. In the ideal case, CvG is equal to the inverse of the feedback capacitance \(C_{\mathrm{fb}}\), as illustrated in Figure [fig:AFEoverview]. In practice, however, this relationship is affected by additional parasitic capacitances originating from the metal interconnects and the CMOS devices within the preamplifier. Moreover, CvG is further reduced by the discharge path and leakage compensation circuitry, which inevitably divert part of the signal charge due to their finite impedance.

From Figure [fig:AFEoverview], it can be seen that the comparator output switches to 1 when its positive input equals the negative input. This condition can be expressed as

\[\begin{align} \mathrm{CvG} \times Q_{\mathrm{in}} &\geq V_{\mathrm{th}}, \\ \mathrm{CvG} \times \mathrm{pixInTest} \times C_{0} &\geq V_{\mathrm{th}} \label{eq:CvG} \end{align}\tag{1}\]

This equality holds only at the mean of the S-curve, where the comparator output has a 50% probability of being either 0 or 1. Thus, the conversion gain can be determined by extracting the mean of the S-curve across a wide range of threshold voltages \(V_{\mathrm{th}}\). The resulting measurements for SP1 and SP2 are shown in Figures 11 (a) and 11 (b), respectively.

The slope of the linear region in these plots corresponds directly to the pixel conversion gain (CvG). The CvG was extracted for all bits and pixels in both SP1 and SP2 architectures. The resulting histograms in Figure 12 show the distribution of CvG across the matrices, highlighting the impact of process variability, particularly in the injection capacitance \(C_{0}\) and the feedback capacitance \(C_{\mathrm{fb}}\) of the CSA stage. On average, the CvG is approximately \(58.5~\mu\mathrm{V}/\mathrm{e}^{-}\), with a standard deviation of \(2.06~\mu\mathrm{V}/\mathrm{e}^{-}\) for SP1 and \(4.76~\mu\mathrm{V}/\mathrm{e}^{-}\) for SP2. The CvG values were extracted in the threshold region \([600,\,1000]~\mathrm{e}^{-}\), confirming that SP2 exhibits larger non-linearities and mismatch. These results align with expectations from design simulations and corroborate the linearity limitations discussed in Section 4.3, as well as the systematic effects detailed in Section 4.4. These results corroborate the effects observed experimentally, providing additional evidence for the linearity limitations discussed in Section 4.3 and the systematic variations detailed in Section 4.4.

Figure 12: Comparison of the conversion gain dispersion across all three bits for the 256 pixels in the first and second superpixel variants.

4.3 Linearity↩︎

We observe in Figures 11 (a) and 11 (b) that three distinct regions appear at low, mid, and high threshold voltages. At low thresholds, the response exhibits significant non-linearity, arising from intrinsic limitations of the ADC. This behavior is well understood and is primarily attributed to the comparator operating in its two phases, auto-zero and sampling, as illustrated in the bottom plane of Figures [fig:AFEoverview]. During the sampling phase, the threshold voltage is connected to the comparator through a switch. When the BxCLK_ANA clock rises, this switch closes and injects a small charge that is integrated on the capacitance \(C_{P}\). This charge injection introduces an offset of approximately 3 mV on the threshold voltage, which produces a pronounced non-linearity at low thresholds but not at high thresholds. For input charges ranging from approximately \(500\,\mathrm{e}^-\) to \(8{,}000\,\mathrm{e}^-\) for SP1, and \(800\,\mathrm{e}^-\) to \(8{,}000\,\mathrm{e}^-\) for SP2, the response is highly linear since the switch error is negligible. The slope in this region corresponds to the system’s conversion gain. For charge inputs above \(8{,}000\,\mathrm{e}^-\), deviations from linearity are expected due to saturation effects in the preamplifier stage. This behavior aligns well with our simulation results.

4.4 Threshold Dispersion and Equivalent Noise Charge↩︎

The threshold dispersion \(Q_{\mathrm{th}}\) quantifies the non-uniformity of the detection threshold across the pixel matrix. This variation arises primarily from random device mismatch in the comparator (\(V_{\mathrm{os}}\)) and systematic threshold distribution differences (\(\Delta V_{\mathrm{th}}\)) among pixels. Additional sources include layout-induced variations and bias-grid nonuniformities.

4.4.1 Conceptual Model↩︎

Starting from the comparator input condition, the relation between injected charge and threshold voltage can be expressed as: \[\mathrm{CvG} \times Q_{\mathrm{in}} = V_{\mathrm{th}} + V_{\mathrm{os}} + \Delta V_{\mathrm{th}}, \label{eq:VTH95equation}\tag{2}\] where \(\mathrm{CvG}\) is the conversion gain, \(Q_{\mathrm{in}}\) is the injected charge, \(V_{\mathrm{th}}\) is the nominal comparator threshold, \(V_{\mathrm{os}}\) is the random comparator offset from process variation, and \(\Delta V_{\mathrm{th}}\) is the systematic offset across pixels. Defining the combined offset as: \[Z \equiv V_{\mathrm{os}} + \Delta V_{\mathrm{th}}, \label{eq:Z95definition}\tag{3}\] the corresponding probability density function follows the convolution: \[p_Z = p_{V_{\mathrm{os}}} * p_{\Delta V_{\mathrm{th}}}. \label{eq:convolution}\tag{4}\]

If \(V_{\mathrm{os}} \sim \mathcal{N}(\mu_{\mathrm{os}}, \sigma_{\mathrm{os}}^2)\) and \(\Delta V_{\mathrm{th}}\) has mean \(\mu_\Delta\) and variance \(\sigma_\Delta^2\), then: \[\mathbb{E}[Z] = \mu_{\mathrm{os}} + \mu_\Delta, \qquad \mathrm{Var}(Z) = \sigma_{\mathrm{os}}^2 + \sigma_\Delta^2, \label{eq:Z95moments}\tag{5}\] and the standard deviation of \(Z\) is: \[\sigma_Z = \sqrt{\sigma_{\mathrm{os}}^2 + \sigma_\Delta^2}. \label{eq:Z95std}\tag{6}\]

The resulting threshold dispersion is therefore: \[\sigma_{Q_{\mathrm{th}}} = \frac{\sigma_Z}{\mathrm{CvG}} = \frac{\sqrt{\sigma_{\mathrm{os}}^2 + \sigma_\Delta^2}}{\mathrm{CvG}}. \label{eq:QTH95equation}\tag{7}\]

In practice, the systematic contribution \(\sigma_\Delta^2\) is usually small, and the dominant term arises from \(\sigma_{\mathrm{os}}^2\), which is dynamically reduced by the auto-zero (AZ) compensation in the comparator. In measurement campaigns, additional fine-tuning circuitry is used to align individual pixel thresholds to the global reference if residual dispersion is observed.

4.4.2 Experimental Extraction from S-Curves↩︎

To determine the threshold dispersion, the input charge \(Q_{\mathrm{in}}\) satisfying Eq. 2 is obtained for each pixel and bit from the mean of its S-curve, corresponding to the 50% transition point where the comparator output has equal probability of being 0 or 1.

For each pixel, a Gaussian fit is applied to extract both the mean (\(\mu\)) and standard deviation (\(\sigma\)) of the transition region. The distribution of means across the matrix represents the threshold dispersion, while the distribution of standard deviations corresponds to the Equivalent Noise Charge (ENC).

Figure 13 summarizes the extraction procedure. The left column shows representative S-curves, the middle column shows the ENC dispersion obtained from the fitted standard deviations, and the right column shows the corresponding threshold dispersion extracted from the spread of S-curve means. The top row shows the combined S-curves obtained when all pixels in the matrix were pulsed simultaneously, whereas the bottom row shows the appended S-curves when each pixel was pulsed individually. The top-row results are heavily distorted and not representative of the true pixel response, as the global charge-injection grid introduces severe pulse-loading artifacts across the matrix. The bottom-row results are more representative of the actual pixel behavior but remain slightly pessimistic, as some residual loading effects from the global injection path persist. This measurement artifact and its origin are discussed in detail in Section 4.4.3.

4.4.3 Observed Effects and Design Issues↩︎

Two design-related effects were identified during these measurements: (a) global injection site error and (b) threshold bias leakage in SP2.

4.4.3.1 (a) Global Injection Site Effects.

In this prototype, charge injection is achieved through a single global pulse line shared across the entire pixel matrix. This line drives 256 pixels per superpixel, each with an estimated input capacitance of 1–5 fF, resulting in significant capacitive loading and RC distortion. Pixels located near the injection pads receive a cleaner and faster pulse, while those farther away experience slower rise/fall times.

The dependence of ENC on pulse fall time is shown in Fig. 5 (b). To mitigate this effect, the test procedure was optimized by activating only one pixel at a time and setting its calibration DAC to the minimum value, thereby reducing the total load from 1.25 pF (all pixels) to 1 fF (single pixel). This approach reduced the measured threshold dispersion from \(\sigma_{Q_{\mathrm{th}}} = 161~e^{-}\) to \(77~e^{-}\), and the ENC from \(103~e^{-}\) to \(55~e^{-}\), as shown in Fig. 13.

While this technique improves accuracy, it does not eliminate the fundamental limitation of the shared global injection site. Future ASIC revisions will implement local in-pixel charge injection to ensure uniform pulse profiles across the array.

4.4.3.2 (b) Threshold Input Leakage in SP2.

A systematic dispersion pattern was observed across all tested chips, where clusters of pixels consistently exhibited abnormally high threshold spread in the same physical regions. This indicates a systematic source rather than random process variation.

Figure 14: Measured leakage currents on the three threshold bias lines of the ROIC at room temperature. The currents range from –0.1 µA to 3 µA. These leakage currents originate in SP2 and produce IR drops within the shared bias grid, which also affect SP1 due to the common bias distribution.

Electrical measurements identified large leakage currents (–100 nA to 3 µA) on all three threshold bias lines of SP2, as shown in Fig. 14. These currents cause IR drops in the shared bias grid, affecting both SP2 and neighboring SP1 pixels due to their common bias infrastructure. Simulations confirmed this issue, revealing a leakage of approximately 100 nA per pixel in SP2—three orders of magnitude higher than in SP1. The resulting \(V_{\mathrm{TH}}\) leakage induces systematic mismatch \(\Delta V_{\mathrm{th}}\), varying from the nominal 31 mV (\(\approx\) 500 e\(^{-}\)) down to 16 mV (\(\approx\) 150 e\(^{-}\)) in extreme cases. This biases affected pixels into a nonlinear regime and broadens their threshold distributions.

From Eq. 4 , if \(\sigma_\Delta^2\) is large, the convolution of a uniform distribution (from \(\Delta V_{\mathrm{th}}\)) with a Gaussian (\(V_{\mathrm{os}}\)) produces flattened peaks and heavy shoulders, which we observe experimentally as skewness, hot spots, and multimodal distributions (Fig. 15). At lower temperatures \(-19^\circ\mathrm{C}\), the leakage current decreases significantly, reducing \(\Delta V_{\mathrm{th}}\) and improving the dispersion from 80 e\(^{-}\) to 50 e\(^{-}\)—consistent with simulations and expectations, since \(V_{\mathrm{os}}\) should be temperature independent. However, residual multimodal features remain, particularly for Bits 1 and 2, confirming intrinsic design limitations in SP2 that preclude scaling to larger arrays.

4.5 Summary and future analog design choices↩︎

We characterized two pixel architectures at room temperature using a bunch-crossing clock frequency of 10 MHz. The measured ENC is approximately 58 e\(^{-}\), and the threshold dispersion \(\sigma_{Q_{\mathrm{th}}}\) ranges from 90 e\(^{-}\) to 161 e\(^{-}\), compared with the \(\sim\)45 e\(^{-}\) predicted by simulations. Measurements were repeated and confirmed across multiple chips and test stands. These results motivate the following design actions:

Charge injection: Replace the global injection line with in-pixel charge injection to eliminate pulse-loading effects and routing-parasitic artifacts across the matrix.
Sampling linearity: Mitigate the \(\sim\)50 e\(^{-}\) sampling-phase charge error observed at low thresholds by increasing the auto-zero capacitance by a factor of ten to restore linearity margin.
Threshold-bias leakage (SP2): Address the \(\sim\)100 nA per-pixel leakage on each threshold line that induces IR-drop-related dispersion; consequently, the SP2 architecture will not be pursued further.

Despite these analog non-idealities, the front end remains stable and reproducible for calibrated pulse injection and S-curve analysis, enabling reliable evaluation of the downstream digital processing under realistic noise and dispersion conditions. Future revisions will incorporate the improvements listed above to align the analog performance with simulated expectations. The next phase of the characterization campaign will also focus on measurements at \(-30^{\circ}\) C to evaluate temperature dependence followed by testing with a bunch crossing clock of 40 MHz.

5 Performance of digital on-chip neural network↩︎

The characterization of the AFE enables reliable charge injection into the pixels. This capability is used to load pixels with cluster patterns corresponding to charge profiles from the CMS training dataset [26], [27]. Charge is injected by configuring the capacitance at each individual pixel site, after which it propagates through the AFE and is summed to form the \(y\)-profile of the cluster charge profile. The summed profile is then passed to the on-chip NN embedded in the digital fabric of the chip. Through the DAQ system, the NN weights and biases are programmed to those created in [14].

The performance of the on-chip NN is evaluated through repeated pattern pulsing. The digital logic operates combinatorially and refreshes every clock cycle, continually processing the outputs of the analog front-end. After processing by the DNN, the output is sent to an amplifier–discriminator stage to generate binary outputs. Those outputs represent a prediction of whether the injected charge profile was created by a high , low positively charged, or low negatively charged particle. The \(y\)-profile input and the corresponding DNN prediction are read out on every event or clock cycle and analyzed offline.

Due to charge injection imperfections and noise from both the chip and the electronics setup, the injected \(y\)-profiles delivered to the DNN can deviate from the intended simulated profiles. Such deviations may lead to discrepancies in the predicted particle \(p_{\mathrm{T}}\) class. Despite this, the fidelity of the on-chip DNN can be quantified by passing the read out y-profile to an offline RTL simulation of the DNN and comparing it with the corresponding readout of the DNN from the chip. Out of 10,000 test vectors passed to the DNN at a [80, 160, 320] mV threshold on the three bits, we measure a 99.86% match between the RTL simulation and the readout from the ROIC. This measurement provides high confidence that the AFE is properly propagating the signals to the digital logic, the DAQ is successfully reading out the data, and the offline analysis of the readout is correct. We also note that the fraction of correctly matched DNN output with the RTL simulation results has been observed to result in O(10%) drop if the discriminator in the read-out cables is not well-tuned.

The selection efficiency of the NN filter output is measured for the offline full precision model (Keras), offline quantized model (QKeras), and on-chip models (ASIC). Here selection efficiency refers to the ratio of events predicted to be high . The result is shown in Figure 16. The measurement is performed for two different noise thresholds in order to assess the impact of the on-chip noise on the selection efficiency of Keras, QKeras, and ASIC results. Good agreement is seen between those measurements at the high charge threshold. An asymmetry is measured between the positive and negatively charged particles (indicated as signed ) that grows as the charge threshold increases. This is understood physically since in simulation the negatively charged particles produce charge profiles which are typically broader than those of positively charged particles as shown in Figure 4 of [14]. Therefore, when the noise threshold is increased the lower charge tails of the profile are removed, and the overall clusters loses shape. This behavior reduces performance more for negatively charged particles than for positively charged ones. The ASIC results at a 400 electron noise threshold deviates from the model and has been understood to arise from excessive noise levels. Increasing the threshold to 1000 electrons significantly improves agreement with model by filtering most noise. However, for positive p\(_T\) \(>\) 0.2 GeV, performance remains below expectation, as residual noise at this threshold still inflates cluster sizes, leading the model to misclassify them as low p\(_T\) positive particles.

The measurements are converted to performance metrics (signal efficiency, background rejection, and data reduction) relevant for physics gains. While the signal efficiency is defined as the fraction of clusters with > 2 GeV that are classified as high , the background reduction is defined as the fraction of clusters with < 2 GeV that are classified as low . The overall data reduction is defined as the ratio of events classified as low to the total dataset size irrespective of true class. These quantities are measured for different noise thresholds and are summarized in Table 2 alongside the QKeras performance as well. The target signal efficiency and data reduction for the on-chip NN are 90\(\%\) and 50\(\%\), respectively, which would reproduce the offline performance studies in [14].

Table 2: Performance results of the QKeras and On-chip models for various threshold sets. The V\(_{\mathrm{th0}}\), V\(_{\mathrm{th1}}\), and V\(_{\mathrm{th2}}\) thresholds for the On-chip results correspond to the quantization thresholds on evaluation datasets for the QKeras results.
Model	Threshold [e-]	Signal efficiency	Data reduction	Background rejection
QKeras	[400, 1600, 2400]	93.72	41.60	41.60
On-chip	[400, 1600, 2400]	50.91	52.38	52.38
QKeras	[1000, 1600, 2400]	91.56	44.57	44.57
On-chip	[1000, 1600, 2400]	78.91	45.37	45.42

We observe promising results from the ASIC and that increasing the V\(_\text{th}\) threshold improves the overall performance of the chip. However, since the current models were trained on simulation data with no noise information, a non-negligible discrepancy remains between the model and ASIC results. Preliminary experimental studies indicate that retraining the models on noise-injected simulation data significantly improves ASIC performance. Ongoing efforts focus on developing more accurate noise models for simulation datasets and algorithms that are inherently robust to noise. We also expect improved performance at the nominal noise threshold (400 electrons) in the next ASIC implementation, following the design improvements discussed in Section 4.4.3.

6 Conclusion↩︎

In this work, we have presented the first physical demonstration of a 28 nm TSMC ROIC capable of performing on-chip signal processing and ML-based data filtering. The analog pixels exhibit an ENC of \(58 e^{-}\) and a threshold dispersion \(Q_{\mathrm{TH}}\) of \(90\), compared to the \(45 e^{-}\) predicted by simulations. Several design issues were identified that will be addressed in the next implementation of the ROIC, including initial charge errors, threshold line leakage in the SP2 architecture, and non-linear behavior at low thresholds. Those limitations did not inhibit reliably injecting charge profiles to characterize the behavior of the on-chip digital NN. The NN-based filter was successfully tested across offline full-precision Keras, quantized QKeras, and on-chip ASIC implementations. Measurements show good agreement at high charge thresholds, with performance asymmetries between positive and negatively charged particles, explained by the broader charge profiles of the latter. The measurements are expressed in terms of signal efficiency, background rejection, and overall data reduction to illustrate the impact of increasing noise thresholds on key performance metrics. Future work will focus on increasing the test rate from 10 MHz to 40 MHz, performing measurements at cold temperatures, implementing ASIC design updates for the next prototype, and thoroughly validating ML retraining to improve on-chip NN performance. In conclusion, this work demonstrates the in-pixel integration of signal processing and AI/ML based data filtering for particle tracking detector applications. The results mark a significant step towards unlocking the ability for high-rate ML-based readout of silicon pixel detectors in radiation intense environments. These advances motivate continued R&D toward deploying the technology in the HL-LHC and future experiments.

Acknowledgements↩︎

This work was completed using computing resources at the Fermilab Elastic Analysis Facility (EAF). We thank Burt Holzman for computing support. We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. We would like to extend our sincere gratitude to Harish Jamakhandi and David Burnette from Siemens EDA for their assistance and expertise with Catapult HLS. DB, GDG, FF, LG, RL, BP, GP, CS and NT are supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the Department of Energy (DOE), Office of Science, Office of High Energy Physics. JD, FF, GDG, BP, GP, and NT are also supported by the DOE Early Career Research Program. NT is also supported by the DOE Office of Science, Office of Advanced Scientific Computing Research under the “Real-time Data Reduction Codesign at the Extreme Edge for Science” Project (DE-FOA-0002501). AB is supported through NSF-PHY award 2013007. MS is supported by NSF-PHY award 2012584. CM is supported by NSF-PHY award 2208803. KD is supported in part by the Neubauer Family Foundation Program for Assistant Professors and the University of Chicago. RK is supported by the Metcalf Fellowship program of the University of Chicago. MSN is supported through NSF cooperative agreement OAC-2117997 and the DOE Office of Science, Office of High Energy Physics, under Contract No. DE-SC0023365. AB is supported by the Schmidt Sciences Foundation.

References↩︎

[1]

A. Affolder et al., Solid State Detectors and Tracking for Snowmass, https://arxiv.org/abs/2209.03607.

[2]

M. Garcia-Sciveres and N. Wermes, A review of advances in pixel detectors for experiments with high rate and radiation, https://doi.org/10.1088/1361-6633/aab064[https://arxiv.org/abs/1705.10150].

[3]

B. Fleming, I. Shipsey, M. Demarteau, J. Fast, S. Golwala, Y.-K. Kim et al., Basic research needs for high energy physics detector research & development: Report of the office of science workshop on basic research needs for hep detector research and development: December 11-14, 2019, Tech. Rep. https://www.osti.gov/biblio/1659761, USDOE Office of Science (SC) (United States) (12, 2019), https://doi.org/10.2172/1659761.

[4]

U.O. of Science (SC) (United States), Basic research needs for microelectronics, Tech. Rep. https://www.osti.gov/biblio/1545772, USDOE Office of Science (SC) (United States) (10, 2018), https://doi.org/10.2172/1545772.

[5]

L. Collaboration, LHCb VELO Upgrade Technical Design Report, Tech. Rep. https://cds.cern.ch/record/1624070(2013), https://doi.org/10.17181/CERN.4DGI.MZN4.

[6]

S.T. et al., The cdf silicon vertex detector, https://doi.org/https://doi.org/10.1016/0168-9002(94)91434-6.

[7]

D.C. et al., The new aleph silicon vertex detector, https://doi.org/https://doi.org/10.1016/S0168-9002(97)91255-9.

[8]

ATLAS collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, https://doi.org/10.1088/1748-0221/3/08/S08003.

[9]

CMS collaboration, The CMS experiment at the CERN LHC. The Compact Muon Solenoid experiment, https://doi.org/10.1088/1748-0221/3/08/S08004.

[10]

ATLAS collaboration, Technical Design Report for the ATLAS Inner Tracker Pixel Detector, Tech. Rep. https://cds.cern.ch/record/2285585, CERN, Geneva (2017), https://doi.org/10.17181/CERN.FOZZ.ZP3Q.

[11]

A.D. et al., CMS Technical Design Report for the Pixel Detector Upgrade, Tech. Rep. https://cds.cern.ch/record/1481838(2012).

[12]

RD53 collaboration, RD53B Design Requirements, Tech. Rep. https://cds.cern.ch/record/2663161, CERN, Geneva (2019).

[13]

D. Contardo, M. Klute, J. Mans, L. Silvestris and J. Butler, Technical Proposal for the Phase-II Upgrade of the CMS Detector, Tech. Rep. https://cds.cern.ch/record/2020886, Geneva (2015), https://doi.org/10.17181/CERN.VU8I.D59J.

[14]

J. Yoo et al., Smart pixel sensors: towards on-sensor filtering of pixel clusters with deep learning, https://doi.org/10.1088/2632-2153/ad6a00[https://arxiv.org/abs/2310.02474].

[15]

B. Parpillon, A. Trivedi and F. Fahim, Readout IC with 40 MSPS in-pixel ADC for future vertex detector upgrades of Large Hadron Collider, https://doi.org/10.1109/ISCAS46773.2023.10182033.

[16]

B. Parpillon, “Radiation-hard smart-pixel detector asic readout with digital ai in 28 nm.” 2024.

[17]

B. Parpillon, C. Syal, J. Yoo, J. Dickinson, M. Swartz, G.D. Guglielmo et al., Smart pixels: In-pixel ai for on-sensor data filtering, 2024.

[18]

M. Swartz and J. Dickinson, Smart pixel dataset, Mar., 2024. 10.5281/zenodo.10783560.

[19]

F. Fahim et al., hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices, in tinyML Research Symposium 2021, 3, 2021 [https://arxiv.org/abs/2103.05579].

[20]

C. et al., Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors, Nature Machine Intelligence3 (2021) 675.

[21]

C.N. Coelho, A. Kuusela, S. Li, H. Zhuang, J. Ngadiuba, T.K. Aarrestad et al., Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors, https://doi.org/10.1038/s42256-021-00356-5.

[22]

Siemens, “Catapult HLS.” https://eda.sw.siemens.com/en-US/ic/ic-design/high-level-synthesis-and-verification-platfor.

[23]

T. Vanat, Caribou — A versatile data acquisition system, https://doi.org/10.22323/1.370.0100.

[24]

A. Quinn, An Open-Source Framework for Rapid Validation of Scientific ASICs, 6, 2024 [https://arxiv.org/abs/2406.15181].

[25]

Y. Otarid, M. Benoit, E. Buschmann, H. Chen, D. Dannheim, T. Koffas et al., Peary: Caribou daq framework, 2025.

[26]

M. Swartz and J. Dickinson, Smart pixel dataset, Nov., 2022. 10.5281/zenodo.7331128.

[27]

M. Swartz, A Detailed Simulation of the CMS Pixel Sensor, Tech. Rep. https://cds.cern.ch/record/687440, CERN, Geneva (2002).

In-pixel integration of signal processing and AI/ML based data filtering for particle tracking detectors