April 02, 2024

Neuromorphic computing leverages the sparsity of temporal data to reduce processing energy by activating a small subset of neurons and synapses at each time step. When deployed for split computing in edge-based systems, remote neuromorphic processing units (NPUs) can reduce the communication power budget by communicating asynchronously using sparse impulse radio (IR) waveforms. This way, the input signal sparsity translates directly into energy savings both in terms of computation and communication. However, with IR transmission, the main contributor to the overall energy consumption remains the power required to keep the main radio on. This work proposes a novel architecture that integrates a wake-up radio mechanism within a split computing system consisting of remote, wirelessly connected NPUs. A key challenge in the design of a wake-up radio-based neuromorphic split computing system is the selection of thresholds for sensing, wake-up signal detection, and decision making. To address this problem, as a second contribution, this work proposes a novel methodology that leverages the use of a digital twin (DT), i.e., a simulator, of the physical system, coupled with a sequential statistical testing approach known as Learn Then Test (LTT) to provide theoretical reliability guarantees. The proposed DT-LTT methodology is broadly applicable to other design problems, and is showcased here for neuromorphic communications. Experimental results validate the design and the analysis, confirming the theoretical reliability guarantees and illustrating trade-offs among reliability, energy consumption, and informativeness of the decisions.

Neuromorphic computing, spiking neural networks, wake-up radios, neuromorphic wireless communications, reliability.

Neuromorphic processing units (NPUs), such as Intel’s Loihi or BrainChip’s Akida, leverage the sparsity of temporal data to reduce processing energy by activating a small subset of neurons and synapses at each time step [1], [2]. This mechanism implements *spike*-based signaling,
whereby information is encoded in the timing of synaptic activations. The opportunistic activation of neurons and synapses distinguishes NPUs from conventional deep learning accelerators such as graphics processing units (GPUs) or tensor processing
units (TPUs), making NPUs particularly attractive for time-series data.

As illustrated in Fig. 1, when deployed for *split computing* in edge-based systems [3], [4], remote NPUs, each carrying out part of the computation, can reduce the communication power budget by communicating asynchronously using sparse *impulse radio* (IR) waveforms [5]–[7]. This way, the input signal sparsity, which depends on the
semantics of the information processing task, translates directly into energy savings both in terms of computation and communication. However, these power savings are limited to the transmitter side, which can transmit impulsive waveforms only at the times
of synaptic activations. Consequently, the main contributor to the overall energy consumption remains the power required to keep the receiver's main radio on [8]–[10]. To address this architectural problem, as seen in Fig. 1, this work proposes a novel architecture that integrates a *wake-up radio* mechanism within a
split computing system consisting of remote, wirelessly connected, NPUs.

Wake-up radios introduce a low-cost radio at the transmitter and at the receiver. The wake-up transmitter monitors the sensed signals, deciding when to transmit a *wake-up signal* (WUS) to the receiver. The wake-up receiver operates at much
lower power than the main receiver radio, and its sole purpose is detecting the WUS. Upon detection of the WUS, the main radio is activated [8], [9], [11]–[13].

A key challenge in the design of wake-up radio systems is the selection of thresholds for sensing, WUS detection, and decision making. A conventional solution would be to calibrate the thresholds via *on-air* testing, trying out different
thresholds on the actual physical system. On-air calibration would be expensive in terms of spectral resources, and there is generally no guarantee that the selected thresholds would provide desirable performance levels for the end
application.

To address this design problem, as illustrated in Fig. 2, this paper proposes a novel methodology that leverages the use of a *digital twin*, i.e., a simulator, of the physical system, coupled with a sequential statistical testing approach that
provides theoretical reliability guarantees [14], [15].

*Neuromorphic communications*: Neuromorphic communication, introduced in [5], integrates event-driven principles from
neuromorphic computing into wireless communication systems for efficient sensing, communications, and decision-making. Reference [6] presented an
architecture for wireless cognition that incorporates neuromorphic sensing, processing, and IR communications for multiple devices, leveraging time hopping for asynchronous multi-access. Motivated by the potential of IR for radar sensing [16], reference [7] introduced a neuromorphic
integrated sensing and communication system, which targets simultaneous data transmission and target detection. In [17], a neuromorphic
computing-based detector was implemented at a satellite receiver, whose goal was to detect Internet-of-Things signals in the presence of significant uplink interference. A hardware implementation of the system introduced in [6], leveraging energy harvesting, was detailed in [18], showing the potential of the
approach to scale to thousands of nodes.

Decentralized implementations of NPUs were studied in [19], while assuming conventional digital communications. A corresponding optimal resource allocation problem was investigated in [20].

*Wake-up radio*: Wake-up radios can reduce energy consumption in wireless communication systems by keeping the main receiver radio off until an incoming signal of interest is detected [8]. In 3GPP Release 18, two wake-up receiver (WUR) architectures are introduced, using either a radio frequency envelope detector or an on-chip local oscillator approach [11]. The first type of architecture is characterized by low complexity, low cost, and extremely low energy consumption. In contrast, the second architecture
requires more complex components, like on-chip local oscillators. This results in higher energy consumption, but the benefits include better sensitivity and robustness to interferers.

For the design of WUS, two main candidates in 3GPP Release 18 are on-off keying (OOK)-based WUS and OFDM-based WUS [11]. The OFDM-based signal structure does not require significant changes on the transmitter, while OOK-based WUS is an attractive choice for receivers with low complexity.

Wake-up radios have been integrated into a number of wireless systems. For example, in [21], a multi-access protocol was introduced that facilitates fully asynchronous communication among network devices, while reference [22] focused on WURs for wireless local area networks. Reference [9] proposed a neuromorphic enhanced WUR, tailored for brain-inspired applications using OOK-modulated WUSs.

*Digital twins for wireless communication*: Digital twinning is currently viewed as a promising enabling tool for the design and monitoring of next-generation wireless systems implementing machine learning modules [23]. For example, reference [24] proposed a
Bayesian framework for the development of a DT platform aimed at the control, monitoring, and analysis of a multi-access communication system. The papers [25] and [26] proposed the use of digital twinning for beam prediction and
localization, respectively.

*Guaranteed reliability for machine learning in wireless communications*: Conformal prediction (CP) uses past experience to determine precise levels of confidence in new predictions [27]. Given a user-specified confidence level, CP constructs prediction regions that contain the true target with at least that probability, thereby providing reliable estimates of uncertainty. In the context of wireless communications, reference [28] applied CP to the design of AI for communication systems in conjunction with both
frequentist and Bayesian learning, focusing on the key tasks of demodulation, modulation classification, and channel prediction.

*Learn then Test* (LTT) is a framework for selecting the hyperparameters of pre-trained machine learning models so as to satisfy finite-sample statistical guarantees [15]. Like CP and conformal risk control (CRC), it relies on the use of calibration data, but it does not require the monotonicity assumption on the loss made by CRC. As a result, it applies to more general settings, such as
problems with multiple hyperparameters. Being a generic framework, LTT requires a dedicated effort to be tailored to a specific problem setting. To the best of our knowledge, ours is the first work that proposes a methodology for the application of LTT to
the design of communication systems.

The contribution of this paper is twofold. First, as shown in Fig. 1, we introduce a low-power wake-up radio aided neuromorphic wireless split computing architecture, whose goal is to carry out a remote inference task in an energy efficient way. Second, we propose a novel design methodology that combines LTT with digital twinning. This methodology, dubbed DT-LTT, enhances the spectral efficiency of a direct application of LTT [15] via a digital twin-based pre-selection of candidate thresholds for sensing, detection, and decision making. The main contributions of this paper are summarized as follows.

*Architecture*: We introduce a wake-up radio aided neuromorphic wireless split computing architecture, which combines the energy savings resulting from event-driven computing at the transmitter and receiver, as well as from IR transmission, with
the energy savings made possible at the receiver via the introduction of a WUR.

As illustrated in Fig. 1, in the proposed architecture, the NPU at the transmitter side remains idle until a signal of interest is detected by the signal detection module. Subsequently, a WUS is transmitted by the wake-up transmitter over the channel to the wake-up receiver, which activates the main receiver. The IR transmitter modulates the encoded signals from the NPU and sends them to the main receiver. The NPU at the receiver side then decodes the received signals and makes an inference decision.

*Digital twin-aided design methodology with reliability guarantees*: In order to select the thresholds used at transmitter and receiver for sensing, WUS detection, and decision making, we propose a novel design methodology that integrates the LTT
framework [15] with digital twinning. The proposed methodology, dubbed DT-LTT, is of broader interest as it can be applied to any
communication system requiring the selection of hyperparameters via on-air transmission.

To explain, consider any setting that requires the selection of hyperparameters affecting the operation of a wireless link, here the mentioned thresholds. A direct application of LTT [15] would sequentially test candidate hyperparameters via the estimation of the target performance metrics through transmissions on the wireless channel. This way, the designer would be limited to testing a few candidate hyperparameters, given the limited availability of spectral resources.

To reduce the spectral overhead caused by hyperparameter calibration, we propose executing LTT through digital twinning. Specifically, the digital twin is leveraged to pre-select a sequence of hyperparameters to be tested using on-air calibration via LTT. The proposed DT-LTT calibration procedure is proved to guarantee reliability of the receiver's decisions irrespective of the fidelity of the digital twin and of the data distribution. Indeed, the fidelity of the digital twin only affects the energy consumption and the informativeness of the output produced by the calibrated system. In this regard, the proposed method also supports the optimization of a weighted criterion involving energy consumption and informativeness of the receiver's decision.
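As a rough illustration of how a pre-selected sequence of hyperparameters can be tested on-air, the sketch below implements one standard instantiation of LTT: fixed-sequence testing with Hoeffding-based p-values for losses bounded in \([0,1]\). The ordering of the candidates is assumed to be supplied by the digital twin. The function names and the choice of p-value are illustrative assumptions, not necessarily the exact procedure developed in this paper.

```python
import math
from typing import Sequence


def hoeffding_p_value(empirical_risk: float, alpha: float, n: int) -> float:
    """Valid p-value for the null hypothesis 'risk > alpha' (losses in [0, 1])."""
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap ** 2)


def fixed_sequence_ltt(
    ordered_candidates: Sequence,
    losses_per_candidate: Sequence[Sequence[float]],
    alpha: float,
    delta: float,
) -> list:
    """Test candidates in a fixed order (e.g., the one chosen by the digital twin).

    Each candidate comes with the calibration losses measured on-air for it.
    Testing stops at the first candidate whose null hypothesis cannot be
    rejected at level delta; stopping early is what controls the family-wise
    error rate without any multiplicity correction.
    """
    certified = []
    for lam, losses in zip(ordered_candidates, losses_per_candidate):
        n = len(losses)
        risk_hat = sum(losses) / n
        if hoeffding_p_value(risk_hat, alpha, n) <= delta:
            certified.append(lam)
        else:
            break
    return certified
```

Because testing stops at the first candidate whose p-value exceeds \(\delta\), a good ordering lets more useful candidates be certified with the same amount of on-air calibration data, while a poor ordering only shrinks the certified set without affecting validity.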

*Numerical evaluations*: Extensive numerical results are provided that demonstrate the advantages of the proposed digital twin-based design approach.

The remainder of the paper is organized as follows. Section 2 presents the system model for the proposed wake-up radio-assisted neuromorphic split computing system. Section 3 describes the neuromorphic receiver processing with wake-up radio and the problem of interest, while the reliable hyperparameter optimization algorithm is proposed in Section 4. The experimental setting and results are described in Section 5. Finally, Section 6 concludes the paper.

As shown in Fig. 1, we consider an end-to-end neuromorphic remote inference system, in which the receiver (Rx) collects information from a device in order to carry out a semantic task, such as segmentation.

At the device, also referred to as transmitter (Tx), the sensor monitors the environment continuously to detect the start of a signal of interest. When the Tx detects a semantically relevant signal, the wake-up Tx is turned on to transmit the WUS, and the encoding NPU is also activated to process the input signal. The output of the NPU is buffered, and subsequently modulated and transmitted by the IR Tx after a given delay. Upon detecting the WUS, the wake-up Rx activates the main Rx, which starts receiving after a given delay. The received signal is then processed by a decoding NPU, which produces a final decision.

In this way, the proposed architecture combines the energy savings resulting from event-driven computing at Tx and Rx, as well as from IR transmission, with the energy savings made possible at the Rx via the introduction of a WUR.

We assume that the relevant discrete-time signal captured by the sensor has a duration of \(L^{\rm sig}\) samples, with each sample \(\boldsymbol{ u }_l\) being a \(D\)-dimensional vector. The duration \(L^{\rm sig}\) is assumed to be known and deterministic. The signal of interest is semantically associated with label information \(c\). We assume that the labels take values in a finite discrete set, but extensions to continuous quantities are direct. Furthermore, the signal of interest is produced by an information source starting at a random time \(l^{\rm start}\). Specifically, during an initial random period of \(l^{\rm start}-1\) samples, the device observes a signal containing semantically irrelevant information, e.g., noise. The samples of the signal of interest are presented to the device starting at time \(l^{\rm start}\). Subsequently, the device again records irrelevant signals.

The sensor is active for a period of time equal to \(L^{\rm max} \geq L^{\rm sig}\) samples. The choice of \(L^{\rm max}\) entails a trade-off between energy consumption and probability of fully observing the signal of interest of duration \(L^{\rm sig}\).

The sensed samples \(\boldsymbol{ u }_l\) for \(l=1,2,\ldots\) are processed continuously by a *signal detector* at the Tx to determine an estimate \(\hat{l}^{\rm start}\) of the time \(l^{\rm start}\). As an example, if one assumes the availability of distributions \(f_n\) and \(f_s\) for the irrelevant signal and for the signal of interest, respectively, the well-known cumulative sum (CUSUM) change detection algorithm [29]
updates a statistic \(S_l\) at each time \(l\) as \[\begin{align} S_l = \max \bigg[0, S_{l-1} + \log
\bigg(\frac{f_s(\boldsymbol{ u }_l)}{f_n(\boldsymbol{ u }_l)}\bigg) \bigg], \label{qusum}
\end{align}\tag{1}\] where \(\log \big(f_s(\boldsymbol{ u }_l)/f_n(\boldsymbol{ u }_l) \big)\) denotes the log-likelihood ratio between the distributions of the signal of interest and of the irrelevant signal
evaluated at the observation \(\boldsymbol{ u }_l\) at time \(l\), with \(S_0=0\). A change is detected at time \(l\) if the
statistic \(S_l\) exceeds a threshold \(\lambda^{\rm s}\), i.e., \(S_l > \lambda^{\rm s}\). The CUSUM algorithm (1) is known to be
optimal in the sense that it minimizes the worst-case average detection delay, over the change point and past observations, under a false alarm rate constraint. Accordingly, the wake-up Tx and encoding NPU are activated at time
\[\begin{align} \hat{l}^{\rm start} = \min \big\{l\in\{1,\ldots, L^{\rm max}\}: S_l > \lambda^{\rm s}\big\}, \label{wakeup}
\end{align}\tag{2}\] where the threshold \(\lambda^{\rm s}\) is subject to optimization.
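The recursion (1) and the stopping rule (2) can be sketched as follows. Purely for illustration, the sketch assumes that \(f_n\) and \(f_s\) are isotropic Gaussian densities with means `mu_n` and `mu_s` and a common standard deviation `sigma`, so that the log-likelihood ratio has a closed form; any other pair of densities can be used by replacing the hypothetical helper `gaussian_log_likelihood_ratio`.

```python
import numpy as np


def gaussian_log_likelihood_ratio(u, mu_s, mu_n, sigma):
    """log f_s(u) - log f_n(u) for isotropic Gaussians with a shared variance."""
    u = np.asarray(u, dtype=float)
    return (np.sum((u - mu_n) ** 2) - np.sum((u - mu_s) ** 2)) / (2.0 * sigma ** 2)


def cusum_detect(samples, mu_s, mu_n, sigma, lambda_s):
    """Return the 1-based detection time of (2), or None if S_l never exceeds lambda_s.

    The sensing window L_max is taken to be len(samples).
    """
    s = 0.0  # S_0 = 0
    for l, u in enumerate(samples, start=1):
        # CUSUM recursion (1): accumulate the LLR, clipped from below at zero
        s = max(0.0, s + gaussian_log_likelihood_ratio(u, mu_s, mu_n, sigma))
        if s > lambda_s:
            return l
    return None
```

The clipping at zero lets the statistic "forget" long stretches of irrelevant samples, so the detection delay after the change does not grow with the length of the preceding noise-only period.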

Upon activation of the wake-up Tx at time \(\hat{l}^{\rm start}\) in (2), an OOK-based WUS is transmitted for a duration of \(L^{\rm w}\) time steps. Following standard practice [8], as shown in Fig. 3 (top panel), data is then transmitted by the IR Tx \(L^{\rm d}\) time steps after the end of the WUS. The delay \(L^{\rm d}\) accommodates the channel delay spread, the detection time of the wake-up Rx, and the wake-up latency of the main Rx [8].

The encoding NPU processes samples \(\boldsymbol{ u }_l\) starting from time \(\hat{l}^{\rm start}\). For each time instant \(l \in [\hat{l}^{\rm start}, L^{\rm max}]=\{\hat{l}^{\rm start}, \hat{l}^{\rm start}+1, \ldots, L^{\rm max}\}\), the encoding NPU produces an \(N^{\rm T} \times 1\) vector \[\begin{align} \boldsymbol{ x }_l=f_{\boldsymbol{ \theta }^e}(\boldsymbol{ u }_l) \label{spike} \end{align}\tag{3}\] from its \(N^{\rm T}\) readout neurons. In (3), the vector \(\boldsymbol{ \theta }^e\) is the parameter vector of the encoding NPU. The output spiking vectors \(\boldsymbol{ x }_l\) for \(l \in [\hat{l}^{\rm start}, L^{\rm max}]\) are buffered and transmitted in a first-in-first-out manner starting at time \(\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}\), i.e., after the transmission of the WUS and the delay \(L^{\rm d}\).

We consider a multi-path frequency-selective channel model to capture the wireless propagation environment. This model accounts for the multipath propagation phenomenon, where signals transmitted from a transmitter encounter multiple reflections, diffractions, and scattering from objects in the surrounding environment before reaching the receiver. As a result, the received signal is composed of a combination of delayed and attenuated versions of the transmitted signal, each experiencing different propagation conditions. Mathematically, we describe the channel impulse response \(h(t)\) as a sum of complex-valued path contributions: \[\begin{align} h(t) = \sum_{i=1}^{N^{\rm P}} a_i \delta(t - \tau_i), \end{align}\] where \(N^{\rm P}\) is the total number of paths, \(a_i\) represents the complex attenuation (magnitude and phase) of the \(i\)th path, and \(\tau_i\) denotes the delay of the \(i\)th path.
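To simulate this channel model in discrete time, a common simplification, assumed here rather than taken from the text, is to round each path delay \(\tau_i\) to the nearest chip of duration \(T_c\). This turns \(h(t)\) into a tapped delay line that can be applied to a chip-rate signal by discrete convolution. The helper names below are hypothetical.

```python
import numpy as np


def discrete_channel(gains, delays_s, T_c):
    """Map h(t) = sum_i a_i delta(t - tau_i) to chip-spaced complex taps.

    Each path is assigned to the nearest chip index round(tau_i / T_c);
    paths that fall on the same chip are summed.
    """
    tap_idx = np.rint(np.asarray(delays_s, dtype=float) / T_c).astype(int)
    h = np.zeros(tap_idx.max() + 1, dtype=complex)
    np.add.at(h, tap_idx, np.asarray(gains, dtype=complex))
    return h


def apply_channel(chips, h):
    """Noise-free received chip sequence: linear convolution with the taps."""
    return np.convolve(chips, h)
```

The rounding error is at most \(T_c/2\) per path, which is acceptable when the chip duration is short relative to the delay spread of interest.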

The wake-up Tx is equipped with one antenna, while the IR transmitter has \(N^{\rm T}\) antennas. Both transmitters adopt IR to modulate their respective transmitted signals \(s^{\rm w}(t)\) and \(\{s_i(t)\}_{i=1}^{N^{\rm T}}\). Note that IR modulation is not a requirement for the wake-up radio, and is assumed here to facilitate a low-complexity implementation. Bandwidth expansion, leveraging time hopping (TH) [10], is utilized to manage interference between the antennas of the transmitting device during data transmission.

Accordingly, each time step \(l\) of the sensed signal \(\boldsymbol{ u }_l\) comprises \(L^{\rm b} \geq 1\) chips on the radio channel, with each chip
having a duration of \(T_c\) seconds. Consequently, each time step \(l\) spans \(L^{\rm b}T_c\) seconds, hence \(L^{\rm b}\)
is referred to as the *bandwidth expansion factor*. The bandwidth expansion factor \(L^{\rm b}\) governs a trade-off between latency and interference mitigation: a larger \(L^{\rm b}\) makes it less likely that pulses from different antennas occupy the same chip, at the cost of a longer duration \(L^{\rm b}T_c\) per time step. Using TH, each \(i\)th
antenna modulates the corresponding \(i\)th entry of vector \(\boldsymbol{ x }_l\) in (3) using random time shifts across the \(L^{\rm b}\) chips of the \(l\)th time period. This introduces temporal separation between antennas to reduce interference.
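A chip-level sketch of TH modulation follows. It relies on a simplifying assumption not stated in the text: each pulse \(\phi(\cdot)\) occupies exactly one chip, so spike \(x_{l,i}\) simply appears at chip \(c_{l,i}\) of the \(l\)th block of \(L^{\rm b}\) chips on antenna \(i\). The function name is hypothetical.

```python
import numpy as np


def time_hopping_modulate(x, L_b, rng):
    """Place each spike x[l, i] on one uniformly chosen chip of its time step.

    x   : (L, N_T) array of 0/1 spikes (rows = time steps, columns = antennas)
    rng : numpy Generator used to draw the hop positions
    Returns (chips, hops): chips has shape (N_T, L * L_b), and
    hops[l, i] = c_{l,i} is the TH position in {0, ..., L_b - 1}.
    """
    x = np.asarray(x)
    L, N_T = x.shape
    hops = rng.integers(0, L_b, size=(L, N_T))  # upper bound is exclusive
    chips = np.zeros((N_T, L * L_b))
    for l in range(L):
        for i in range(N_T):
            chips[i, l * L_b + hops[l, i]] = x[l, i]
    return chips, hops
```

With independent uniform hops, two antennas that spike in the same time step collide with probability \(1/L^{\rm b}\), which makes the latency-versus-interference trade-off governed by \(L^{\rm b}\) explicit.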

To elaborate, the antenna at the wake-up Tx modulates the OOK-based WUS using IR at each time step \(l\in[\hat{l}^{\rm start}, \hat{l}^{\rm start} + L^{\rm w}-1]\). The OOK-based WUS \(s^{\rm w}(t)\) is defined as [10] \[\begin{align} s^{\rm w}(t)= \sum_{j=\hat{l}^{\rm start}}^{\hat{l}^{\rm start}+L^{\rm w}-1} x^{\rm w}_j \phi(t-jL^{\rm b}T_c), \label{twus} \end{align}\tag{4}\] where \(x^{\rm w}_j \in \{0,1\}\) represents the \(j\)th OOK symbol, and \(\phi(t)\) denotes the OOK pulse waveform with bandwidth \(1/T_c\). The WUS \(s^{\rm w}(t)\) is received by the wake-up Rx over a multi-path fading channel \(h_w(t)\) as \[\begin{align} w(t)= s^{\rm w}(t) * h_w(t) + z(t), \label{wus} \end{align}\tag{5}\] where \(*\) denotes the convolution operation and \(z(t)\) is white Gaussian noise with noise power \(N_0\).
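A discrete-time counterpart of (4) and (5), sampled at one sample per chip, is sketched below. For illustration only, it assumes that the OOK pulse occupies the first chip of each time step, that the channel is given as chip-spaced taps `h_w`, and that \(z(t)\) is modeled as circularly symmetric complex Gaussian noise with variance `N0` per chip. The function names are hypothetical.

```python
import numpy as np


def ook_wus_chips(wus_bits, L_b):
    """Chip-rate version of (4): a pulse on the first chip of each step with x^w_j = 1."""
    bits = np.asarray(wus_bits, dtype=float)
    chips = np.zeros(len(bits) * L_b)
    chips[::L_b] = bits  # one candidate pulse position per time step
    return chips


def receive_wus(chips, h_w, N0, rng):
    """Chip-rate version of (5): channel convolution plus complex AWGN of variance N0."""
    clean = np.convolve(chips, h_w)
    noise = np.sqrt(N0 / 2.0) * (
        rng.standard_normal(clean.shape) + 1j * rng.standard_normal(clean.shape)
    )
    return clean + noise
```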

As shown in Fig. 3 (a), following the predefined delay of \(L^{\rm d}\) time steps after the WUS transmission, the IR transmitter is activated. To facilitate the main receiver's adaptation to the frequency-selective channel conditions, the IR transmitter sends pilots prior to the data transmission. The pilot sequence sent from the \(i\)th antenna spans \(L^{\rm p}\) time steps and is defined as \[\begin{align} s^{\rm p}_i(t)= \sum_{j=\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}}^{\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}+L^{\rm p}-1} \phi(t-jL^{\rm b}T_c- c^{\rm p}_{j,i}T_c), \label{pilot} \end{align}\tag{6}\] where \(c^{\rm p}_{j,i}\in\{0,1,\ldots,L^{\rm b}-1\}\) is the TH position, within the \(L^{\rm b}\) chips, of the \(j\)th pilot symbol transmitted from the \(i\)th antenna. The pilot is transmitted over the multi-path fading channel \(h_{i,n}(t)\) between the \(i\)th transmit antenna and the \(n\)th receive antenna, and is received at the \(n\)th receive antenna as \[\begin{align} v^{\rm p}_n(t)= \sum_{i=1}^{N^{\rm T}} s_i^{\rm p}(t) * h_{i,n}(t) + z_n(t), \label{rpilot} \end{align}\tag{7}\] where \(z_n(t)\) represents the white Gaussian noise at the \(n\)th receive antenna.

Data transmission commences once all pilot symbols have been transmitted. Each \(i\)th antenna at the IR transmitter modulates entry \(x_{l,i}\) of the vector \(\boldsymbol{ x }_l=(x_{l,1}, \ldots, x_{l,N^{\rm T}})^T\) in (3) at time \(l\in[\hat{l}^{\rm start} + L^{\rm w} + L^{\rm d} + L^{\rm p}, L^{\rm max}]\) into a continuous-time signal \(s_{i}(t)\), e.g., using Gaussian monopulses and TH, as \[\begin{align} s_i(t)= \sum_{j=\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}+L^{\rm p}}^{L^{\rm max}} x_{j,i} \cdot \phi(t-jL^{\rm b}T_c- c_{j,i}T_c), \end{align}\] where \(c_{j,i}\) is a random integer between \(0\) and \(L^{\rm b}-1\), representing the TH position for the \(i\)th antenna at the \(j\)th time step.
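The resulting frame timing at the Tx, combining the WUS interval, the delay \(L^{\rm d}\), the \(L^{\rm p}\) pilot steps, and the data steps up to \(L^{\rm max}\), can be summarized by the small helper below. Its name and the use of Python `range` objects (end-exclusive) for the intervals are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxSchedule:
    wus: range    # time steps carrying the WUS, as in (4)
    pilot: range  # time steps carrying pilots, as in (6)
    data: range   # time steps carrying data symbols


def tx_schedule(l_start_hat: int, L_w: int, L_d: int, L_p: int, L_max: int) -> TxSchedule:
    pilot_start = l_start_hat + L_w + L_d  # WUS, then the guard delay L_d
    data_start = pilot_start + L_p         # data follows the last pilot
    return TxSchedule(
        wus=range(l_start_hat, l_start_hat + L_w),
        pilot=range(pilot_start, min(pilot_start + L_p, L_max + 1)),
        data=range(data_start, L_max + 1),  # empty if the window is exhausted
    )
```

For instance, with \(\hat{l}^{\rm start}=10\), \(L^{\rm w}=4\), \(L^{\rm d}=3\), \(L^{\rm p}=2\), and \(L^{\rm max}=30\), the WUS occupies steps 10 to 13, the pilots steps 17 and 18, and the data steps 19 to 30.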

The modulated signal \(s_{i}(t)\) is then transmitted over the multi-path fading channel \(h_{i,n}(t)\) to the Rx, where the received signal at the \(n\)th receive antenna is obtained as the superposition \[\begin{align} v_n(t)= \sum_{i=1}^{N^{\rm T}}s_{i}(t) * h_{i,n}(t) + z_n(t). \label{transmission} \end{align}\tag{8}\] Note that this model assumes the delay \(L^{\rm d}\) to be longer than the channel delay spread, so that the signals sent by the IR Tx do not overlap with the WUS at the receiver.

To save energy at the Rx, instead of keeping the main radio on continuously, the proposed system incorporates an ultra-low-power wake-up Rx that monitors the ambient radio frequency (RF) environment and listens for the WUS via the received signal (5). This approach allows the Rx to remain in a low-power state for extended periods, activating the main radio only when a WUS is detected. In this section, we start by introducing the WUS detection process operated by the wake-up Rx, and then we describe how the main Rx operates after it has been activated. Finally, we mathematically formulate the design problem of interest, which consists of minimizing the main Rx power consumption while guaranteeing the desired level of reliability for the decision made at the Rx.

The wake-up Rx is always on, and it applies a correlator to detect the WUS \(s^{\rm w}(t)\) in (4) from the received signal \(w(t)\) in (5). This is done via matched filtering, i.e., by evaluating the correlation \[\begin{align} d(\tau) = \int_{-\infty}^{+\infty} w(t)\big(s^{\rm w}(t-\tau)\big)^* dt, \label{match} \end{align}\tag{9}\] and by detecting the WUS at the earliest time step for which the absolute value of the matched filter output \(d(\tau)\) in (9) is no smaller than a threshold \(\lambda^{\rm w}\), i.e., \[\begin{align} \hat{l}^{\rm det}= \min \big\{l\in \{1,\ldots, L^{\rm max}\}: |d(lL^{\rm b}T_c)| \geq \lambda^{\rm w}\big\}, \label{lambdaw} \end{align}\tag{10}\] with threshold \(\lambda^{\rm w}\) being subject to optimization. As a result, the wake-up time of the main Rx is given by \(\hat{l}^{\rm det} + \delta^{\rm wake}\), where \(\delta^{\rm wake} \leq L^{\rm d}\) denotes the time required by the main Rx to be turned on upon the reception of the WUS.
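Sampled at chip rate, the correlation (9) and the first-crossing rule (10) can be sketched as below. The sketch assumes that chip 0 of the received sequence `w` marks the start of time step \(l=1\), so that lag \(k=(l-1)L^{\rm b}\) aligns the WUS template with the start of time step \(l\); this indexing convention and the function name are assumptions for illustration.

```python
import numpy as np


def detect_wus(w, template, L_b, lambda_w):
    """Return the first (1-based) time step l with |d| >= lambda_w, else None.

    np.correlate in 'valid' mode computes d[k] = sum_t w[t + k] * conj(template[t]),
    a discrete counterpart of (9). Only lags aligned with the start of a time
    step, k = (l - 1) * L_b, are checked, mirroring (10).
    """
    d = np.correlate(w, template, mode="valid")
    for k in range(0, len(d), L_b):
        if abs(d[k]) >= lambda_w:
            return k // L_b + 1
    return None
```

A lower `lambda_w` makes detection earlier (and the main Rx wakes sooner, costing energy) but also raises the false-alarm rate under noise, which is why \(\lambda^{\rm w}\) is treated as a hyperparameter.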

The main Rx does not miss the start of the data packet (see Fig. 3 (b)) as long as the inequality \[\begin{align} \hat{l}^{\rm det} + \delta^{\rm wake} \leq \hat{l}^{\rm start} + L^{\rm w} + L^{\rm d} \label{Ld} \end{align}\tag{11}\] holds. Otherwise, the main Rx misses at least some of the transmitted samples (Fig. 3 (c)).
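Condition (11), and the number of lost time steps when it is violated, reduce to simple integer arithmetic, as the hypothetical helpers below make explicit.

```python
def main_rx_in_time(l_det_hat: int, delta_wake: int, l_start_hat: int, L_w: int, L_d: int) -> bool:
    """Inequality (11): the main Rx is on by the first pilot time step."""
    return l_det_hat + delta_wake <= l_start_hat + L_w + L_d


def missed_steps(l_det_hat: int, delta_wake: int, l_start_hat: int, L_w: int, L_d: int) -> int:
    """Number of leading time steps of the packet lost when (11) is violated (0 otherwise)."""
    return max(0, l_det_hat + delta_wake - (l_start_hat + L_w + L_d))
```

For example, with \(\hat{l}^{\rm start}=10\), \(L^{\rm w}=4\), \(L^{\rm d}=3\), and \(\delta^{\rm wake}=2\), a detection at \(\hat{l}^{\rm det}=15\) is just in time, whereas a detection at \(\hat{l}^{\rm det}=18\) loses the first three time steps of the packet.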

The main radio is equipped with \(N^{\rm R}\) antennas, and it stays idle until time \(\hat{l}^{\rm det}+ \delta^{\rm wake}\). Upon waking up, the main receiver samples the received pilot signals \(\{v^{\rm p}_n(t)\}_{n=1}^{N^{\rm R}}\) and the received data signals \(\{v_n(t)\}_{n=1}^{N^{\rm R}}\) at each time \(l\), obtaining discrete-time pilots \(\boldsymbol{ v }^{\rm p}_l =[\boldsymbol{ v }^{\rm p}_{l,1},\ldots, \boldsymbol{ v }^{\rm p}_{l,N^{\rm R}}]\) and discrete-time data \(\boldsymbol{ v }_l =[\boldsymbol{ v }_{l,1},\ldots, \boldsymbol{ v }_{l,N^{\rm R}}]\), respectively. Here, the \(n\)th element collects the samples received by the \(n\)th antenna over the \(L^{\rm b}\) chips of time step \(l\), i.e., \(\boldsymbol{ v }^{\rm p}_{l,n}=\{v^{\rm p}_{l,n,j}\}_{j\in \mathcal{I}_l}\) and \(\boldsymbol{ v }_{l,n}=\{v_{l,n,j}\}_{j\in \mathcal{I}_l}\), where \(\mathcal{I}_l\) denotes the set of indices of the \(L^{\rm b}\) chips in time step \(l\).

A hypernetwork is a type of neural network that generates the weights for another neural network, which can enhance the adaptability of the other neural network to the channel conditions. The target network in our setting is the decoding NPU.

Provided that the main radio has woken up in time, we assume knowledge of the time of arrival of the pilots. Accordingly, we begin by collecting all the received pilot symbols as \(\boldsymbol{ v }^{\rm p}=\{\boldsymbol{ v }_l^{\rm p}\}_{l=\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}}^{\hat{l}^{\rm start}+L^{\rm w}+L^{\rm d}+L^{\rm p}-1}\). To process the received pilots \(\boldsymbol{ v }^{\rm p}\), we implement a pre-trained hypernetwork parameterized by \(\boldsymbol{ \psi }\), e.g., a deep neural network (DNN). This hypernetwork takes the pilots \(\boldsymbol{ v }^{\rm p}\) as input and produces a vector \(\boldsymbol{ \omega }\) as \[\begin{align} \boldsymbol{ \omega }=f_{\boldsymbol{ \psi }}(\boldsymbol{ v }^{\rm p}), \end{align}\] each element of which is a scaling factor for one neuron of the decoding NPU. Effectively, the hypernetwork subsumes the task of channel estimation by directly mapping pilots to the receiver's parameters.

Specifically, the vector \(\boldsymbol{ \omega }\) is composed of \(N_d\) sub-vectors as \(\boldsymbol{ \omega }=\{\boldsymbol{ \omega }_1, \ldots, \boldsymbol{ \omega }_{N_d}\}\), where \(N_d\) is also the number of layers in the decoding NPU. Each sub-vector \(\boldsymbol{ \omega }_s\) has a length equal to the number of neurons in layer \(s\) of the decoding NPU. Thus, the pre-trained weight matrix \(\tilde{\boldsymbol{ \theta }}_s^d\) for layer \(s\) of the decoding NPU is adjusted by the hypernetwork as \[\begin{align} \boldsymbol{ \theta }_s^d= \tilde{\boldsymbol{ \theta }}_s^d \cdot \text{diag}\{\boldsymbol{ \omega }_s\}, \end{align}\] where \(\text{diag}\{\boldsymbol{ \omega }_s\}\) is a diagonal matrix with main diagonal given by the vector \(\boldsymbol{ \omega }_s\). We collect the updated weights of the decoding NPU as \(\boldsymbol{ \theta }^d=\{\boldsymbol{ \theta }^d_1, \ldots, \boldsymbol{ \theta }^d_{N_d} \}\).
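The layer-wise rescaling above can be sketched as follows. The sketch assumes that each pre-trained matrix \(\tilde{\boldsymbol{ \theta }}_s^d\) is stored with one column per neuron of layer \(s\), so that right-multiplication by \(\text{diag}\{\boldsymbol{ \omega }_s\}\) scales the incoming weights of each neuron. The hypernetwork that produces \(\boldsymbol{ \omega }\) is not modeled, and the function name is hypothetical.

```python
import numpy as np


def modulate_weights(base_weights, omega):
    """Apply theta_s = theta_tilde_s @ diag(omega_s) to every layer s.

    base_weights : list of arrays; layer s has shape (fan_in_s, N_s)  [assumed layout]
    omega        : list of 1-D arrays; omega_s has length N_s (neurons in layer s)
    Right-multiplying by diag(omega_s) scales column n by omega_s[n];
    broadcasting gives the same result without forming the diagonal matrix.
    """
    if len(base_weights) != len(omega):
        raise ValueError("need one scaling sub-vector per layer")
    return [W * w_s[np.newaxis, :] for W, w_s in zip(base_weights, omega)]
```

Because only one scalar per neuron is generated, the hypernetwork output grows with the number of neurons rather than with the number of synapses, which keeps the adaptation lightweight.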

The data signal \(\boldsymbol{ v }_l\) is fed to the decoding NPU, which produces a \(C \times 1\) vector \[\begin{align} \boldsymbol{ r }_{l}=f_{\boldsymbol{ \theta }^d}(\boldsymbol{ v }_{l}) \end{align}\] via \(C\) readout neurons. At the final time \(L^{\rm max}\), the output of the decoding NPU is processed to yield a decision variable. As a typical example, the \(C \times 1\) spike count vector \(\boldsymbol{ r }\) is obtained by summing all output signals \(\{\boldsymbol{ r }_l\}_{l=\hat{l}^{\rm det}+\delta^{\rm wake}}^{L^{\rm max}}\) from the \(C\) readout neurons as \[\begin{align} \boldsymbol{ r }=\sum_{l^{\prime}=\hat{l}^{\rm det}+\delta^{\rm wake}}^{L^{\rm max}} \boldsymbol{ r }_{l^{\prime}}. \label{count} \end{align}\tag{12}\]
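Eq. (12) amounts to summing the readout spikes over the time steps during which the main Rx is on. In the sketch below, the readout outputs are assumed to be stored as an array with one row per (1-based) time step; this storage layout and the function name are assumptions.

```python
import numpy as np


def spike_count(readout_spikes, l_on, L_max):
    """Eq. (12): sum the C-dimensional readout vectors r_l for l = l_on, ..., L_max.

    readout_spikes : (L_max, C) array; row l - 1 holds r_l (1-based time steps)
    l_on           : first time step with the main Rx on, i.e., l_det_hat + delta_wake
    """
    r = np.asarray(readout_spikes)
    return r[l_on - 1:L_max].sum(axis=0)
```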

Focusing on a classification problem, the decoding NPU applies the softmax function to the spike count vector \(\boldsymbol{ r }\) to obtain a probability vector \(\boldsymbol{ p }=[p_1,\ldots,p_C]\). A score is assigned to each class \(c\) using the log-loss as \(s_c=-\log(p_c)\). The final decision is constructed in the form of a *decision set* that
includes the classes whose scores are no larger than a given threshold \(\lambda^{\rm d}\) [30], i.e.,
\[\begin{align} \mathcal{C} = \{c: s_c \leq \lambda^{\rm d}\}. \label{set}
\end{align}\tag{13}\] The use of a decision set supports reliable decision making, whereby the size of the decision set \(\mathcal{C}\) can be determined as a function of the uncertainty of the decision [15], [31]. This way, in contrast to standard methods
such as top-\(k\) prediction, the size \(|\mathcal{C}|\) of the set is adapted to the difficulty of the input, providing a means to control the expected loss and to quantify uncertainty.
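The construction (13) can be sketched in a few lines: a numerically stable log-softmax of the spike counts gives the scores \(s_c=-\log(p_c)\), which are then thresholded at \(\lambda^{\rm d}\). Class indices are 0-based in the sketch, and the function name is hypothetical.

```python
import numpy as np


def decision_set(spike_counts, lambda_d):
    """Eq. (13): the classes whose log-loss score -log softmax(r)_c is at most lambda_d."""
    r = np.asarray(spike_counts, dtype=float)
    shifted = r - r.max()                             # avoid overflow in exp
    log_p = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax
    scores = -log_p
    return {int(c) for c in np.flatnonzero(scores <= lambda_d)}
```

Raising \(\lambda^{\rm d}\) can only add classes to the set, which is what makes the set size a knob for trading informativeness against reliability. Since the largest probability is at least \(1/C\), any \(\lambda^{\rm d} \geq \log C\) guarantees a non-empty set, whereas smaller thresholds may return an empty set.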

Overall, the spike count vector \(\boldsymbol{ r }\) in (12) produced by the decoding NPU at the receiver depends on the fading channels and noise experienced by the WUS transmission as per (5) and by the data transmission as per (8). Denoting collectively all noise and channel variables as \(\mathbf{h}\), we will explicitly denote the dependence of \(\boldsymbol{ r }\) on \(\mathbf{h}\) as \(\boldsymbol{ r }_{\mathbf{h}}\). While the variables in vector \(\mathbf{h}\) cannot be controlled, the system can tune the hyperparameters \(\lambda=[\lambda^{\rm s}, \lambda^{\rm w}, \lambda^{\rm d}]\), namely the threshold \(\lambda^{\rm s}\) for input signal detection at the Tx in (2), the threshold \(\lambda^{\rm w}\) for WUS detection at the wake-up Rx in (10), and the threshold \(\lambda^{\rm d}\) for set prediction in (13).

As the predicted set \(\mathcal{C}\) in (13) depends on the input data \(\boldsymbol{ u }\), the channel variables \(\mathbf{h}\), and
the hyperparameter vector \(\lambda\), we will explicitly denote it as \(\mathcal{C}(\boldsymbol{ u }, \mathbf{h}, \lambda)\). To define the problem of optimizing the hyperparameters \(\lambda\), we introduce a *loss function* \(\ell(c, \mathcal{C}(\boldsymbol{ u }, \mathbf{h}, \lambda))\) capturing the discrepancy between the true target variable \(c\) and the estimate \(\mathcal{C}(\boldsymbol{ u }, \mathbf{h}, \lambda)\). The corresponding *expected loss* is defined as \[\begin{align} L(\lambda)=\mathbb{E}[\ell(c, \mathcal{C}(\boldsymbol{ u }, \mathbf{h}, \lambda))], \label{risk}
\end{align}\tag{14}\] where the expectation is taken with respect to the data distribution \(p(\boldsymbol{ u }, c)\) of the input-output pair \((\boldsymbol{ u }, c)\), as
well as over the distribution \(p(\mathbf{h})\) of the channel variables \(\mathbf{h}\).
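The expectation (14) is generally not available in closed form and is estimated by averaging the loss over i.i.d. draws of \((\boldsymbol{ u }, c, \mathbf{h})\). The sketch below assumes, as one common choice for set predictors, the miscoverage loss \(\ell(c,\mathcal{C})=\mathbb{1}\{c\notin\mathcal{C}\}\); the text leaves \(\ell\) general, and the helper names are hypothetical.

```python
def miscoverage_loss(true_label, predicted_set):
    """One possible choice of the loss in (14): 1 if the true class is missing from the set."""
    return 0.0 if true_label in predicted_set else 1.0


def empirical_risk(labels, predicted_sets):
    """Monte Carlo estimate of L(lambda) from i.i.d. draws of (u, c, h)."""
    losses = [miscoverage_loss(c, C) for c, C in zip(labels, predicted_sets)]
    return sum(losses) / len(losses)
```

With this loss, the constraint \(L(\lambda)\leq\alpha\) becomes a coverage requirement: the true class must be included in the decision set with probability at least \(1-\alpha\).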

Given pre-trained encoding and decoding NPUs, we wish to find hyperparameters \(\lambda\) that minimize the average energy consumption \(E(\lambda)\) at the Rx main radio, while controlling the expected loss \(L(\lambda)\) at some predetermined level \(\alpha \in[0,1]\). Note that the focus on energy consumption of the main radio at the Rx is justified by the fact that it is typically the most significant contributor to the overall energy expenditure at the Rx [9].

The *average energy* \(E(\lambda)\) consumed by the Rx main radio is evaluated as \[\begin{align} E(\lambda) = P^{\rm on}\big(L^{\rm max}-\mathbb{E}[\hat{l}^{\rm det}(\boldsymbol{ u }, \mathbf{h}, \lambda)]- \delta^{\rm wake}+1\big), \label{energy} \end{align}\tag{15}\] with \(P^{\rm on}\) being the per-time-step energy consumed by the main radio when it is on, and the expectation is computed with respect to the data distribution of the input \(\boldsymbol{ u }\) and the distribution of vector \(\mathbf{h}\). Indeed, as illustrated in Fig. 3, the Rx main radio is on for \(L^{\rm max}-\hat{l}^{\rm det}(\boldsymbol{ u }, \mathbf{h}, \lambda)- \delta^{\rm wake}+1\) time steps. The notation \(\hat{l}^{\rm det}(\boldsymbol{ u }, \mathbf{h}, \lambda)\) is introduced in (15) to highlight the dependence of the detection time \(\hat{l}^{\rm det}\) on input \(\boldsymbol{ u }\), channel \(\mathbf{h}\), and hyperparameter \(\lambda\).
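As a minimal sketch, the empirical counterpart of (15) can be computed from per-sample detection times, assuming these are already available from simulation or measurement; the function name `average_energy` is hypothetical. Because the expectation is linear, averaging the per-sample on-times gives the same value as plugging the average detection time into (15).

```python
import numpy as np


def average_energy(l_det, L_max, delta_wake, P_on=1.0):
    """Empirical estimate of E(lambda) in (15).

    l_det: detection times l_det(u_n, h_n, lambda), one per sample.
    Each sample keeps the main radio on for L_max - l_det - delta_wake + 1
    time steps; the result is P_on times the average on-time.
    """
    l_det = np.asarray(l_det, dtype=float)
    on_steps = L_max - l_det - delta_wake + 1
    return float(P_on * on_steps.mean())


# L_max = 60, delta_wake = 2, P_on = 1 match the numerical section;
# the detection times are made up for illustration.
print(average_energy([20, 30, 40], L_max=60, delta_wake=2))  # 29.0
```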

A smaller energy consumption in (15) can be obtained by waking up the main radio later, i.e., by increasing the expected detection time \(\mathbb{E}[\hat{l}^{\rm det}(\boldsymbol{ u }, \mathbf{h}, \lambda)]\), but this generally comes at the cost of an increased expected loss \(L(\lambda)\). To assess the informativeness of the predicted set \(\mathcal{C}\), with smaller sets being more informative, we evaluate the *average set size* as \[\begin{align} I(\lambda) = \mathbb{E}[|\mathcal{C}(\boldsymbol{ u }, \mathbf{h}, \lambda)|], \end{align}\] where the expectation is taken with respect to the data distribution of the input \(\boldsymbol{ u }\) and the distribution of vector \(\mathbf{h}\).

Overall, the design problem of interest is formulated as the constrained minimization \[\begin{align} &~ \underset{\lambda}{\text{minimize}} ~ E(\lambda) + \gamma I(\lambda) \notag \\ &~ \text{subject to}~ L(\lambda) \leq \alpha, \label{problem} \end{align}\tag{16}\] where \(\gamma \geq 0\) is a weight factor determining the relative priority between the energy consumption \(E(\lambda)\) and the set size \(I(\lambda)\), while the parameter \(\alpha >0\) specifies the desired reliability level, with a smaller \(\alpha\) indicating a stricter reliability requirement. Regarding the choice of parameter \(\gamma\) in (16), note that there is generally a tension between energy \(E(\lambda)\) and set size \(I(\lambda)\). Indeed, reducing the set size \(I(\lambda)\) while maintaining the desired target reliability \(\alpha\) generally requires a larger energy expenditure \(E(\lambda)\).

As discussed in the previous section, the goal of this work is to introduce a methodology for selecting the hyperparameters \(\lambda\) that addresses problem (16). In this section, we describe the proposed solution based on digital twinning and LTT [15], a method recently introduced in statistics.

Addressing problem (16) is complicated by the fact that we do not assume knowledge of the distribution \(p(\boldsymbol{ u }, c)\) of each data pair \((\boldsymbol{ u }, c)\), consisting of sensed signal \(\boldsymbol{ u }\) and label \(c\), nor of the distribution \(p(\mathbf{h})\) of the channel variables \(\mathbf{h}\). To obtain information about the data distribution \(p(\boldsymbol{ u }, c)\), we make the common assumption that a dataset \(\mathcal{D}=\{(\boldsymbol{ u }_n, c_n)\}_{n=1}^{|\mathcal{D}|}\) is available, where each pair \((\boldsymbol{ u }_n, c_n)\) of signal \(\boldsymbol{ u }_n\) and label \(c_n\) is generated in an independent and identically distributed (i.i.d.) manner from the distribution \(p(\boldsymbol{ u }, c)\). Accordingly, each pair is transmitted under an independent channel realization drawn from the distribution \(p(\mathbf{h})\). Furthermore, to facilitate the collection of information about the distribution \(p(\mathbf{h})\) of the channel variables, we assume access to a simulator in a *digital twin* of the system. As illustrated in Fig. 2, the simulator can produce samples \(\tilde{\mathbf{h}}\) from a distribution \(\tilde{p}(\mathbf{h})\) that is generally different from the true distribution \(p(\mathbf{h})\). The *fidelity* of the simulator depends on how similar the distributions \(p(\mathbf{h})\) and \(\tilde{p}(\mathbf{h})\) are.

With this information, DT-LTT aims at solving a relaxation of problem (16), in which the constraint is required to be satisfied with a user-determined probability \(1-\delta\), with \(\delta \in(0,1)\). The resulting problem is defined as \[\begin{align} &~ \underset{\lambda}{\text{minimize}} ~ E(\lambda) + \gamma I(\lambda) \notag \\ &~ \text{subject to}~ \Pr\big[L(\lambda) \leq \alpha\big] \geq 1-\delta, \label{eq:goal} \end{align}\tag{17}\] where the probability \(\Pr[\cdot]\) is taken with respect to the random realization of the dataset \(\mathcal{D}\) and of the channel \(\mathbf{h}\). Note that the probability in (17) cannot be evaluated, since the distributions \(p(\boldsymbol{ u }, c)\) and \(p(\mathbf{h})\) are unknown.

In order to address problem (17), we follow a two-stage approach illustrated in Fig. 2. In the first phase, the digital twin pre-selects a subset \(\Lambda\) of candidate hyperparameter vectors \(\lambda\). In the second phase of *on-air calibration*, the pre-selected candidates in set \(\Lambda\) are tested to identify a hyperparameter vector \(\lambda^*\) that provably satisfies the constraint in (17). Reducing the number of candidate solutions via the digital twin makes more efficient use of the physical channel resources during on-air calibration, since fewer options need to be evaluated through transmission on the wireless channel.

At a technical level, as detailed in the Appendix, the proposed approach leverages the freedom afforded by the LTT scheme to choose any sequence of hyperparameter vectors, fixed independently of the data used for testing, to test the reliability condition (17). Our proposed method, DT-LTT, determines this sequence by leveraging a digital twin model.

To start, the dataset \(\mathcal{D}\) is randomly partitioned into two subsets, namely the dataset \(\mathcal{D}^{\rm DT}\) to be used with the simulator of the digital twin and the dataset \(\mathcal{D}^{\rm PT}\) to be leveraged for on-air calibration in the physical system. To pre-select a subset \(\Lambda\) of hyperparameters, the digital twin addresses the *multi-objective problem* \[\begin{align} \underset{\lambda}{\text{minimize}}~ \{ \hat{L}^{\rm DT}(\lambda), \hat{E}^{\rm DT}(\lambda) + \gamma \hat{I}^{\rm DT}(\lambda)\}, \label{mutio} \end{align}\tag{18}\] where the objectives \(\hat{L}^{\rm DT}(\lambda)\), \(\hat{E}^{\rm DT}(\lambda)\) and \(\hat{I}^{\rm DT}(\lambda)\) are empirical estimates obtained at the digital twin for the expected loss [32] \[\begin{align} \hat{L}^{\rm DT}(\lambda)=\frac{1}{|\mathcal{D}^{\rm DT}|} \sum_{n=1}^{|\mathcal{D}^{\rm DT}|} \ell\big(c_n, \mathcal{C}(\boldsymbol{ u }_n, \tilde{\mathbf{h}}_n, \lambda)\big), \label{erisk} \end{align}\tag{19}\] the average energy consumption \[\begin{align} \hat{E}^{\rm DT}(\lambda)= P^{\rm on}\bigg(L^{\rm max}-\frac{1}{|\mathcal{D}^{\rm DT}|} \sum_{n=1}^{|\mathcal{D}^{\rm DT}|} \hat{l}^{\rm det}(\boldsymbol{ u }_n, \tilde{\mathbf{h}}_n, \lambda) - \delta^{\rm wake} +1\bigg), \label{epower} \end{align}\tag{20}\] and the average set size \[\begin{align} \hat{I}^{\rm DT}(\lambda) = \frac{1}{|\mathcal{D}^{\rm DT}|} \sum_{n=1}^{|\mathcal{D}^{\rm DT}|}|\mathcal{C}(\boldsymbol{ u }_n, \tilde{\mathbf{h}}_n, \lambda)|. \label{esize} \end{align}\tag{21}\]
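The three digital-twin estimates (19), (20) and (21) can be sketched as follows for a single candidate \(\lambda\), assuming the predicted sets and detection times have already been produced by simulated transmission. The 0-1 miscoverage loss \(\mathbb{1}(c \notin \mathcal{C})\) adopted in the numerical results is assumed here, and the helper name is hypothetical. The same function yields the physical-twin estimates (22), (23) and (24) when its inputs come from on-air transmission instead.

```python
import numpy as np


def empirical_estimates(pred_sets, labels, l_det, L_max, delta_wake, P_on=1.0):
    """Return (loss, energy, set_size) estimates for one candidate lambda.

    pred_sets: predicted sets C(u_n, h_n, lambda), as Python sets of labels
    labels:    true labels c_n
    l_det:     detection times l_det(u_n, h_n, lambda)
    """
    n = len(labels)
    # (19): average 0-1 miscoverage loss 1(c_n not in C)
    loss = sum(c not in C for C, c in zip(pred_sets, labels)) / n
    # (20): P_on times the average number of steps the main radio is on
    energy = P_on * (L_max - float(np.mean(l_det)) - delta_wake + 1)
    # (21): average predicted-set size
    size = sum(len(C) for C in pred_sets) / n
    return loss, energy, size


loss, energy, size = empirical_estimates(
    pred_sets=[{3}, {1, 7}, {0, 2, 5}],
    labels=[3, 2, 5],
    l_det=[20, 30, 40],
    L_max=60,
    delta_wake=2,
)
print(loss, energy, size)  # loss = 1/3, energy = 29.0, size = 2.0
```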

The empirical estimates (19), (20) and (21) are obtained by using the dataset \(\mathcal{D}^{\rm DT}\) and transmissions simulated using channels \(\tilde{\mathbf{h}}_n \sim \tilde{p}(\mathbf{h})\) generated by the digital twin. As shown in Fig. 4, the digital twin uses an arbitrary multi-objective optimization algorithm to identify a discrete subset \(\Lambda\) of values of the hyperparameter \(\lambda\) such that the resulting estimates \(\big(\hat{L}^{\rm DT}(\lambda), \hat{E}^{\rm DT}(\lambda)+ \gamma \hat{I}^{\rm DT}(\lambda)\big)\) lie on the Pareto front of the set of achievable values for this pair. Mathematically, each vector \(\lambda\) included in the candidate set \(\Lambda\) satisfies the condition \[\begin{align} \nexists \lambda^{\prime} & ~\text{such that}~ \hat{L}^{\rm DT}(\lambda^{\prime}) < \hat{L}^{\rm DT}(\lambda) ~\text{and}~ \notag \\ &\hat{E}^{\rm DT}(\lambda^{\prime})+ \gamma \hat{I}^{\rm DT}(\lambda^{\prime}) < \hat{E}^{\rm DT}(\lambda)+ \gamma \hat{I}^{\rm DT}(\lambda), \end{align}\] i.e., no other hyperparameter \(\lambda^{\prime}\) improves both the empirical loss and the empirical energy consumption plus weighted set size.
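The text leaves the multi-objective optimizer open. For a finite grid of candidates, one simple choice (an assumption here, not necessarily the authors' implementation) is an exhaustive check of the condition above. Since that condition uses strict inequalities in both objectives, the sketch keeps every candidate that is not strictly improved in both. The function name `pareto_candidates` is hypothetical.

```python
def pareto_candidates(losses, energies, set_sizes, gamma):
    """Indices j such that no k has both losses[k] < losses[j] and
    energies[k] + gamma * set_sizes[k] < energies[j] + gamma * set_sizes[j]."""
    objs = [e + gamma * s for e, s in zip(energies, set_sizes)]
    keep = []
    for j in range(len(losses)):
        dominated = any(
            losses[k] < losses[j] and objs[k] < objs[j]
            for k in range(len(losses))
            if k != j
        )
        if not dominated:
            keep.append(j)
    return keep


# Four hypothetical candidates (digital-twin estimates), gamma = 10.
losses = [0.05, 0.10, 0.20, 0.15]
energies = [35.0, 25.0, 8.0, 30.0]
set_sizes = [0.5, 0.5, 0.2, 0.5]
print(pareto_candidates(losses, energies, set_sizes, gamma=10))  # [0, 1, 2]
```

Candidate 3 is dropped because candidate 1 has both a smaller loss (0.10 < 0.15) and a smaller weighted objective (30 < 35). The quadratic scan is adequate for grids of a few hundred points.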

Given the pre-selected candidate solutions in set \(\Lambda\), on-air calibration aims at selecting a value \(\lambda\) that approximately solves problem (17) while ensuring the validity of its reliability constraint. To this end, the solutions in set \(\Lambda\) are first ordered by increasing value of the digital-twin loss \(\hat{L}^{\rm DT}(\lambda)\) in (19) as \[\begin{align} \hat{L}^{\rm DT}(\lambda_1) \leq \hat{L}^{\rm DT}(\lambda_2) \leq \ldots \leq \hat{L}^{\rm DT}(\lambda_{|\Lambda|}). \end{align}\] On-air calibration evaluates the solutions in set \(\Lambda\) in the order \(\lambda_1, \lambda_2, \ldots\), selecting a value \(\lambda^{*}\) that is guaranteed to satisfy constraint (17) while reducing as much as possible the weighted sum of energy consumption and set size.

For any hyperparameter \(\lambda_j\) being tested, the physical twin evaluates, via transmission on the actual physical channel, the empirical loss \[\begin{align} \hat{L}^{\rm PT}(\lambda_j)=\frac{1}{|\mathcal{D}^{\rm PT}|} \sum_{n=1}^{|\mathcal{D}^{\rm PT}|} \ell(c_n, \mathcal{C}(\boldsymbol{ u }_n, \mathbf{h}_n, \lambda_j)), \label{crisk} \end{align}\tag{22}\] the empirical energy consumption \[\begin{align} \hat{E}^{\rm PT}(\lambda_j)=&P^{\rm on}\bigg(L^{\rm max}-\frac{1}{|\mathcal{D}^{\rm PT}|} \sum_{n=1}^{|\mathcal{D}^{\rm PT}|} \hat{l}^{\rm det}(\boldsymbol{ u }_n, \mathbf{h}_n, \lambda_j) \notag \\ &- \delta^{\rm wake} +1\bigg), \label{cpower} \end{align}\tag{23}\] and the empirical set size \[\begin{align} \hat{I}^{\rm PT}(\lambda_j) = \frac{1}{|\mathcal{D}^{\rm PT}|} \sum_{n=1}^{|\mathcal{D}^{\rm PT}|}|\mathcal{C}(\boldsymbol{ u }_n, \mathbf{h}_n, \lambda_j)| \label{csize} \end{align}\tag{24}\] by transmitting on actual channel realizations \(\mathbf{h}_n \sim p(\mathbf{h})\). Note that the channel realizations \(\mathbf{h}_n\) are neither known nor required to evaluate the estimates (22), (23) and (24). These estimates are evaluated successively for the candidate solutions \(\lambda_1,\lambda_2, \ldots\), until a stopping criterion is satisfied.

Specifically, as illustrated in Fig. 4, the evaluation of candidate solutions \(\lambda_1, \lambda_2,\ldots\) stops at the first index \(j^{\rm stop}\) for which the estimated loss \(\hat{L}^{\rm PT}(\lambda_{j^{\rm stop}})\) in (22) exceeds the threshold \[\begin{align} \psi(\alpha, \delta) = \alpha - \sqrt{\frac{- \ln (\delta)}{2|\mathcal{D}^{\rm PT}|}}, \label{threshold} \end{align}\tag{25}\] which is a function of the dataset size \(|\mathcal{D}^{\rm PT}|\), of the target risk \(\alpha\) in (17), and of the probability bound \(\delta\) in (17). For the optimal hyperparameter \(\lambda^*\) to be well defined, one needs to ensure the condition \[\begin{align} \hat{L}^{\rm PT}(\lambda_1) < \psi(\alpha, \delta). \label{feasible} \end{align}\tag{26}\] If condition (26) is not met, the decoding NPU makes a *conservative* decision by including all classes in the predicted set \(\mathcal{C}\) in (13), while saving energy by keeping the main receiver off. This amounts to the choice \(\lambda^*=[\lambda^{\rm s}=\infty, \lambda^{\rm w}= \infty, \lambda^{\rm d}=\infty]^T\).
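A quick numerical check of (25), as a sketch: with \(\alpha=0.2\) and \(\delta=0.05\) (the values used for Fig. 6) and \(|\mathcal{D}^{\rm PT}|=6{,}000\) calibration samples (as in the numerical setup), the margin subtracted from \(\alpha\) is about 0.0158, so condition (26) requires the first candidate's on-air loss estimate to fall below about 0.1842. The function name `ltt_threshold` and the loss value `L_pt_first` are hypothetical.

```python
import math


def ltt_threshold(alpha, delta, n_pt):
    """psi(alpha, delta) in (25): alpha minus the Hoeffding margin."""
    return alpha - math.sqrt(-math.log(delta) / (2 * n_pt))


psi = ltt_threshold(alpha=0.2, delta=0.05, n_pt=6000)
print(round(psi, 4))  # 0.1842

# Condition (26) for a hypothetical on-air loss estimate of the first candidate.
L_pt_first = 0.15
print(L_pt_first < psi)  # True: lambda* is well defined
```

A larger calibration set shrinks the margin as \(1/\sqrt{|\mathcal{D}^{\rm PT}|}\), bringing \(\psi(\alpha,\delta)\) closer to \(\alpha\).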

Finally, assuming that condition (26) holds, the selected value \(\lambda^{*}\) is obtained by choosing, among the candidates that passed the test, i.e., \(\lambda_j\) with \(j\in\{1,\ldots, j^{\rm stop}-1\}\) (where \(j^{\rm stop}=|\Lambda|+1\) if no candidate exceeds the threshold), the one that returns the smallest estimated sum \(\hat{E}^{\rm PT}(\lambda_j)+\gamma \hat{I}^{\rm PT}(\lambda_j)\), i.e., \[\begin{align} \lambda^{*}= \lambda_{j^{*}}, ~\text{with}~ j^*=\mathop{\mathrm{arg\,min}}_{j\in\{1,\ldots, j^{\rm stop}-1\}} \{\hat{E}^{\rm PT}(\lambda_j) +\gamma \hat{I}^{\rm PT}(\lambda_j)\}. \label{riskltt} \end{align}\tag{27}\]
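The ordering, stopping rule, and selection (27) can be sketched as below. Here `evaluate_pt` is a hypothetical callback standing in for on-air transmission, returning the estimates (22), (23) and (24) for one candidate. Only candidates tested before the first threshold violation are eligible, since those are the ones whose null hypotheses are rejected in the fixed-sequence test described in the Appendix. If the first candidate already fails, the conservative fallback \(\lambda^*=[\infty,\infty,\infty]\) is returned.

```python
import math

# Conservative fallback: all classes in the set, main radio kept off.
FALLBACK = (math.inf, math.inf, math.inf)


def on_air_calibration(candidates, dt_losses, evaluate_pt, alpha, delta, n_pt, gamma):
    """Sketch of DT-LTT on-air calibration over the Pareto candidates.

    candidates:  hyperparameter vectors in Lambda
    dt_losses:   digital-twin loss estimates (19), one per candidate
    evaluate_pt: hypothetical callback, candidate -> (L_pt, E_pt, I_pt)
    """
    psi = alpha - math.sqrt(-math.log(delta) / (2 * n_pt))  # threshold (25)
    # Test in order of increasing digital-twin loss.
    order = sorted(range(len(candidates)), key=lambda j: dt_losses[j])
    best, best_obj = FALLBACK, math.inf
    for j in order:
        L_pt, E_pt, I_pt = evaluate_pt(candidates[j])
        if L_pt > psi:  # first failure: stop, later candidates are not tested
            break
        obj = E_pt + gamma * I_pt  # objective minimized in (27)
        if obj < best_obj:
            best, best_obj = candidates[j], obj
    return best


# Hypothetical on-air estimates (L_pt, E_pt, I_pt) for four candidates.
pt_results = {"a": (0.06, 50.0, 0.5), "b": (0.19, 20.0, 0.2),
              "c": (0.12, 30.0, 0.3), "d": (0.10, 5.0, 0.1)}
chosen = on_air_calibration(
    candidates=["a", "b", "c", "d"],
    dt_losses=[0.05, 0.15, 0.10, 0.20],
    evaluate_pt=pt_results.__getitem__,
    alpha=0.2, delta=0.05, n_pt=6000, gamma=10,
)
print(chosen)  # c
```

Candidates are tested in the order a, c, b, d. Candidate "b" fails (0.19 exceeds about 0.1842), so "d" is never evaluated even though its on-air loss would pass; stopping at the first failure is what preserves the guarantee of Theorem 1.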

The overall proposed calibration procedure is described in Algorithm 5. As proved next, by the properties of LTT [15], DT-LTT guarantees the constraint 17 irrespective of the true, unknown, distributions \(p(\boldsymbol{ u }, c)\) and \(p(\mathbf{h})\), and irrespective of the fidelity of the digital twin.

**Theorem 1** (**Reliability of DT-LTT**). *With the hyperparameter vector \(\lambda^*\) selected as in Algorithm 5, DT-LTT satisfies the inequality \[\begin{align} \Pr[L(\lambda^*) \leq \alpha] \geq 1-\delta \label{theo} \end{align}\] for any realization of the dataset \(\mathcal{D}^{\rm DT}\) and of the simulated channels \(\{\tilde{\mathbf{h}}_n \sim \tilde{p}(\mathbf{h})\}_{n=1}^{|\mathcal{D}^{\rm DT}|}\), where the probability is evaluated with respect to the randomness of the dataset \(\mathcal{D}^{\rm PT}\) and of the true channels \(\{\mathbf{h}_n \sim p(\mathbf{h})\}_{n=1}^{|\mathcal{D}^{\rm PT}|}\).*

*Proof.* The proof is provided in the Appendix. ◻

In this section, we present numerical results that validate the proposed design and analysis.

To test the proposed DT-LTT calibration method, we consider a neuromorphic wireless communication link over a multi-path fading channel, whose goal is to support reliable image classification at the receiver. The transmitter is equipped with \(N^{\rm T}=10\) antennas, each modulating the spiking signal produced by the corresponding neuron of the encoding NPU, while the receiver has \(N^{\rm R}=2\) antennas. The channel has \(N^{\rm P}=4\) paths, with delay of the \(i\)th path equal to the \(i\)th chip time. All antennas share the same multipath delays, and the amplitudes for each path are independent and Rayleigh distributed with equal power across all antennas. The signal-to-noise ratio (SNR) per time step is defined as the ratio of the transmission power, which is assumed to be the same for WUS, pilots, and data transmission, over the noise power. We set the SNR to 10 dB.
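For readers who want to reproduce a comparable setup, the following sketch draws one channel realization with the stated dimensions (\(N^{\rm T}=10\), \(N^{\rm R}=2\), \(N^{\rm P}=4\)) and converts the 10 dB SNR into a noise level. The modeling details are assumptions, not taken from the text: each path gain is a circularly-symmetric complex Gaussian (so its amplitude is Rayleigh), the unit average power is split equally across paths, and the transmit power is normalized to one. The function names are hypothetical.

```python
import numpy as np


def sample_channel(rng, n_r=2, n_t=10, n_paths=4):
    """Return taps of shape (n_paths, n_r, n_t); taps[i] is the gain of the
    path with delay i + 1 chips, with delays shared by all antenna pairs.

    Assumption: i.i.d. CN(0, 1 / n_paths) taps, so amplitudes are Rayleigh
    and the average total power per antenna pair is one.
    """
    shape = (n_paths, n_r, n_t)
    scale = np.sqrt(1.0 / (2 * n_paths))
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def noise_std(snr_db, tx_power=1.0):
    """Noise standard deviation such that tx_power / noise_power = SNR."""
    noise_power = tx_power / 10 ** (snr_db / 10)
    return float(np.sqrt(noise_power))


rng = np.random.default_rng(0)
h = sample_channel(rng)
print(h.shape)                    # (4, 2, 10)
print(round(noise_std(10.0), 4))  # 0.3162
```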

As in [6], the encoding NPU is a fully-connected SNN featuring one hidden layer comprising 500 neurons and an output layer with 10 neurons, while the decoding NPU is designed as an SNN with a single hidden layer containing 200 neurons and an output layer consisting of 10 neurons, each representing one of 10 classes. The hypernetwork is implemented as an ANN with a hidden layer with 500 hidden neurons.

Unless stated otherwise, the maximum observation period for each input \(\boldsymbol{ u }\) is \(L^{\rm max}=60\) time steps, with the duration of the signal of interest fixed at \(L^{\rm sig}=40\) time steps. During this period, we repeatedly present the input image to be classified for 40 time steps. The initial time \(l^{\rm start}\) is drawn from a discrete uniform distribution over the set \(\{1, \ldots, L^{\rm max} -L^{\rm sig}\}\). Subsequently, the first \(l^{\rm start}\) and the last \(L^{\rm max}-L^{\rm sig}-l^{\rm start}\) time samples of \(\boldsymbol{ u }\) are generated independently from the normal distribution \(\mathcal{N}(\boldsymbol{ 0 },\boldsymbol{ I })\).
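The construction of each observation \(\boldsymbol{ u }\) can be sketched as follows. Assumptions: the signal of interest is given as an array with one row per time step (here, a placeholder flattened \(28\times 28\) image repeated for \(L^{\rm sig}=40\) steps), time indices are 0-based in code, and the function name is hypothetical.

```python
import numpy as np


def make_observation(signal, L_max, rng):
    """Embed an (L_sig, d) signal into an (L_max, d) observation.

    l_start is drawn uniformly from {1, ..., L_max - L_sig}; the first
    l_start rows and the last L_max - L_sig - l_start rows are i.i.d.
    N(0, I) noise, and the L_sig rows in between carry the signal.
    """
    L_sig, d = signal.shape
    l_start = int(rng.integers(1, L_max - L_sig + 1))  # high is exclusive
    u = rng.standard_normal((L_max, d))
    u[l_start:l_start + L_sig] = signal
    return u, l_start


rng = np.random.default_rng(0)
image = rng.random(28 * 28)       # placeholder for an MNIST image in [0, 1]
signal = np.tile(image, (40, 1))  # image repeated for L_sig = 40 steps
u, l_start = make_observation(signal, L_max=60, rng=rng)
print(u.shape)  # (60, 784)
```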

To implement the QUSUM algorithm, we fit the signal of interest to a multivariate Gaussian distribution \(\mathcal{N}(\boldsymbol{ \mu }, \boldsymbol{ \Sigma })\), where the mean and covariance matrix are determined by using the training dataset \(\mathcal{D}^{\rm tr}\) as \(\boldsymbol{ \mu }= \frac{1}{|\mathcal{D}^{\rm tr}|} \sum_{\boldsymbol{ x } \in \mathcal{D}^{\rm tr}} \boldsymbol{ x }\) and \(\boldsymbol{ \Sigma }=\frac{1}{|\mathcal{D}^{\rm tr}|} \sum_{\boldsymbol{ x } \in \mathcal{D}^{\rm tr}} (\boldsymbol{ x }-\boldsymbol{ \mu })(\boldsymbol{ x }-\boldsymbol{ \mu })^T\).
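The two estimates above use the \(1/|\mathcal{D}^{\rm tr}|\) normalization, i.e., the maximum-likelihood (biased) covariance. In NumPy this corresponds to `np.cov(..., bias=True)` rather than the default \(1/(N-1)\) normalization, as the short sketch below checks. Each row of `X` is assumed to be one vector \(\boldsymbol{ x } \in \mathcal{D}^{\rm tr}\), and `fit_gaussian` is a hypothetical name.

```python
import numpy as np


def fit_gaussian(X):
    """Mean and covariance of the rows of X with 1/N normalization."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
mu, sigma = fit_gaussian(X)
print(mu)  # [3. 2.]
assert np.allclose(sigma, np.cov(X, rowvar=False, bias=True))
```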

For IR transmission, the duration of the WUS is set to \(L^{\rm w}=2\), and the duration of the pilot is also set to \(L^{\rm p}=2\). The delay added by the transmitter is \(L^{\rm d}=3\) time steps, and the wake-up time is \(\delta^{\rm wake}=2\) time steps. The per-time-step energy for keeping the main radio on is set to the normalized value \(P^{\rm on}=1\).

Decisions are made via set prediction as in (13), and the loss function \(\ell(c, \mathcal{C})\) is the 0-1 loss indicating whether the true label \(c\) is included in the predicted set \(\mathcal{C}\), i.e., \(\ell(c, \mathcal{C})=\mathbb{1}(c \notin \mathcal{C})\), where \(\mathbb{1}(\cdot)\) is the indicator function. Accordingly, the expected loss represents the *probability of miscoverage* for the decision set \(\mathcal{C}\). To evaluate the *informativeness* of the set prediction, we also compute the normalized average set size \(|\mathcal{C}|/C\) of the prediction set [33], with \(C\) denoting the number of classes.

Since our focus is on the optimization of the thresholds, rather than on training, we adopt *pre-trained* SNNs. Specifically, pre-training, testing, and calibration use the MNIST dataset, which comprises 60,000 training samples and 10,000 test
samples. Each sample in the dataset represents a handwritten digit ranging from 0 to 9, and is presented as a \(28\times 28\) pixel image. We partition the training dataset by drawing \(6,000\) samples for the dataset \(\mathcal{D}^{\rm DT}\) and \(6,000\) samples for the dataset \(\mathcal{D}^{\rm PT}\), with
the remaining data points used for pre-training. Pre-training is done in an end-to-end manner without considering the wake-up radio as in [6].

For comparison, we consider the following benchmarks. For all the schemes using LTT, the grid contains all threshold tuples \((\lambda^{\rm s}, \lambda^{\rm w}, \lambda^{\rm d})\) with \(\lambda^{\rm s}\in \{0,1,\ldots,4\}\), \(\lambda^{\rm w}\in\{0.1, 0.2, \ldots, 0.6\}\), and \(\lambda^{\rm d}\in\{1,3,\ldots,9\}\).
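The grid above contains \(5 \times 6 \times 5 = 150\) threshold tuples, which form the search space handed to the digital twin. A minimal sketch of its enumeration:

```python
import itertools

lambda_s_values = list(range(0, 5))                         # {0, 1, ..., 4}
lambda_w_values = [round(0.1 * k, 1) for k in range(1, 7)]  # {0.1, ..., 0.6}
lambda_d_values = list(range(1, 10, 2))                     # {1, 3, ..., 9}

grid = list(itertools.product(lambda_s_values, lambda_w_values, lambda_d_values))
print(len(grid))          # 150
print(grid[0], grid[-1])  # (0, 0.1, 1) (4, 0.6, 9)
```

Rounding the \(\lambda^{\rm w}\) values avoids floating-point artifacts such as `0.30000000000000004` in the stored tuples.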

*Conventional neuromorphic wireless communications:* The conventional system is designed without signal detection and wake-up radio modules, which amounts to setting the corresponding thresholds as \(\lambda^{\rm s}=0\) and \(\lambda^{\rm w}=0\). With this conventional setup, the NPUs are continuously on. Furthermore, rather than relying on the proposed adaptive set prediction strategy, in this conventional strategy the NPU at the receiver side applies top-2 prediction, i.e., the prediction set is constructed by including the two classes with the highest spike counts in the output vector (12).

*LTT:* To evaluate the performance of a basic version of the LTT algorithm, we consider a scheme that implements LTT without the use of digital twinning. This approach follows Algorithm 1, with two caveats: (*i*) step 1, i.e., pre-selection via a digital twin, is not carried out; and (*ii*) the number of on-air calibration transmissions, i.e., the number of iterations of the for loop in line 4 of Algorithm 1, is limited to the average number of Pareto points in set \(\Lambda\) used by the proposed DT-LTT scheme. This way, the use of spectral resources for calibration is not increased as compared to DT-LTT. Note that this modification violates the assumptions of Theorem 1, and thus this scheme may not satisfy its reliability guarantee. This approach uses a fixed test sequence within the mentioned grid of hyperparameters, considering first all options with the highest thresholds, and then exploring other options by decreasing first \(\lambda^{\rm s}\in \{0,1,\ldots,4\}\), then \(\lambda^{\rm w}\in\{0.1, 0.3\}\), and finally \(\lambda^{\rm d}\in\{1,5,9\}\).

*DT-LTT with an always-on main radio:* We also consider an *always-on* variant of DT-LTT, which keeps the main receiver radio on at all time instants. In this case, the hyperparameter vector \(\lambda\) to be optimized contains only the threshold \(\lambda^{\rm s}\) for signal detection and the threshold \(\lambda^{\rm d}\) for set prediction. As for LTT, we limit the number of on-air calibration rounds to be at most equal to the number of Pareto points in set \(\Lambda\) of DT-LTT. Furthermore, we set \(\lambda^{\rm w}=0\). Note that, for this strategy, the resulting calibration output does not depend on the parameter \(\gamma\), since the energy consumption at the receiver is constant, irrespective of the selected hyperparameters \(\lambda^{\rm s}\) and \(\lambda^{\rm d}\).
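For comparison, the top-2 rule of the conventional benchmark can be sketched as follows, together with its 0-1 miscoverage loss. The spike counts and label are hypothetical, and ties are broken by a stable sort, an arbitrary choice not specified in the text. With \(C=10\) classes, the normalized set size of this rule is always \(2/10=0.2\), regardless of the input.

```python
import numpy as np


def top2_set(spike_counts):
    """Set of the two classes with the highest output spike counts."""
    order = np.argsort(-np.asarray(spike_counts), kind="stable")
    return set(order[:2].tolist())


counts = [3, 9, 1, 7, 0, 2, 0, 5, 1, 4]  # hypothetical output spike counts
C = top2_set(counts)
print(sorted(C))  # [1, 3]
true_label = 7
print(int(true_label not in C))  # 1: miscoverage, label 7 is not in the set
```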

We first consider a scenario in which the digital twin implements an accurate model of the channel so that the simulated channel \(\tilde{\mathbf{h}}\) follows the same distribution \(p(\mathbf{h})\) as the true channel \(\mathbf{h}\). To illustrate the operation of DT-LTT, Fig. 6 (a) presents as black and red dots the expected loss and the energy consumption plus the weighted set size estimated by the digital twin via 19 , 20 and 21 for a given realization of dataset \(\mathcal{D}^{\rm DT}\) and realization of the simulated channels, when the hyperparameters \(\lambda\) are chosen within the mentioned grid of values.

As seen in the figure, the expected loss and the energy consumption plus weighted set size are conflicting objectives, since no hyperparameter vector \(\lambda\) yields simultaneously the smallest loss and the smallest weighted sum of energy and set size. The Pareto optimal points within the set of chosen options are depicted as red points, constituting the set \(\Lambda\) of candidates produced by the digital twin. During on-air calibration, the candidates in set \(\Lambda\) are further evaluated in increasing order of the loss estimated at the digital twin.

To elaborate, Fig. 6 (b) shows the weighted sum of energy consumption and set size estimated during on-air calibration using one realization of the dataset \(\mathcal{D}^{\rm PT}\) and of the channel transmissions for hyperparameters within the set \(\Lambda\). As detailed in Algorithm 5, on-air calibration estimates the loss, energy and set size using (22), (23) and (24), starting from the candidate yielding the smallest value of the loss estimated at the digital twin, and stopping once the loss estimated on the physical system exceeds the threshold \(\psi(\alpha, \delta)\). Here we set \(\alpha=0.2\) and \(\delta=0.05\). The final solution selected by the physical twin is represented by the star. Note that the physical twin does not need to evaluate the candidates that follow the first one whose estimated loss exceeds the threshold \(\psi(\alpha, \delta)\).

In Fig. 7, we validate the reliability, energy consumption and informativeness of the decisions produced by the calibrated system as a function of the target miscoverage loss \(\alpha\) with \(\delta=0.05\). The ground-truth expected loss \(L(\lambda^*)\), energy consumption \(E(\lambda^*)\) and set size \(I(\lambda^*)\) are obtained by averaging over the test set. In Fig. 7 (a)-(b), the shaded area corresponds to average miscoverage losses that do not satisfy the average constraint 17 . In a manner consistent with Theorem 1, we fix a single realization of dataset \(\mathcal{D}^{\rm DT}\), simulated channels at the digital twin, and real channels, and evaluate the variability of expected loss, energy consumption, and normalized set size with respect to the realization of dataset \(\mathcal{D}^{\rm PT}\). Specifically, each box spans the interquartile range of the corresponding random quantity, with a line indicating the median, while the whiskers extend from the box to show the overall range of the observed values.

From Fig. 7 (a) and Fig. 7 (b), the conventional scheme fails to meet the reliability requirement, while the basic LTT scheme selects conservative hyperparameters for \(\alpha=0.1\) and \(\alpha=0.15\) by including all classes in the predicted set, leading to zero expected loss. In contrast, the proposed DT-LTT schemes are guaranteed to meet the probabilistic reliability requirement (17) as per Theorem 1. Furthermore, as the allowed miscoverage probability \(\alpha\) increases, the expected loss obtained with DT-LTT grows accordingly.

Looking now at the bottom part of Fig. 7, it is observed that the DT-LTT scheme with an always-on receiver is over-conservative, yielding a large energy consumption, which does not adapt to varying reliability requirements \(\alpha\) (Fig. 7 (c)). This is because this scheme is not given the freedom to keep the main radio of the receiver off in an adaptive manner. In contrast, DT-LTT is able to adjust the energy consumption to the tolerated unreliability level \(\alpha\), reducing the energy consumption accordingly.

The reduction in energy consumption afforded by a larger value of \(\alpha\) depends on the design parameter \(\gamma\), which dictates the relative importance of decreasing the predicted set size. In particular, increasing \(\gamma\) causes the DT-LTT calibration schemes to further reduce the set size as \(\alpha\) increases, since a larger tolerated miscoverage rate \(\alpha\) allows for smaller predicted sets. In this regard, for DT-LTT with \(\gamma=10\), the set size initially decreases and then increases with \(\alpha\). This is due to the importance attributed by calibration to lowering energy consumption, which calls for a larger predicted set to meet the reliability condition. Conversely, with \(\gamma=20\), the set size consistently decreases with \(\alpha\), as the primary objective is to minimize the set size. When \(\alpha\) is set to \(0.1\) and \(0.15\), the normalized set size for the LTT scheme is \(1\), and it is omitted from the figure to highlight the trends of the other schemes.

In practice, the digital twin may employ simplified or approximated models of the physical system due to computational limitations or modeling errors. In this subsection, we evaluate the impact of a mismatch between the ground-truth physical system and the digital twin model. In this experiment, the true channel has \(N^{\rm P}=4\) paths, with each path amplitude following Rice fading with Rice factor \(K=5\) dB. In contrast, the digital twin model assumes a number of paths \(N^{\rm P}_{\rm DT}\), with each path following Rice fading with Rice factor \(K^{\rm DT}\). Accordingly, the fidelity of the digital twin depends on the choice of the number of paths \(N^{\rm P}_{\rm DT}\), and of the Rice factor \(K^{\rm DT}\). We set \(\alpha=0.2\), \(\delta=0.05\), and \(\gamma=10\).

In Fig. 8, we present the expected loss, energy consumption, and normalized set size as a function of the number of paths \(N^{\rm P}_{\rm DT}\) assumed for the simulated channel in the digital twin. As shown in Fig. 8 (a) and Fig. 8 (b), DT-LTT ensures the reliability condition (17) irrespective of the fidelity of the digital twin, accommodating variations in both the number of paths and the Rice factor. Furthermore, as illustrated in Fig. 8 (c) and Fig. 8 (d), the performance is not significantly affected by incorrect assumptions made by the digital twin within the considered range of parameters. That said, when the assumed Rice factor is \(K^{\rm DT}=20\) dB, the system consumes slightly more energy due to the mismatch with the real system.

This paper has introduced a novel architecture that integrates wake-up radios into a split neuromorphic computing system. A key challenge in this integration lies in determining thresholds for sensing, WUS detection, and decision-making processes so that the system maintains an expected decision-making loss below a pre-defined target level. To tackle this problem, we have proposed a digital twin-based calibration algorithm that ensures the reliability of the receiver’s decision, while also optimizing a desired trade-off between energy consumption and informativeness of the decision. By leveraging a digital twin of the system, the use of on-air resources for calibration is reduced. Experimental results demonstrated the effectiveness of the proposed algorithm, confirming the theoretical guarantees on reliability, which hold irrespective of the data distribution and of the fidelity of the digital twin.

Future research may explore a hardware-based evaluation of the proposed solution. In terms of algorithm extensions, future work may consider incorporating delay-adaptive decision making by producing an early output once the system is confident in the inference results [30], [34].

The reliability guarantee in Theorem 1 is a consequence of the properties of LTT [15], which is leveraged by DT-LTT via the Pareto testing method introduced in [14]. As detailed next, LTT formulates the problem of hyperparameter selection in the framework of multiple-hypothesis testing.

Consider first a single hyperparameter vector \(\lambda\), and define the null hypothesis \[\begin{align} \mathcal{H}(\lambda): L^{\rm PT}(\lambda) > \alpha \end{align}\] that the hyperparameter vector \(\lambda\) does not guarantee the desired reliability level \(\alpha\). Rejecting hypothesis \(\mathcal{H}(\lambda)\) implies that the calibration algorithm deems that the hyperparameter vector \(\lambda\) ensures the reliability condition \(L^{\rm PT}(\lambda) \leq \alpha\) in (17).

To decide whether to accept or reject the null hypothesis \(\mathcal{H}(\lambda)\), one can evaluate a p-value associated with hypothesis \(\mathcal{H}(\lambda)\), such as \[\begin{align} p(\lambda) =e^{-2|\mathcal{D}^{\rm PT}|(\alpha-\hat{L}^{\rm PT}(\lambda))^2_{+}}. \label{pvalue} \end{align}\tag{28}\] The quantity (28) is indeed a valid p-value for the null hypothesis \(\mathcal{H}(\lambda)\), since the inequality \[\begin{align} \Pr[p(\lambda)\leq \delta] \leq \delta \label{pvalid} \end{align}\tag{29}\] holds for all \(\delta\in[0,1]\), with the probability \(\Pr[\cdot]\) evaluated with respect to the distribution of dataset \(\mathcal{D}^{\rm PT}\) and the true channels \(\{\mathbf{h}_n \sim p(\mathbf{h})\}_{n=1}^{|\mathcal{D}^{\rm PT}|}\) under the null hypothesis \(\mathcal{H}(\lambda)\). Inequality (29) follows from Hoeffding’s inequality, given that the loss is bounded in \([0,1]\) [15].

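Plugging (28) into (29), the inequality (29) is equivalent to the condition \(\Pr[\hat{L}^{\rm PT}(\lambda) \leq \psi(\alpha, \delta)]\leq \delta\) for any fixed hyperparameter \(\lambda\) satisfying the null hypothesis \(\mathcal{H}(\lambda)\). Therefore, if the inequality \(\hat{L}^{\rm PT}(\lambda) \leq \psi(\alpha, \delta)\) is verified, the null hypothesis \(\mathcal{H}(\lambda)\) can be rejected, and the required reliability condition (17) holds with probability at least \(1-\delta\).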
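The equivalence and the validity property can be checked numerically, as a sketch under stated assumptions. The p-value (28) is compared against the threshold test on a grid of empirical losses, and validity (29) is probed by simulating the boundary of the null hypothesis, a true loss exactly equal to \(\alpha\), with i.i.d. Bernoulli(\(\alpha\)) 0-1 losses; this boundary is where rejection is most likely under the null. The values \(\alpha=0.2\), \(\delta=0.05\) and \(|\mathcal{D}^{\rm PT}|=6{,}000\) mirror the numerical section, and the function names are hypothetical.

```python
import math

import numpy as np


def p_value(emp_loss, alpha, n):
    """Hoeffding p-value (28): exp(-2 n (alpha - emp_loss)_+^2)."""
    gap = max(alpha - emp_loss, 0.0)
    return math.exp(-2 * n * gap ** 2)


def psi(alpha, delta, n):
    """Threshold (25)."""
    return alpha - math.sqrt(-math.log(delta) / (2 * n))


alpha, delta, n = 0.2, 0.05, 6000

# p(lambda) <= delta holds exactly when emp_loss <= psi(alpha, delta).
agree = all(
    (p_value(x, alpha, n) <= delta) == (x <= psi(alpha, delta, n))
    for x in np.linspace(0.0, 0.3, 301)
)
print(agree)  # True

# Rejection rate when the true loss equals alpha (boundary of the null).
rng = np.random.default_rng(0)
emp_losses = rng.binomial(n, alpha, size=20000) / n
rate = np.mean([p_value(x, alpha, n) <= delta for x in emp_losses])
print(rate)  # expected well below delta, since Hoeffding's bound is conservative
```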

The discussion so far has focused on a single hyperparameter \(\lambda\). However, DT-LTT considers multiple hypotheses \(\mathcal{H}(\lambda)\) corresponding to different candidate hyperparameter vectors \(\lambda\). To this end, DT-LTT follows fixed sequence testing via Pareto testing [14]. Accordingly, the hyperparameter vectors are tested sequentially, stopping as soon as the first hyperparameter vector \(\lambda\) is found for which hypothesis \(\mathcal{H}(\lambda)\) is not rejected. By [15], this guarantees that all the hyperparameters associated with the rejected hypotheses simultaneously ensure the reliability condition \(L^{\rm PT}(\lambda) \leq \alpha\) with probability at least \(1-\delta\). Since \(\lambda^*\) in (27) is selected among these hyperparameters, it satisfies (17). Finally, the conservative hyperparameter \(\lambda=[\lambda^{\rm s}=\infty, \lambda^{\rm w}=\infty, \lambda^{\rm d}=\infty]\) also satisfies the reliability condition (17), since the predicted set \(\mathcal{C}\) always includes the true label, which concludes the proof.

[1]

M. Davies *et al.*, “Advancing neuromorphic computing with Loihi: A survey of results and outlook,” *Proceedings of the IEEE*, vol. 109, no. 5, pp. 911–934, May 2021.

[2]

H. Jang, O. Simeone, B. Gardner, and A. Grüning, “An introduction to probabilistic spiking neural networks: Probabilistic models, learning rules, and applications,” *IEEE Signal Processing Magazine*, vol. 36, no. 6, pp. 64–77, 2019.

[3]

Y. Matsubara and M. Levorato, “Split computing for complex object detectors: Challenges and preliminary results,” in *Proc. International Workshop on Embedded and Mobile Deep Learning*, pp. 7–12, 2020.

[4]

Y. Matsubara *et al.*, “Head network distillation: Splitting distilled deep neural networks for resource-constrained edge computing systems,” *IEEE Access*,
vol. 8, pp. 212177–212193, Nov. 2020.

[5]

N. Skatchkovsky, H. Jang, and O. Simeone, “End-to-end learning of neuromorphic wireless systems for low-power edge artificial intelligence,” in *Proc. Asilomar Conference
on Signals, Systems, and Computers*, pp. 166–173, 2020.

[6]

J. Chen, N. Skatchkovsky, and O. Simeone, “Neuromorphic wireless cognition: Event-driven semantic communications for remote inference,” *IEEE Transactions on Cognitive
Communications and Networking*, vol. 9, no. 2, pp. 252–265, Apr. 2023.

[7]

——, “Neuromorphic integrated sensing and communications,” *IEEE Wireless Communications Letters*, vol. 12, no. 3, pp. 476–480, Mar. 2023.

[8]

R. Piyare *et al.*, “Ultra low power wake-up radios: A hardware and networking survey,” *IEEE Communications Surveys & Tutorials*, vol. 19, no. 4, pp.
2117–2157, Jul. 2017.

[9]

Z. Jouni *et al.*, “1.2 nW neuromorphic enhanced wake-up radio,” in *Proc. SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)*,
pp. 1–6, 2022.

[10]

M. Z. Win and R. A. Scholtz, “Impulse radio: How it works,” *IEEE Communications Letters*, vol. 2, no. 2, pp. 36–38, Feb. 1998.

[11]

A. Hoglund *et al.*, “3GPP release 18 wake-up receiver: Feature overview and evaluations,” arXiv preprint arXiv:2401.03333, 2024.

[12]

H. Yomo *et al.*, “Wake-up ID and protocol design for radio-on-demand wireless LAN,” in *Proc. IEEE International Symposium on Personal,
Indoor and Mobile Radio Communications (PIMRC)*, pp. 419–424, 2012.

[13]

J. Shiraishi *et al.*, “Content-based wake-up for top-k query in wireless sensor networks,” *IEEE Transactions on Green
Communications and Networking*, vol. 5, no. 1, pp. 362–377, 2020.

[14]

B. Laufer-Goldshtein *et al.*, “Efficiently controlling multiple risks with Pareto testing,” arXiv preprint arXiv:2210.07913, 2022.

[15]

A. N. Angelopoulos *et al.*, “Learn then test: Calibrating predictive algorithms to achieve risk control,” arXiv preprint arXiv:2110.01052, 2021.

[16]

F. Bozorgi, P. Sen, A. N. Barreto, and G. Fettweis, “RF front-end challenges for joint communication and radar sensing,” in *Proc. IEEE International Online
Symposium on Joint Communications & Sensing (JC&S)*, pp. 1–6, 2021.

[17]

K. Dakic, B. Al Homssi, S. Walia, and A. Al-Hourani, “Spiking neural networks for detecting satellite internet-of-things signals,” *IEEE Transactions on Aerospace and
Electronic Systems*, vol. 60, no. 1, pp. 1224–1238, Nov. 2023.

[18]

J. Lee *et al.*, “An asynchronous wireless network for capturing event-driven data from large populations of autonomous sensors,” *Nature Electronics*, pp.
1–12, Mar. 2024.

[19]

T. Borsos *et al.*, “Resilience analysis of distributed wireless spiking neural networks,” in *Proc. IEEE Wireless Communications and Networking Conference
(WCNC)*, pp. 2375–2380, 2022.

[20]

Y. Liu, Z. Qin, and G. Y. Li, “Energy-efficient distributed spiking neural network for wireless edge intelligence,” *IEEE Transactions on Wireless Communications (Early
Access)*, Mar. 2024.

[21]

A. Pegatoquet, T. N. Le, and M. Magno, “A wake-up radio-based MAC protocol for autonomous wireless sensor networks,” *IEEE/ACM Transactions on
Networking*, vol. 27, no. 1, pp. 56–70, Nov. 2018.

[22]

S. Tang and S. Obana, “Tight integration of wake-up radio in wireless LANs and the impact of wake-up latency,” in *Proc. IEEE Global Communications
Conference (GLOBECOM)*, pp. 1–6, 2016.

[23]

L. U. Khan *et al.*, “Digital-twin-enabled 6G: Vision, architectural trends, and future directions,” *IEEE Communications Magazine*, vol. 60,
no. 1, pp. 74–80, Jan. 2022.

[24]

C. Ruah, O. Simeone, and B. Al-Hashimi, “A Bayesian framework for digital twin-based control, monitoring, and data collection in wireless systems,” *IEEE
Journal on Selected Areas in Communications*, vol. 41, no. 10, pp. 3146–3160, Aug. 2023.

[25]

S. Jiang and A. Alkhateeb, “Digital twin based beam prediction: Can we train in the digital world and deploy in reality?” in *Proc. IEEE International Conference on
Communications Workshops (ICC Workshops)*, pp. 36–41, 2023.

[26]

J. Morais and A. Alkhateeb, “Localization in digital twin MIMO networks: A case for massive fingerprinting,” arXiv preprint arXiv:2403.09614, 2024.

[27]

G. Shafer and V. Vovk, “A tutorial on conformal prediction,” *Journal of Machine Learning Research*, vol. 9, no. 3, pp. 371–421, Aug. 2008.

[28]

K. M. Cohen, S. Park, O. Simeone, and S. Shamai, “Calibrating AI models for wireless communications via conformal prediction,” *IEEE Transactions on Machine
Learning in Communications and Networking*, vol. 1, pp. 296–312, Sept. 2023.

[29]

D. Seo and S. H. Lim, “On the fundamental tradeoff of joint communication and quickest change detection,” arXiv preprint arXiv:2401.12499, 2024.

[30]

J. Chen, S. Park, and O. Simeone, “SpikeCP: Delay-adaptive reliable spiking neural networks via conformal prediction,” arXiv preprint arXiv:2305.11322,
2023.

[31]

V. Vovk, A. Gammerman, and G. Shafer, *Algorithmic Learning in a Random World*. Springer Nature, 2022.

[32]

O. Simeone, *Machine Learning for Engineers*. Cambridge University Press, 2022.

[33]

M. Zecchin, S. Park, O. Simeone, and F. Hellström, “Generalization and informativeness of conformal prediction,” arXiv preprint arXiv:2401.11810, 2024.

[34]

J. Chen, S. Park, and O. Simeone, “Agreeing to stop: Reliable latency-adaptive decision making via ensembles of spiking neural networks,” *Entropy*, vol. 26, no. 2,
p. 126, Jan. 2024.

J. Chen, S. Park, and O. Simeone are with the King’s Communications, Learning and Information Processing (KCLIP) lab within the Centre for Intelligent Information Processing Systems (CIIPS) at the Department of Engineering, King’s College London, London, WC2R 2LS, UK (e-mail: {jiechen.chen, sangwoo.park, osvaldo.simeone}@kcl.ac.uk). O. Simeone is also with the Department of Electronic Systems, Aalborg University, 9100 Aalborg, Denmark. P. Popovski is with the Department of Electronic Systems, Aalborg University, 9100 Aalborg, Denmark (e-mail: petarp@es.aau.dk). H. Vincent Poor is with the Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: poor@princeton.edu).

This work was supported by the European Union’s Horizon Europe project CENTRIC (101096379), by an Open Fellowship of the EPSRC (EP/W024101/1), by the EPSRC project EP/X011852/1, by Project REASON, a UK Government-funded project under the Future Open Networks Research Challenge (FONRC) sponsored by the Department for Science, Innovation and Technology (DSIT), and by the U.S. National Science Foundation through award 2335876.