Adversary-Aware Private Inference over Wireless Channels


Abstract

AI-based sensing at wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications, particularly for vision and perception tasks such as autonomous driving and environmental monitoring. AI systems rely on both efficient model learning and efficient inference. In the inference phase, features extracted from sensing data are utilized for prediction tasks (e.g., classification or regression). In edge networks, sensors and model servers are often not co-located, which requires communication of features. As sensitive personal data can be reconstructed by an adversary, transformations of the features are required to reduce the risk of privacy violations. While differential privacy mechanisms provide a means of protecting finite datasets, protection of individual features has not been addressed. In this paper, we propose a novel framework for privacy-preserving AI-based sensing, where devices apply transformations to extracted features before transmission to a model server. Our framework provides rigorous theoretical guarantees on the adversarial reconstruction error, which in turn offers key insights for the design of wireless channel-aware privacy mechanisms.

Inference, Wireless Channel, Differential Privacy, Reconstruction Attack, Adversarial Reconstruction Error.

1 Introduction↩︎

AI is expected to be a key enabler of new applications in next-generation networks [1]–[3]. Combined with advanced sensing technologies, AI facilitates low-latency inference and enhances sensing applications ranging from autonomous driving to personal identification and environmental monitoring [4]. However, sensing devices often cannot support on-device inference with high-accuracy models. On the other hand, on-server inference requires communication of raw data from sensors to an edge server, which introduces significant overheads in low-latency applications.

A compelling alternative is collaborative inference [5]–[7], where sensing devices locally extract and communicate features to a server. In this approach, inference involves four stages: data acquisition via sensing; local feature extraction; feature encoding and transmission to a model server; and prediction at the server. As features are communicated, there is a risk of leakage of fine-grained information, which must be accounted for in the design of feature encoding and transmission.

Figure 1: Illustration of the private task-inference framework: A single edge device extracts and transmits privatized features over a wireless channel to a central inference server for classification, while an adversary observes the transmission to reconstruct sensitive information.

A standard approach to ensuring privacy of datasets is differential privacy, which guarantees that the probability an output of a privacy-preserving mechanism is observed does not significantly differ when an element of the dataset is removed [8]. Recently, feature differential privacy has been introduced in [7] for multi-user transmission of features over wireless channels. Analogous to differential privacy, the notion of feature differential privacy in [7] provides guarantees on the identifiability of a feature transmitted by any of the devices.

In this paper, we consider a single device which communicates encoded features to a model server over a wireless channel. At the same time, an adversary attempts to infer the communicated feature via observations over another wireless channel. The key question that we address is: how should the device encode features with a low communication overhead in order to guarantee a high reconstruction error by the eavesdropper?

To address this question, we introduce an end-to-end transmission pipeline for collaborative inference that jointly reduces communication overhead and ensures privacy guarantees through dimensionality reduction, controlled perturbation, and adaptive feature encoding. In contrast to privacy-agnostic communication methods that primarily emphasize reliable data delivery [9], [10], our approach directly integrates privacy into the transmission process by introducing calibrated randomness to mitigate reconstruction risks. Each stage of the pipeline is optimized to balance inference performance with guarantees on adversarial reconstruction error, while explicitly accounting for the impact of the wireless channel, which itself provides inherent privacy benefits. Importantly, we provide theoretical guarantees on adversarial reconstruction error, offering rigorous insights into how channel-aware feature encoding and privacy-preserving perturbation interact to safeguard sensitive information. This results in a comprehensive and efficient transmission framework for wireless AI-based sensing and inference, aligned with recent work tailored to standard privacy criteria [11]–[18].

Our Contributions. In this paper, we introduce an end-to-end pipeline for collaborative inference over wireless channels that takes into account data acquisition and channel impairments in addition to differential privacy constraints. In particular, we introduce a novel privacy framework, refining the notion of feature differential privacy from [7], designed to protect intermediate feature representations extracted at edge devices during transmission. Our main contributions are summarized as follows:

  • We introduce the notion of feature privacy and establish novel theoretical guarantees against reconstruction attacks by deriving a lower bound on the adversary’s mean squared error (MSE) as a function of model parameters, channel conditions, transmit power, and noise.

  • We also derive a lower bound on the classification accuracy under differential privacy constraints, capturing the trade-offs between feature dimensions, privacy noise, and wireless channel conditions.

Paper Organization. The remainder of the paper is organized as follows. Section 2 introduces the system model and the proposed transmission scheme. Section 3 analyzes its classification performance, and Section 4 analyzes the adversarial MSE under the proposed scheme. Section 5 presents a concrete wireless data acquisition model, and Section 6 discusses privacy amplification via massive MIMO. Finally, Section 7 concludes the paper. Due to space limitations, detailed proofs are omitted.

2 System Model↩︎

We consider an end-to-end transmission pipeline (Fig. 1) in which a single-antenna edge device communicates its extracted features to a remote server over a wireless fading channel. Specifically, the device employs a pre-trained sub-model to process a captured signal (e.g., an image) and generate a real-valued feature vector. This representation is then transmitted to the server, where a pre-trained server-side model completes the inference by performing a classification task for a raw input \(\mathbf{x}\) associated with a ground truth label \(l^{*}\). We detail the inference procedure next.

Step 1: Feature Extraction and Dimension Reduction. The device first extracts features from the raw input \(\mathbf{x}\), resulting in a vector representation \(\mathcal{F}(\mathbf{x}) \in \mathbb{R}^d\). The feature vector \(\mathcal{F}(\mathbf{x})\) is then clipped via \[\begin{align} \mathbf{f} = \min \left(1, \frac{C_{f}}{\| \mathcal{F}(\mathbf{x}) \|_{2}}\right) \cdot \mathcal{F}(\mathbf{x}). \end{align}\] Dimension reduction is then applied using a linear encoder: \[\begin{align} \mathbf{z} = \mathbf{W} \, \mathbf{f}, \end{align}\] where \(\mathbf{W} \in \mathbb{R}^{r \times d}\) is the encoder matrix (the structure of the matrix \(\mathbf{W}\) will be described later), and \(r \leq d\) denotes the reduced feature dimension.

Step 2: Local Noise Injection for Privacy. To ensure local privacy, the device perturbs the encoded vector via the Gaussian mechanism [8] by adding Gaussian noise: \[\begin{align} \tilde{\mathbf{z}} = \mathbf{z} + \mathbf{n}, \quad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_r), \end{align}\] where \(\sigma^2\) is the privacy noise variance.

Step 3: Feature Transmission. The noisy feature vector \(\tilde{\mathbf{z}}\) is scaled by a scaling coefficient \(\alpha > 0\) and transmitted over the wireless fading channel: \[\begin{align} \mathbf{z}' = \alpha \tilde{\mathbf{z}}, \quad \alpha = \sqrt{P}, \label{eqn:transmitted95signal} \end{align}\tag{1}\] where \(P\) is the transmit power.

Step 4: Signal Reception at the Server. The edge server receives the following signal: \[\begin{align} \mathbf{y} = h \mathbf{z}' + \mathbf{m} = h \alpha \mathbf{z} + h \alpha \mathbf{n} + \mathbf{m}, \end{align}\] where \(h\) is a block-fading channel coefficient and \(\mathbf{m} \sim \mathcal{N}(\mathbf{0}, \sigma_m^2 \mathbf{I}_r)\) represents the Gaussian channel noise. We note that this model arises as a baseband representation of analog narrowband transmission (e.g., modulation of a real-valued signal via frequency-division multiplexing).

Step 5: Post-Processing. The server rescales the received signal using a general scaling factor \(\beta > 0\), yielding an estimate of the perturbed feature: \[\begin{align} \hat{\mathbf{z}} = \beta \mathbf{y} = \beta h \alpha \mathbf{z} + \beta h \alpha \mathbf{n} + \beta \mathbf{m}. \end{align}\]

Step 6: Decoding. Finally, the server reconstructs the original feature representation using a linear decoder \(\mathbf{D} \in \mathbb{R}^{d \times r}\): \[\begin{align} \hat{\mathbf{f}} = \mathbf{D} \hat{\mathbf{z}} = \beta h \alpha \mathbf{D} \mathbf{z} + \beta h \alpha \mathbf{D} \mathbf{n} + \beta \mathbf{D} \mathbf{m}. \end{align}\]
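To make the pipeline concrete, the following NumPy sketch walks through Steps 1–6 end to end. It is a minimal illustration under our own assumptions: the encoder \(\mathbf{W}\) is Laplacian (anticipating Theorem 1), the decoder \(\mathbf{D}\) is the Moore-Penrose pseudoinverse, the server knows \(h\) perfectly, and all dimensions and noise levels are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 64                          # feature and reduced dimensions (illustrative)
C_f, sigma, sigma_m = 1.0, 0.1, 0.05    # clipping norm, privacy noise, channel noise
P, h = 1.0, 0.8                         # transmit power and block-fading coefficient
alpha = np.sqrt(P)

# Step 1: feature extraction (placeholder), clipping, and linear dimension reduction
F_x = rng.standard_normal(d)                       # stand-in for F(x)
f = min(1.0, C_f / np.linalg.norm(F_x)) * F_x      # ell_2 clipping to C_f
W = rng.laplace(0.0, 0.01, size=(r, d))            # Laplacian encoder
z = W @ f

# Step 2: Gaussian mechanism for local privacy
z_tilde = z + rng.normal(0.0, sigma, size=r)

# Step 3: power scaling and transmission
z_prime = alpha * z_tilde

# Step 4: reception over the block-fading channel
y = h * z_prime + rng.normal(0.0, sigma_m, size=r)

# Steps 5-6: rescaling (beta = 1/(alpha h), assuming perfect CSI) and decoding
beta = 1.0 / (alpha * h)
D = np.linalg.pinv(W)                              # Moore-Penrose pseudoinverse decoder
f_hat = D @ (beta * y)
print("server-side squared error:", np.linalg.norm(f_hat - f) ** 2)
```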

3 Server Performance↩︎

3.1 Classifier Accuracy↩︎

A key question for any private inference scheme is the classification accuracy. In order to analyze the accuracy, we impose a standard assumption on the classifier at the server. In particular, the deployed server model has an intrinsic classification margin \(\Delta\) [19], formally defined as follows.

Definition 1 (Classification Margin [19]). The classification margin of a target \(\mathbf{x}\) represented by \(( \mathbf{f}^{*}, {l}^{*})\), measured in Euclidean distance, is defined as follows: \[\begin{align} \Delta \triangleq \sup \{B : \hat{l}(\mathbf{f}) = l^{*}, \;\; \forall \mathbf{f} \operatorname{s.t.} \|{\mathbf{f}} - \mathbf{f}^{*}\|_{2} \leq B \}. \end{align}\]

We next establish a lower bound for the classification accuracy of our proposed scheme.

Lemma 1 (Classification Accuracy). The lower bound on the classification accuracy for our proposed privacy-preserving method can be expressed as \[\begin{align} P(\hat{l} = l^{*}) \geq \max \left\{0, P_{0} \cdot \left(1 - \frac{\operatorname{MSE}}{\Delta^{2}}\right)\right\}, \end{align}\] where \(P_{0}\) represents the classification accuracy in the ideal case (i.e., no communication errors or privacy constraints), and \(\Delta\) represents the inherent classification margin.
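As an illustrative numerical instance of Lemma 1 (the values are ours, not from the experiments): taking \(P_0 = 0.95\), \(\Delta = 1\), and \(\operatorname{MSE} = 0.19\) yields \[P(\hat{l} = l^{*}) \geq \max \left\{0,\; 0.95 \cdot \left(1 - \frac{0.19}{1^2}\right)\right\} \approx 0.77,\] so the guaranteed accuracy degrades linearly in the ratio \(\operatorname{MSE} / \Delta^{2}\).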

3.2 Upper Bound on the MSE at the Server↩︎

We analyze the degradation in inference accuracy induced by local privacy noise and wireless channel impairments. Specifically, we quantify the MSE between the original feature vector \(\mathbf{f} \in \mathbb{R}^d\) extracted at the edge device and the reconstructed estimate \(\hat{\mathbf{f}} \in \mathbb{R}^d\) at the server. The MSE at the server is given by: \[\operatorname{MSE} \triangleq \mathbb{E} \left[ \| \hat{\mathbf{f}} - \mathbf{f} \|_2^2 \right],\] which can be upper bounded as: \[\begin{align} \operatorname{MSE} &\leq \left\| \beta h \alpha \mathbf{D} \mathbf{W} - \mathbf{I}_d \right\|_F^2 \cdot \| \mathbf{f} \|_2^2 \nonumber \\ &\quad + \beta^2 h^2 \alpha^2 \sigma^2 \cdot \| \mathbf{D} \|_F^2 + \beta^2 \sigma_m^2 \cdot \| \mathbf{D} \|_F^2. \end{align}\]

This expression consists of three key components:

  1. approximation error due to the deviation of \(\beta h \alpha \mathbf{D} \mathbf{W}\) from the identity matrix, which reflects the encoder-decoder mismatch. As \(\mathbf{W}\) has linearly independent rows with probability one, choosing \(\mathbf{D}\) as the Moore-Penrose pseudoinverse gives \(\mathbf{W} \mathbf{D} = \mathbf{I}_r\), allowing exact reconstruction of \(\mathbf{z}\) in the absence of noise. With \(\beta = \frac{1}{\alpha h}\), the remaining mismatch reduces to the compression loss \(\| \mathbf{D} \mathbf{W} - \mathbf{I}_d \|_F^2 \, \| \mathbf{f} \|_2^2\), which vanishes when \(r = d\); this, however, requires perfect estimation of \(h\) (see the numerical sketch following this list).

  2. privacy-induced distortion arising from the locally injected Gaussian noise \(\mathbf{n}\), scaled by the power scaling and channel gain.

  3. channel-induced distortion resulting from the additive Gaussian channel noise \(\mathbf{m}\). In addition, the expectation is taken over both sources of noise, while the feature vector \(\mathbf{f}\) is assumed deterministic after clipping, satisfying \(\|\mathbf{f}\|_2 \leq C_f\).
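The three terms can be evaluated directly. Below is a minimal NumPy sketch, under our own illustrative parameters, showing that with \(\mathbf{D}\) the pseudoinverse and \(\beta = 1/(\alpha h)\) the mismatch term reduces to the compression loss, while the two noise terms scale with \(\|\mathbf{D}\|_F^2\).

```python
import numpy as np

def server_mse_terms(W, D, f, h, alpha, beta, sigma2, sigma_m2):
    """Evaluate the three terms of the server-side MSE upper bound."""
    d = W.shape[1]
    mismatch = (np.linalg.norm(beta * h * alpha * (D @ W) - np.eye(d), "fro") ** 2
                * np.linalg.norm(f) ** 2)              # encoder-decoder mismatch
    privacy = (beta * h * alpha) ** 2 * sigma2 * np.linalg.norm(D, "fro") ** 2
    channel = beta ** 2 * sigma_m2 * np.linalg.norm(D, "fro") ** 2
    return mismatch, privacy, channel

rng = np.random.default_rng(1)
d, r = 128, 32                                         # illustrative dimensions
W = rng.laplace(0.0, 0.01, size=(r, d))                # Laplacian encoder
D = np.linalg.pinv(W)                                  # W D = I_r, but D W != I_d for r < d
f = rng.standard_normal(d)
f /= np.linalg.norm(f)                                 # unit-norm clipped feature
h, alpha = 0.8, 1.0
beta = 1.0 / (alpha * h)                               # cancels the channel/power scaling
print(server_mse_terms(W, D, f, h, alpha, beta, sigma2=0.01, sigma_m2=0.0025))
```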


Figure 2: Impact of the dimensionality reduction on the classification accuracy for the same privacy leakage \(\epsilon_{\max}\), where \(r = q \times 7 \times 7\), based on the ModelNet dataset [20], with \(P = 30\) dBm, \(\sigma^{2} = 0.1\), \(\delta = 10^{-5}\), and \(C_{f} = 10^{2}\).

In high-dimensional feature spaces, model performance can degrade due to the increasing impact of perturbation noise. Specifically, in the high-privacy regime, the noise introduced for privacy protection scales with the dimensionality of the data, causing the MSE to increase. The MSE behaves asymptotically as \(\operatorname{MSE} \sim O(d)\), where \(d\) represents the dimensionality of the feature space [7]. The classification accuracy bound then gives: \[P(\hat{l} = l^{*}) \geq P_0 \left(1 - \frac{O(d)}{\Delta^2}\right).\] As \(d \to \infty\), the classification accuracy degrades, approaching zero unless the classification margin \(\Delta\) grows to counterbalance the increasing MSE. Therefore, while higher dimensions provide more expressive power for feature representation, they also amplify the effects of perturbation noise, leading to a reduction in accuracy. We show this impact in Fig. 2, which demonstrates how classification accuracy decreases as dimensionality increases, driven by the amplification of perturbation noise. In the low-privacy regime, however, the relationship between dimensionality and accuracy is different [7], as the impact of noise is no longer dominant. This is also illustrated in Fig. 2, where for \(\epsilon \geq 10\) the accuracy begins to degrade only at larger values of \(q\) (e.g., \(q = 16\)), marking a turning point beyond which additional dimensionality aids rather than impedes classification accuracy.

4 Reconstruction Attack Analysis↩︎

In this section, we analyze the privacy leakage from an adversary attempting to reconstruct the encoded feature vector \(\mathbf{z}\) by eavesdropping on the wireless transmission over another channel.

Adversarial Observation. The adversary observes \[\begin{align} \mathbf{y}_{\mathrm{adv}} = g \alpha \mathbf{z} + g \alpha \mathbf{n} + \mathbf{m}_{\mathrm{adv}}, \label{eqn:adversarial95observation} \end{align}\tag{2}\] where the adversary’s channel is block fading with channel coefficient \(g \in \mathbb{R}\), and \(\mathbf{m}_{\mathrm{adv}} \sim \mathcal{N}(\mathbf{0}, \sigma_a^2 \mathbf{I}_r)\) denotes the additive Gaussian noise at the adversary’s receiver. We define the total effective noise variance of the adversary as \[\nu^2 = g^2 \alpha^2 \sigma^2 + \sigma_a^2.\]

Generic Linear Estimator. The adversary applies a re-scaling factor \(\gamma > 0\) to form an estimate of the latent representation: \[\begin{align} \hat{\mathbf{z}} = \gamma \mathbf{y}_{\mathrm{adv}} = \gamma g \alpha \mathbf{z} + \gamma g \alpha \mathbf{n} + \gamma \mathbf{m}_{\mathrm{adv}}. \end{align}\] Unless \(\gamma = \frac{1}{g \alpha}\), the estimator is biased. As the adversary will also have channel estimation errors, we retain \(\gamma\) as a tunable parameter dependent on the estimation error to explore trade-offs between bias and variance.

Reconstruction Error. The MSE reconstruction error at the adversary is given as \[\begin{align} & \mathrm{MSE}_{\mathrm{adv}} = \mathbb{E} \left[ \| \hat{\mathbf{z}} - \mathbf{z} \|^2 \right] \nonumber \\ & = (\gamma g \alpha - 1)^{2} \left\| \mathbf{z} \right\|^2 + \gamma^2 g^2 \alpha^2 \mathbb{E}[\| \mathbf{n} \|^2] + \gamma^2 \mathbb{E}[\| \mathbf{m}_{\mathrm{adv}} \|^2]. \nonumber \end{align}\] We next provide a formal privacy definition for protecting the transmitted signals, followed by a detailed analysis of the resulting adversarial error.

Definition 2 (\((\epsilon, \delta)\)-feature DP). Let \(\mathcal{F}\) be a space of features associated with raw inputs \(\mathbf{x}\), and let \(C_f > 0\). A randomized mechanism \(\mathcal{M}: \mathcal{F} \rightarrow \mathbb{R}^{r}\) is \((\epsilon, \delta)\)-feature DP if for any two features \(\mathbf{f}, \mathbf{f}' \in \mathcal{F}\) satisfying \(\|\mathbf{f} - \mathbf{f}'\| \leq 2C_f\), and any measurable subset \(\mathcal{S} \subseteq \text{Range}(\mathcal{M})\), we have \[\begin{align} \operatorname{Pr}(\mathcal{M}(\mathbf{f}) \in \mathcal{S}) \leq e^{\epsilon} \operatorname{Pr}(\mathcal{M}(\mathbf{f}') \in \mathcal{S}) + \delta. \end{align}\] The setting \(\delta = 0\) is referred to as pure \(\epsilon\)-feature DP.

We next present a lower bound on the adversarial MSE for the case when the elements of the compression matrix \(\mathbf{W}\) are drawn from a Laplacian distribution.

Theorem 1 (Adversarial MSE Lower Bound under DP). Let \(\mathbf{f} \in \mathbb{R}^d\) be a clipped feature satisfying \(\| \mathbf{f} \|_2 \le C_f\). Let \(\mathbf{z} = \mathbf{W} \mathbf{f} \in \mathbb{R}^r\), where \(\mathbf{W} \in \mathbb{R}^{r \times d}\) is a random matrix with i.i.d. entries drawn from the Laplace distribution \(\mathrm{Lap}(0, b)\). Then, with probability at least \(1 - \delta\) over the draw of \(\mathbf{W}\), the mechanism \[\mathcal{M}(\mathbf{f}) = \mathbf{W} \mathbf{f} + \mathbf{n}\] satisfies \((\epsilon, 2 \delta)\)-feature DP with respect to the input feature vector \(\mathbf{f}\), provided \[\sigma^2 = \frac{16 C_w^2 b^2 C_f^2 (r + d) \log(1.25/\delta)}{\epsilon^2},\] where \(C_w = 4 \left( 1 + \frac{\log(2/\delta)}{\sqrt{r} + \sqrt{d}} \right)\), so that \(C_w b (\sqrt{r} + \sqrt{d})\) is a high-probability upper bound on the spectral norm \(\| \mathbf{W} \|_2\). Moreover, \[\mathrm{MSE}_{\mathrm{adv}} \geq \frac{r \nu^2 D^2}{g^2 \alpha^2 D^2 + r \nu^2}, \quad D = C_{f} C_{w} b (\sqrt{r} + \sqrt{d}).\]
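The calibration in Theorem 1 can be checked numerically. The sketch below is our own illustration (the function name and parameter values are assumptions, with the parameters echoing Fig. 3): it computes the calibrated \(\sigma^2\), the signal-norm bound \(D\), and the resulting adversarial-MSE lower bound over a sweep of \(\epsilon\).

```python
import numpy as np

def adversarial_mse_lower_bound(d, r, C_f, b, eps, delta, alpha, g, sigma_a2):
    """Privacy-noise calibration and adversarial-MSE lower bound of Theorem 1."""
    C_w = 4.0 * (1.0 + np.log(2.0 / delta) / (np.sqrt(r) + np.sqrt(d)))
    sigma2 = 16.0 * C_w**2 * b**2 * C_f**2 * (r + d) * np.log(1.25 / delta) / eps**2
    D = C_f * C_w * b * (np.sqrt(r) + np.sqrt(d))      # high-probability bound on ||z||_2
    nu2 = g**2 * alpha**2 * sigma2 + sigma_a2          # adversary's effective noise power
    return sigma2, r * nu2 * D**2 / (g**2 * alpha**2 * D**2 + r * nu2)

# illustrative sweep over the privacy parameter, mirroring the setup of Fig. 3
for eps in (0.5, 1.0, 5.0, 10.0):
    sigma2, lb = adversarial_mse_lower_bound(d=50, r=8, C_f=2.0, b=0.01, eps=eps,
                                             delta=1e-5, alpha=1.0, g=1.0, sigma_a2=1.0)
    print(f"eps={eps:>4}: sigma^2 = {sigma2:.4f}, MSE_adv >= {lb:.4f}")
```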

We visualize the impact of the transmitted signal dimension \(r\) on the adversarial MSE in Fig. 3. While a smaller \(r\) typically reduces communication latency, it also lowers the guaranteed adversarial reconstruction error, potentially increasing privacy leakage.

Proof Sketch. To ensure \((\epsilon, 2 \delta)\)-feature DP as stated in Theorem 1, we use the Gaussian mechanism [8], which requires the noise variance to scale with the squared global \(\ell_2\)-sensitivity of the mechanism. For neighboring features with \(\| \mathbf{f} - \mathbf{f}' \|_2 \le 2 C_f\), the sensitivity of the linear mapping \(\mathbf{W} \mathbf{f}\) is \[\Delta_2 = \| \mathbf{W} (\mathbf{f} - \mathbf{f}') \|_2 \le \| \mathbf{W} \|_2 \cdot \| \mathbf{f} - \mathbf{f}' \|_2 \le 2 C_f \cdot \| \mathbf{W} \|_2.\] With probability at least \(1 - \delta\), the spectral norm of \(\mathbf{W}\) is bounded by \(C_w b (\sqrt{r} + \sqrt{d})\) [21]. Hence, \[\Delta_2 \le 2 C_f C_w b (\sqrt{r} + \sqrt{d}).\] By the Gaussian mechanism [8], it suffices to use \[\sigma^2 \ge \frac{2 \Delta_2^2 \log(1.25/\delta)}{\epsilon^2} = \frac{8 C_w^2 b^2 C_f^2 (\sqrt{r} + \sqrt{d})^2 \log(1.25/\delta)}{\epsilon^2}.\] Using \((\sqrt{r} + \sqrt{d})^2 \le 2(r + d)\), it therefore suffices to take \[\sigma^2 = \frac{16 C_w^2 b^2 C_f^2 (r + d) \log(1.25/\delta)}{\epsilon^2}.\] Now consider the adversarial observation: \[\mathbf{y}_{\mathrm{adv}} = g \alpha \mathbf{z} + g \alpha \mathbf{n} + \mathbf{m}_{\mathrm{adv}} = g \alpha \mathbf{z} + \mathbf{w},\] where \(\mathbf{w} \sim \mathcal{N}(0, \nu^2 \mathbf{I}_r)\) and \(\nu^2 = g^2 \alpha^2 \sigma^2 + \sigma_a^2\).

We consider a general linear estimator \(\hat{\mathbf{z}} = \gamma \mathbf{y}_{\mathrm{adv}}\). The minimax MSE over all \(\mathbf{z}\) satisfying \(\|\mathbf{z}\|_2^2 \le D^2\) is \[\mathrm{MSE}_{\mathrm{adv}} = \inf_{\gamma} \sup_{\|\mathbf{z}\|_2 \le D} \left[ (\gamma g \alpha - 1)^2 \|\mathbf{z}\|_2^{2} + \gamma^2 r \nu^2 \right].\] Solving for the optimal \(\gamma^* = \frac{g \alpha D^2}{g^2 \alpha^2 D^2 + r \nu^2}\) and substituting back, we obtain \[\mathrm{MSE}_{\mathrm{adv}} \geq \frac{r \nu^2 D^2}{g^2 \alpha^2 D^2 + r \nu^2}.\]
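The closed-form \(\gamma^{*}\) can be cross-checked against a grid search over the worst-case risk; the short sketch below uses illustrative values of \((g, \alpha, D, r, \nu^2)\).

```python
import numpy as np

g, alpha, D, r, nu2 = 1.0, 1.0, 2.0, 8, 0.5       # illustrative parameters

def worst_case_risk(gamma):
    """sup over ||z|| <= D of the adversary's MSE; attained at ||z|| = D."""
    return (gamma * g * alpha - 1.0) ** 2 * D**2 + gamma**2 * r * nu2

gammas = np.linspace(0.0, 2.0, 200001)
gamma_grid = gammas[np.argmin(worst_case_risk(gammas))]
gamma_star = g * alpha * D**2 / (g**2 * alpha**2 * D**2 + r * nu2)
mse_bound = r * nu2 * D**2 / (g**2 * alpha**2 * D**2 + r * nu2)
print(gamma_grid, gamma_star)                     # the two should agree closely
print(worst_case_risk(gamma_star), mse_bound)     # both equal the minimax value
```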

Remark 1. We adopt Laplacian-distributed encoder weights instead of the more conventional Gaussian assumption for two main reasons. First, empirical studies have shown that weight distributions in trained neural networks often exhibit heavy-tailed, Laplacian-like behavior rather than Gaussian, especially in early or sparsely regularized layers [22], [23]. Second, Laplacian distributions yield sharper concentration bounds on the spectral norm [24], which is crucial in our theoretical analysis for bounding privacy leakage and adversarial reconstruction error.

We next determine the optimal projection dimension \(r\) required to ensure a target adversarial MSE threshold \(\Omega\), while minimizing latency. The result is given below.

Lemma 2 (Optimal Projection Dimension under Adversarial MSE Constraint). The minimal encoding dimension \(r^* \in \mathbb{N}\) that satisfies the constraint \(\mathrm{MSE}_{\mathrm{adv}} \geq \Omega\), and thereby minimizes latency, is given by \[\begin{align} r^{*} = \left\lceil \frac{g^2 \alpha^2 D^2 \cdot \Omega}{\nu^2 \left(D^2 - \Omega\right)} \right \rceil. \end{align}\]

After simplifications (up to constants), we have \[r^{\star} =\Theta\!\left( \frac{ -\bigl(\tfrac{d}{\epsilon^{2}}+\sigma_{a}^{2}\bigr) +\sqrt{\bigl(\tfrac{d}{\epsilon^{2}}+\sigma_{a}^{2}\bigr)^{2} +\tfrac{1}{\epsilon^{2}}} }{ \tfrac{1}{\epsilon^{2}} } \right),\] which can be further simplified order-wise in two main regimes: \[r^{\star} = \begin{cases} \Theta\!\bigl(\epsilon^{2}/d\bigr), & \text{if } \tfrac{d}{\epsilon^{2}} \gg \sigma_{a}^{2} \quad (\text{privacy-limited}),\\[4pt] \Theta(1), & \text{if } \sigma_{a}^{2} \gg \tfrac{d}{\epsilon^{2}} \quad (\text{noise-limited}).\\ \end{cases}\] To summarize, stronger privacy (smaller \(\epsilon\)) or larger \(d\) reduce the required projection dimension roughly as \(r^{\star}\!\propto\!\epsilon^{2}/d\), while in the noise-limited regime \(r^{\star}\) saturates to a constant set by the channel noise \(\sigma_{a}^{2}\).
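A direct implementation of the Lemma 2 rule is straightforward; in the sketch below (the function name and guard are ours), \(\nu^2\) is treated as fixed, i.e., evaluated at the calibrated privacy noise \(\sigma^2\).

```python
import math

def optimal_projection_dim(Omega, D, g, alpha, nu2):
    """Minimal encoding dimension r* ensuring MSE_adv >= Omega (Lemma 2).

    The lower bound saturates at D**2 as r grows, so the target must
    satisfy 0 < Omega < D**2 to be feasible.
    """
    if not 0.0 < Omega < D**2:
        raise ValueError("target Omega must lie in (0, D^2)")
    return math.ceil(g**2 * alpha**2 * D**2 * Omega / (nu2 * (D**2 - Omega)))

# illustrative usage: a tighter target Omega forces a larger projection dimension
print(optimal_projection_dim(Omega=1.0, D=2.0, g=1.0, alpha=1.0, nu2=0.5))   # -> 3
print(optimal_projection_dim(Omega=3.5, D=2.0, g=1.0, alpha=1.0, nu2=0.5))   # -> 56
```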

Figure 3: Adversarial MSE lower bound as a function of the privacy parameter \(\epsilon\). Parameters used: \(C_f = 2\), \(b = 0.01\), \(d = 50\), \(\delta = 10^{-5}\), \(\alpha = 1\), \(g = 1\), and \(\sigma_a^2 = 1\). The MSE bound increases with stronger privacy (smaller \(\epsilon\)), reflecting the privacy-utility trade-off.

Theorem 2 (Lower Bound on Adversarial MSE for the Raw Data \(\mathbf{x}\)). Let \(\hat{\mathbf{z}}\) be the adversary’s MMSE estimator of \(\mathbf{z}\). Assume the adversary attempts to recover the feature representation \(\mathbf{f}(\mathbf{x}) \in \mathbb{R}^d\) using the pseudoinverse, i.e., the best linear unbiased estimator: \[\hat{\mathbf{f}}(\mathbf{x}) = \mathbf{W}^\dagger \hat{\mathbf{z}}, \quad \text{where } \mathbf{W}^\dagger = \mathbf{W}^\top (\mathbf{W} \mathbf{W}^\top)^{-1}.\]

Then the error in reconstructing \(\mathbf{f}(\mathbf{x})\) satisfies \[\hat{\mathbf{f}}(\mathbf{x}) - \mathbf{f}(\mathbf{x}) = \mathbf{W}^\dagger ( \hat{\mathbf{z}} - \mathbf{z} ).\] Since the smallest nonzero singular value of \(\mathbf{W}^\dagger\) equals \(1 / \| \mathbf{W} \|_{\mathrm{op}}\), the MSE satisfies \[\begin{align} \mathbb{E} \left[ \| \hat{\mathbf{f}}(\mathbf{x}) - \mathbf{f}(\mathbf{x}) \|_2^2 \right] & \ge \frac{1}{\| \mathbf{W} \|_{\mathrm{op}}^2} \cdot \mathbb{E} \left[ \| \hat{\mathbf{z}} - \mathbf{z} \|_2^2 \right] \nonumber \\ & = \frac{1}{\| \mathbf{W} \|_{\mathrm{op}}^2} \cdot \mathrm{MSE}_{\mathrm{adv}}. \end{align}\]

Now suppose \(\mathbf{W}\) is a random matrix with i.i.d. entries drawn from a Laplacian distribution. Then, with probability at least \(1 - \delta\) over the draw of \(\mathbf{W}\), \[\| \mathbf{W} \|_{\mathrm{op}} \le C_w b ( \sqrt{r} + \sqrt{d} ),\] as in Theorem 1 [21]. Thus \[\mathbb{E} \left[ \| \hat{\mathbf{f}}(\mathbf{x}) - \mathbf{f}(\mathbf{x}) \|_2^2 \right] \ge \frac{1}{C_w^2 b^2 ( \sqrt{r} + \sqrt{d} )^2} \cdot \mathrm{MSE}_{\mathrm{adv}}.\]

Remark 2 (Pseudoinverse vs. Neural Network Estimators). While a neural network could be used by the adversary to estimate \(\mathbf{f}(\mathbf{x})\) from \(\hat{\mathbf{z}}\), the pseudoinverse remains optimal in the worst-case setting without a prior on \(\mathbf{f}(\mathbf{x})\). It minimizes the reconstruction error among all linear estimators, and no nonlinear method can fundamentally outperform it unless additional structure or data is available. Thus, the lower bound holds for any adversarial estimator, including deep networks.

5 A Wireless Data Acquisition Model↩︎

We next present a concrete example that incorporates a realistic data acquisition model based on noisy subsampled transform measurements. Such acquisition schemes arise in practical systems including compressive imaging, wireless spectrum sensing, and embedded hardware, where only a subset of transform-domain coefficients can be measured due to sensing, power, or bandwidth constraints. To capture this structure, we model the feature extractor as a subsampled orthogonal transform with additive measurement noise and apply clipping to ensure bounded sensitivity. This setup allows us to instantiate the general reconstruction bounds in a setting that reflects real-world constraints while preserving analytical tractability.

Example 1 (Subsampled Orthogonal Feature Acquisition). Let the feature extractor be defined as \[\mathbf{f}(\mathbf{x}) = \mathbf{A} \mathbf{x} + \mathbf{w}, \quad \text{where } \mathbf{A} = \mathbf{P}_d \mathbf{T}_m \in \mathbb{R}^{d \times m},\] where \(\mathbf{T}_m \in \mathbb{R}^{m \times m}\) is an orthogonal transform (e.g., a DCT), so that \(\mathbf{T}_m^\top \mathbf{T}_m = \mathbf{I}_m\), \(\mathbf{P}_d \in \{0,1\}^{d \times m}\) is a subsampling operator selecting \(d\) rows uniformly without replacement, and \(\mathbf{w} \sim \mathcal{N}(0, \sigma_w^2 \mathbf{I}_d)\) is additive measurement noise. The inverse map used by the adversary is \[\hat{\mathbf{x}} = \mathbf{T}_m^\top \mathbf{P}_d^\top \hat{\mathbf{f}}(\mathbf{x}),\] where \(\mathbf{P}_d^\top\) zero-fills the unobserved transform coordinates. Since the forward map \(\mathbf{P}_d \mathbf{T}_m\) is norm non-expanding and maps \(\hat{\mathbf{x}}\) back to \(\hat{\mathbf{f}}(\mathbf{x})\), applying it to \(\hat{\mathbf{x}} - \mathbf{x}\) yields (up to the measurement noise \(\mathbf{w}\)) \[\| \hat{\mathbf{x}} - \mathbf{x} \|_2 \ge \| \hat{\mathbf{f}}(\mathbf{x}) - \mathbf{f}(\mathbf{x}) \|_2.\]

Substituting the adversarial feature reconstruction bound from the previous theorem, we obtain \[\mathbb{E} \left[ \| \hat{\mathbf{x}} - \mathbf{x} \|_2^2 \right] \ge \frac{1}{C_w^2 b^2 ( \sqrt{r} + \sqrt{d} )^2} \cdot \mathrm{MSE}_{\mathrm{adv}},\] with high probability over the draw of the encoder \(\mathbf{W}\), and where \(\mathrm{MSE}_{\mathrm{adv}} = \mathbb{E}[\| \hat{\mathbf{z}} - \mathbf{z} \|_2^2]\) is the adversary’s MMSE on the encoded representation \(\mathbf{z}\).
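Example 1 can be instantiated with an orthonormal DCT playing the role of \(\mathbf{T}_m\). The sketch below is illustrative only (SciPy's dct, a random subsampling pattern, and placeholder noise levels are our assumptions) and checks that an error in the feature domain carries over to the input domain.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(2)
m, d = 256, 64                                   # ambient and measurement dimensions
x = rng.standard_normal(m)

# Acquisition: orthonormal DCT (T_m), uniform row subsampling (P_d), plus noise w
idx = rng.choice(m, size=d, replace=False)       # d rows selected without replacement
f = dct(x, norm="ortho")[idx] + rng.normal(0.0, 1e-3, size=d)

# A perturbed feature estimate standing in for the adversary's reconstruction
f_hat = f + rng.normal(0.0, 0.1, size=d)

# Adversary's inverse map: zero-fill unobserved coefficients (P_d^T), invert (T_m^T)
coeffs_hat = np.zeros(m)
coeffs_hat[idx] = f_hat
x_hat = idct(coeffs_hat, norm="ortho")

# Input-domain error dominates the feature-domain error (up to the noise w)
print(np.linalg.norm(x_hat - x), ">=", np.linalg.norm(f_hat - f))
```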

6 Privacy Amplification via Massive MIMO↩︎

To further strengthen privacy at the physical layer, we extend the communication model to a massive MIMO setting in which the inference server is equipped with a large array of \(M\) antennas, while the single-antenna client device transmits as before. In this regime, the uplink channel to the server exhibits channel hardening and favorable propagation [25]: the effective channel gain concentrates tightly around its mean, whereas the adversary’s channel remains weak and statistically uncorrelated with the legitimate one. This asymmetry enables the legitimate receiver to coherently decode the transmitted features while inherently suppressing information leakage to potential eavesdroppers.

Under the same transmission procedure as described in Section 2, the input–output relationships at channel use \(i\) are given by \[\begin{align} \mathbf{y}[i] &= \mathbf{h}\, \boldsymbol{z}'[i] + \mathbf{m}[i], \nonumber\\ \mathbf{y}_{\mathrm{adv}}[i] &= \mathbf{h}_{\mathrm{adv}}\, \boldsymbol{z}'[i] + \mathbf{m}_{\mathrm{adv}}[i],\quad i = 1,2,\dots,r, \end{align}\] where \(\mathbf{h} \in \mathbb{R}_{+}^{M}\) denotes the legitimate block-fading channel to the inference server, \(\mathbf{h}_{\mathrm{adv}} \in \mathbb{R}_{+}^{M}\) represents the adversarial channel, and \(\mathbf{m}[i]\), \(\mathbf{m}_{\mathrm{adv}}[i]\) denote additive noise terms. The adversary reconstructs the transmitted feature symbol at channel use \(i\) using a linear estimator, i.e., \[\begin{align} \hat{\mathbf{z}}[i] = \frac{\mathbf{h}_{\mathrm{adv}}^{\!\top}}{\alpha}\, \mathbf{y}_{\mathrm{adv}}[i], \end{align}\] where \(\alpha = \sqrt{P}\) is the scaling coefficient from (1). Following the same analytical procedure as in the single-antenna case, the next lemma characterizes the corresponding adversarial reconstruction error under massive MIMO with channel hardening.

Lemma 3 (Adversarial Reconstruction Error under Massive MIMO).

Under the massive MIMO setting and our proposed transmission scheme, the adversary’s MMSE in estimating \(\mathbf{z}\) satisfies \[\begin{align} \mathrm{MSE}_{\mathrm{adv}} \;\ge\; \frac{r\!\left(\tfrac{\alpha^2\sigma^2}{M} + \sigma_a^2\right)}{\tfrac{\alpha^2(C_z^2+\sigma^2)}{M} + \sigma_a^2}, \end{align}\] where \(C_z\) is an upper bound on \(\|\mathbf{z}\|_2\), playing the role of \(D\) in Theorem 1.

It is worth highlighting that in the massive MIMO limit \((M \to \infty)\), the lower bound converges to \(\mathrm{MSE}_{\mathrm{adv}} \to r\), indicating that the adversary asymptotically gains no information about \(\mathbf{z}\).
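The convergence can be made concrete by evaluating the Lemma 3 bound for increasing antenna counts; all values below are illustrative, with \(C_z^2\) chosen arbitrarily.

```python
def mimo_adv_mse_lower_bound(r, M, alpha, sigma2, sigma_a2, C_z2):
    """Adversarial-MSE lower bound of Lemma 3 with an M-antenna array."""
    num = r * (alpha**2 * sigma2 / M + sigma_a2)
    den = alpha**2 * (C_z2 + sigma2) / M + sigma_a2
    return num / den

r = 64
for M in (1, 16, 256, 4096):                     # illustrative antenna counts
    lb = mimo_adv_mse_lower_bound(r, M, alpha=1.0, sigma2=0.1, sigma_a2=1.0, C_z2=4.0)
    print(f"M={M:>5}: MSE_adv >= {lb:6.2f}   (limit: r = {r})")
```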

7 Conclusions & Future Work↩︎

In this paper, we have introduced a new framework for adversary-aware private inference over wireless channels, termed feature DP, which aims to protect extracted features from reconstruction attacks during transmission. We derived fundamental lower bounds on the adversarial reconstruction error, highlighting how key system parameters, such as the encoder structure, privacy noise, and channel noise, impact the difficulty of input recovery. As a direction for future work, we plan to explore more structured data acquisition models to further bridge theory and practical sensing constraints. Considering non-linear dimensionality reduction mechanisms, such as convolutional neural networks, and investigating the fundamental trade-off between model complexity and privacy leakage is another direction of great interest.

References↩︎

[1]
W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, 2019.
[2]
K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Communications Magazine, vol. 57, no. 8, pp. 84–90, 2019.
[3]
P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021.
[4]
Y. Liu, X. Zhang, Y. Zhao, K. Chen, and X.-Y. Li, “Livemap: Real-time dynamic map in automotive edge computing,” in Proceedings of the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 2020, pp. 1–14.
[5]
N. Shlezinger and I. V. Bajić, “Collaborative inference for AI-empowered IoT devices,” IEEE Internet of Things Magazine, vol. 5, no. 4, pp. 92–98, 2022.
[6]
S. F. Yilmaz, B. Hasircioglu, L. Qiao, and D. Gunduz, “Private collaborative edge inference via over-the-air computation,” arXiv preprint arXiv:2407.21151, 2024.
[7]
M. Seif, Y. Nie, A. J. Goldsmith, and H. V. Poor, “Collaborative inference over wireless channels with feature differential privacy,” arXiv preprint arXiv:2410.19917, 2024.
[8]
C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[9]
G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, “Toward an intelligent edge: Wireless communication meets machine learning,” IEEE Communications Magazine, vol. 58, no. 1, pp. 19–25, 2020.
[10]
M. Chen, N. Shlezinger, H. V. Poor, Y. C. Eldar, and S. Cui, “Communication-efficient federated learning,” Proceedings of the National Academy of Sciences (PNAS), vol. 118, no. 17, p. e2024789118, 2021.
[11]
M. Frey, I. Bjelaković, and S. Stańczak, “Towards secure over-the-air computation,” in 2021 IEEE International Symposium on Information Theory (ISIT), 2021, pp. 700–705.
[12]
S. F. Yilmaz, B. Hasırcıoğlu, and D. Gündüz, “Over-the-air ensemble inference with model privacy,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT), 2022, pp. 1265–1270.
[13]
Z. Liu, Q. Lan, A. E. Kalør, P. Popovski, and K. Huang, “Over-the-air view-pooling for low-latency distributed sensing,” in Proceedings of the IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2023, pp. 71–75.
[14]
X. Chen, K. B. Letaief, and K. Huang, “On the view-and-channel aggregation gain in integrated sensing and edge AI,” arXiv preprint arXiv:2311.07986, 2023.
[15]
A. Singh, P. Vepakomma, V. Sharma, and R. Raskar, “Posthoc privacy guarantees for collaborative inference with modified propose-test-release,” Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 26438–26451, 2023.
[16]
X. Li and S. Bi, “Optimal AI model splitting and resource allocation for device-edge co-inference in multi-user wireless sensing systems,” IEEE Transactions on Wireless Communications, vol. 23, no. 9, pp. 11094–11108, 2024.
[17]
S. F. Yilmaz, B. Hasırcıoğlu, L. Qiao, D. Gündüz et al., “Private collaborative edge inference via over-the-air computation,” IEEE Transactions on Machine Learning in Communications and Networking, 2025.
[18]
Z. Lyu, M. Xiao, J. Xu, M. Skoglund, and M. Di Renzo, “The larger the merrier? Efficient large AI model inference in wireless edge networks,” arXiv preprint arXiv:2505.09214, 2025.
[19]
J. Sokolić, R. Giryes, G. Sapiro, and M. R. Rodrigues, “Robust large margin deep neural networks,” IEEE Transactions on Signal Processing, vol. 65, no. 16, pp. 4265–4280, 2017.
[20]
Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1912–1920.
[21]
J. Blocki, A. Blum, A. Datta, and O. Sheffet, “The Johnson-Lindenstrauss transform itself preserves differential privacy,” in Proceedings of the IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), 2012, pp. 410–419.
[22]
L. Sagun, U. Evci, V. U. Guney, Y. Dauphin, and L. Bottou, “Empirical analysis of the hessian of over-parametrized neural networks,” arXiv preprint arXiv:1706.04454, 2017.
[23]
C. H. Martin and M. W. Mahoney, “Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning,” arXiv preprint arXiv:1810.01075, 2018.
[24]
R. Vershynin, High-dimensional probability: An introduction with applications in data science. Cambridge University Press, 2018.
[25]
H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Aspects of favorable propagation in massive MIMO,” in 2014 22nd European Signal Processing Conference (EUSIPCO), 2014, pp. 76–80.