October 23, 2025
Movable antenna (MA) technology provides a promising avenue for actively shaping wireless channels through dynamic antenna positioning, thereby enabling electromagnetic radiation reconstruction to enhance physical layer security (PLS). However, its practical deployment is hindered by two major challenges: the high computational complexity of real-time optimization and a critical temporal mismatch between slow mechanical movement and rapid channel variations. Although data-driven methods have been introduced to alleviate online optimization burdens, they are still constrained by suboptimal training labels derived from conventional solvers or high sample complexity in reinforcement learning. More importantly, existing learning-based approaches often overlook communication-specific domain knowledge—particularly the asymmetric roles and adversarial interactions between legitimate users and eavesdroppers, which are fundamental to PLS. To address these issues, this paper reformulates the MA positioning problem as a predictive task and introduces RoleAware-MAPP, a novel Transformer-based framework that incorporates domain knowledge through three key components: role-aware embeddings that model user-specific intentions, physics-informed semantic features that encapsulate channel propagation characteristics, and a composite loss function that strategically prioritizes secrecy performance over mere geometric accuracy. Extensive simulations under 3GPP-compliant scenarios show that RoleAware-MAPP achieves an average secrecy rate of 0.3569 bps/Hz and a strictly positive secrecy capacity of 81.52%, outperforming the strongest baseline by 48.4% and 5.39 percentage points, respectively, while maintaining robust performance across diverse user velocities and noise conditions.
Movable antenna, position prediction, physical layer security, Transformer framework
With the exponential growth of network access points and data throughput in wireless environments, multi-input multi-output (MIMO) technology has emerged as a fundamental solution to meet the increasing demands for high-capacity and low-latency communications. By deploying multiple antennas and establishing multiple transceiver links, MIMO systems have successfully met the growing communication requirements [1]. However, when confronting complex interference scenarios and stringent quality-of-service demands, particularly in urban environments, traditional fixed-position antenna (FPA) systems increasingly expose their inherent limitations in terms of flexibility and performance-cost trade-offs.
Movable antenna (MA) technology has been proposed as a transformative approach that exploits spatial degrees of freedom (DoF) without increasing the number of physical antenna elements [2]. By dynamically adjusting the antenna positions and orientations across one to six dimensions, MA systems can actively shape wireless channels rather than passively adapting to them, achieving electromagnetic radiation reconstruction (ERR), i.e., the ability to actively reconstruct and optimize electromagnetic wave propagation patterns in real time. This ERR capability enables significant performance improvements in channel capacity, interference suppression, and physical layer security [3]–[5]. Despite these advantages, the practical implementation of MA systems faces two major challenges. First, the incorporation of antenna mobility transforms conventional optimization problems into high-dimensional dynamic non-convex formulations. The resulting computational complexity, often addressed via iterative algorithms, becomes prohibitive for real-time operation [6]. Second, and more critically, a fundamental temporal mismatch exists between the slow mechanical movement of antennas (on the order of milliseconds to seconds) and the rapid dynamics of user mobility and channel variations. This mismatch leads to outdated channel state information (CSI) by the time antennas are repositioned, severely degrading system performance [7].
In response, recent studies have explored data-driven methods to circumvent online optimization. These intelligent decision-making approaches can be broadly categorized into supervised learning methods and reinforcement learning frameworks. Several machine learning (ML) models have been proposed to learn direct mappings from system states to optimal MA positions, shifting computational load to an “offline training-online inference” paradigm. Although such models can achieve millisecond-level inference latency, their performance is inherently bounded by the quality of training labels—typically generated via suboptimal numerical solvers—which limits achievable gains. Reinforcement learning (RL) offers an alternative by framing MA positioning as a Markov decision process [8], [9]. While RL can adaptively learn control policies through environmental interaction, it suffers from high sample complexity, training instability, and difficulties in scaling to high-dimensional state spaces. These issues hinder its applicability in real-world dynamic settings [10].
The growing integration of artificial intelligence (AI) with wireless communications opens new pathways for intelligent MA control. Recent works have begun to leverage AI for communication-specific tasks, yet a deeper incorporation of domain knowledge—particularly in physical layer security (PLS)—remains underexplored [11], [12]. In PLS, the distinct roles of legitimate users (Bob) and eavesdroppers (Eve) introduce asymmetric objectives rooted in game-theoretic interactions. Effectively embedding such role-aware semantics into learning frameworks is crucial for optimizing secrecy performance [13], [14]. These considerations motivate the central research questions addressed in this work: Can we effectively extract and leverage spatio-temporal features—including spatial geometry, temporal dynamics, and user role semantics—to enhance MA position prediction? More specifically, how can we construct an accurate mapping from historical MA configurations, user CSI, and location data to future optimal antenna placements, thereby enabling real-time trajectory planning?
To tackle these challenges, this paper introduces RoleAware-MAPP, a role-aware Transformer framework designed for MA position prediction. This framework represents a novel security-oriented adaptive control strategy that integrates intelligent decision-making with ERR capabilities. By integrating domain-specific knowledge and explicitly modeling the asymmetric characteristics of different user roles, the proposed approach achieves high predictive accuracy while maintaining real-time inference capability. The main contributions of this paper are summarized as follows:
The paper innovatively transforms the intractable, non-convex problem of real-time MA position optimization into a tractable supervised learning task. By framing it as a predictive challenge—mapping historical channel and user data to future optimal antenna positions—this approach effectively bypasses the prohibitive computational complexity and inherent latency mismatch of traditional optimization methods, paving the way for a practical solution.
We propose RoleAware-MAPP, a novel Transformer-based framework specifically engineered for communication security. Its core innovations lie in the deep integration of domain knowledge, featuring a role-aware embedding mechanism that asymmetrically models legitimate users and eavesdroppers, and a communication semantic extractor that provides a strong, physics-informed inductive bias. The entire model is guided by a composite loss function that prioritizes security performance over mere geometric accuracy.
Through extensive simulations in realistic 3GPP vehicular scenarios, we demonstrate the superior performance and robustness of the proposed framework. The results validate our design philosophy: RoleAware-MAPP significantly outperforms state-of-the-art baselines in critical security metrics, such as Average Secrecy Rate (ASR) and Strictly Positive Secrecy Capacity (SPSC), across a wide range of user velocities and noise conditions, confirming its effectiveness for dynamic wireless environments.
The remainder of this paper is organized as follows. Section 2 reviews the related work in movable antenna systems and physical layer security. In Section 3, we introduce the system and channel model for our scenario and provide detailed problem formulation, transforming the intractable optimization problem into a predictive task. Section 4 elaborates on the architecture of our proposed RoleAware-MAPP framework. We present and analyze the extensive simulation results in Section 5 to validate the model’s performance. Finally, Section 6 concludes the paper and discusses potential future research directions.
Notations: In this paper, \(\boldsymbol{x}^T\) and \(\boldsymbol{x}^H\) represent the transpose and conjugate transpose of the matrix or vector \(\boldsymbol{x}\), respectively. \(\mathbb{C}^{a \times b}\) and \(\mathbb{R}^{a \times b}\) denote the sets of \(a \times b\) complex and real matrices, respectively. \(\|\mathbf{x}\|_{2}\) represents the 2-norm of the vector \(\mathbf{x}\). \(\text{tr}(\mathbf{x})\) and \(\mathrm{diag}(\mathbf{x})\) represent the trace and the diagonal matrix with diagonal elements \(\mathbf{x}\), respectively. \(\left| \mathbf{x} \right|\) represents the modulus of \(\mathbf{x}\). \(\nabla_{x}\) and \(\frac{\partial}{\partial x}\) denote the gradient operator and the partial derivative operator, respectively. \([x]^{+}\) represents \(\max\{x, 0\}\).
From the perspective of PLS, the application of MA to enhance the secrecy performance in wireless communications has garnered substantial research interest. In this section, we review the state-of-the-art advances across three key domains: ERR methods, intelligent decision-making frameworks, and security-oriented adaptive control strategies. A comparative summary of representative studies is provided in Table 1.
MA can enable a transition from “passive channel adaptation” to “active channel construction” by dynamically reconfiguring the positions and orientations of antenna elements [2]. Recent works have demonstrated its effectiveness in exploiting DoF [15], suppressing interference [16]–[18], enabling flexible beamforming [8], [19], [20], modulating null-steering [17], and enhancing the strength of received signal [15], [18].
In [16], considering the integrated sensing and communication (ISAC) systems, Yu et al. identified MA as a key enabler for the control of the radiation pattern and the adaptation of the electromagnetic environment. In particular, they proposed an interference-coupling framework that jointly ensures communication reliability, security, and sensing accuracy. In [15], Ma et al. jointly optimized antenna positions and signal covariance matrices to maximize channel capacity, showing significant gains over conventional FPA-based MIMO systems. To address channel estimation challenges in MA setups, in [21], Shao et al. leveraged the directional sparsity of six-dimensional movable antennas (6DMA) and designed a three-stage CSI estimation protocol. In [22], Ma et al. further introduced a joint optimization framework of MA and reconfigurable intelligent surface (RIS) parameters, achieving synergistic improvements in sensing, communication, and security.
The integration of AI and ML, particularly deep reinforcement learning (DRL), has opened new avenues for MA system optimization. For example, in [8], Weng et al. proposed a heterogeneous multi-agent deep deterministic policy gradient (MADDPG) framework, in which agents independently learn beamforming and mobility policies under imperfect CSI, effectively decoupling the two learning processes and mitigating performance degradation from outdated CSI. Extending this line of work, in [19], Xie et al. developed an enhanced heterogeneous MADDPG architecture with specialized agents for antenna configuration, improving both reliability and transmission efficiency through centralized training and decentralized execution. For 6DMA systems, in [20], Shao et al. introduced a mixed-field channel model and applied DRL to jointly optimize antenna positions, orientations, and beamforming.
Considering federated learning scenarios, in [23], Niyato et al. proposed a holistic optimization framework combining successive convex approximation and penalty dual decomposition to jointly optimize global rounds, antenna positions, and beamforming matrices. In [24], using DRL, Bai et al. reduced computational complexity by optimizing UAV trajectories and antenna positions, significantly lowering the overhead of online computation. In [25], Jang et al. designed a deep neural network (DNN)-assisted channel estimation framework that jointly optimizes antenna placement and channel modeling.
PLS in MA-enabled systems has drawn considerable interest due to the MA’s inherent capability to actively shape propagation channels [17], [18], [26]. In [17], Kang et al. modeled eavesdropping scenarios as a joint optimization of multi-beamforming and antenna positioning. In particular, they introduced an adaptive loss function that maximizes the minimum beamforming gain while adaptively suppressing interference leakage, incorporating a dynamic tradeoff parameter to balance user rate maximization against information leakage. For multi-jammer environments, in [18], Tang et al. formulated the problem of PLS as SINR maximization and designed a multilayer perceptron (MLP)-based deep learning model to optimize antenna positions, achieving near-optimal anti-jamming performance with low online complexity.
Addressing the temporal mismatch between mechanical MA movement and user mobility, in our previous work [26], we reformulated continuous antenna positioning as a predictive task. Specifically, we developed a hybrid Transformer-LSTM network that captures spatiotemporal dependencies in historical trajectories, reducing the normalized mean squared error (NMSE), a popular evaluation metric for channel estimation accuracy, by at least 49% compared with the benchmarks in [27], [28] and improving practicality in dynamic environments.
Motivation: In summary, significant progress has been made in AI-driven optimization of MA systems for PLS, but a key limitation remains: most existing methods treat MA positioning as a generic sequence modeling problem and lack deep integration of communication-specific domain knowledge, such as the asymmetric roles of legitimate users and Eves in PLS. Moreover, the fundamental mismatch between the slow mechanical movement of MA systems and the fast channel dynamics of real environments continues to challenge real-time control effectiveness. Motivated by these gaps, this paper introduces a novel predictive framework that explicitly embeds domain knowledge to overcome latency limitations and directly optimize security-centric performance.
A downlink time-division duplexing (TDD) vehicular communication system is considered, as illustrated in Fig. 1. The base station (BS), located at a fixed position of \((0, 0, h_\text{BS})\) in a 3D Cartesian coordinate system, serves multiple users within a dynamic urban environment, where \(h_\text{BS}\) represents the height of the BS. The scenario is characterized by the presence of two distinct user roles: a legitimate desired user (Bob) and an illegitimate passive eavesdropper (Eve). Both users are mobile, are each equipped with a single antenna, and navigate through an environment with both line-of-sight (LoS) and non-line-of-sight (NLoS) propagation paths. Denote the 3D coordinates of the \(k\)-th user as \(\mathbf{u}_k = [x_k, y_k, h_v]^\mathrm{T}\), where \(k \in \{1, 2, \ldots, K\}\) and \(h_v\) represents the vehicle height. The total number of users is \(K = U + V\), where \(U\) and \(V\) denote the number of desired and undesired users, respectively.
The BS is equipped with an MA array, which serves as the core component of our system model. The array consists of \(N_t = N_h \times N_v\) antenna elements, with \(N_h\) and \(N_v\) representing the number of antennas in the horizontal and vertical directions, respectively. As illustrated in Fig. 1, each antenna element can be dynamically repositioned within an individual square region of side length \(4\lambda\), while the entire array is confined within a square aperture of dimension \(D = 12\lambda \times 12\lambda\), where \(\lambda\) is the carrier wavelength. The local coordinate of the \(n\)-th antenna element, relative to the array center at time \(t\), is given by \(\mathbf{p}_n(t) = [0, y_n(t), z_n(t)]\) for \(n \in \{1, \dots, N_t\}\). This geometric reconfigurability introduces additional degrees of freedom to actively manipulate the wireless channel.
Based on the aforementioned system model, a geometry-based multipath channel model is adopted to describe the time-varying wireless propagation in the MA-enabled vehicular environment. The channel between the \(n\)-th antenna of BS and the \(k\)-th user at time \(t\) comprises one LoS path and \(P\) NLoS paths. Accordingly, the channel coefficient denoted by \(h_{n,k}(t)\), can be represented as the superposition of these multipath components [29]. That is, \[h_{n,k}(t) = \sum_{p=1}^{P+1} \alpha_p \beta_p e^{j2\pi \frac{(\mathbf{r}_{k,p}^{\text{tx}})^T \mathbf{p}_n(t)}{\lambda}} e^{j2\pi w_p t} e^{j2\pi f \tau_p},\] where \(\alpha_p\) and \(\beta_p\) represent the complex gains of the \(p\)-th path. The term \(w_p=(\mathbf{r}_{k,p}^{\text{tx}})^T\mathbf{v}/\lambda\) denotes the Doppler frequency shift induced by user mobility, where \(\mathbf{v}\) represents the velocity vector of the user and \(\lambda\) is the wavelength. The parameter \(\tau_p\) denotes the path delay. The vectors \(\mathbf{r}_{k,p}^{\text{tx}}\) and \(\mathbf{r}_{k,p}^{\text{rx}}\) represent the spherical unit vectors at the BS and user sides, respectively, and can be given by [5] \[\mathbf{r}^{\text{tx}} = \begin{bmatrix} \sin\theta_{\text{EOD}} \cos\phi_{\text{AOD}} \\ \sin\theta_{\text{EOD}} \sin\phi_{\text{AOD}} \\ \cos\theta_{\text{EOD}} \end{bmatrix}, \quad \mathbf{r}^{\text{rx}} = \begin{bmatrix} \sin\theta_{\text{EOA}} \cos\phi_{\text{AOA}} \\ \sin\theta_{\text{EOA}} \sin\phi_{\text{AOA}} \\ \cos\theta_{\text{EOA}} \end{bmatrix},\] where \(\theta_{\text{EOD}}\) and \(\phi_{\text{AOD}}\) represent the elevation angle of departure and azimuth angle of departure, respectively, while \(\theta_{\text{EOA}}\) and \(\phi_{\text{AOA}}\) denote the elevation angle of arrival and azimuth angle of arrival, respectively.
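As an illustration of the multipath superposition above, the following is a minimal sketch in plain Python that evaluates \(h_{n,k}(t)\) for one antenna and one user. The path dictionary keys, the function names, and the interpretation of \(f\) as the carrier frequency are assumptions made for this example; the per-path gain is passed as the single complex product \(\alpha_p \beta_p\).

```python
import cmath
import math


def unit_vector(theta, phi):
    """Spherical unit vector for elevation angle theta and azimuth angle phi (radians)."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def dot(a, b):
    """Inner product of two real 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def channel_coefficient(paths, p_n, velocity, t, wavelength, freq):
    """Sum the LoS and NLoS contributions for one antenna and one user.

    Each path is a dict with illustrative (hypothetical) keys:
      'gain'  : complex product alpha_p * beta_p
      'theta' : elevation angle of departure (rad)
      'phi'   : azimuth angle of departure (rad)
      'delay' : path delay tau_p (s)
    p_n is the antenna position [0, y_n, z_n]; velocity is the user velocity vector.
    freq is assumed to be the carrier frequency f in the delay phase term.
    """
    h = 0j
    for path in paths:
        r_tx = unit_vector(path["theta"], path["phi"])
        spatial_phase = 2 * math.pi * dot(r_tx, p_n) / wavelength
        doppler = dot(r_tx, velocity) / wavelength  # w_p
        doppler_phase = 2 * math.pi * doppler * t
        delay_phase = 2 * math.pi * freq * path["delay"]
        h += path["gain"] * cmath.exp(1j * (spatial_phase + doppler_phase + delay_phase))
    return h
```

Moving the antenna by half a wavelength along the departure direction of a path rotates that path's phase by \(\pi\), which is the mechanism an MA array exploits to make paths add constructively at Bob and destructively at Eve.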
The complete channel vector from the BS with \(N_t\) antennas to the \(k\)-th user can be expressed as \[\mathbf{h}_k(t) = [h_{1,k}(t), \dots, h_{N_t,k}(t)]^\mathrm{T} \in \mathbb{C}^{N_t \times 1}.\] The signal received at the \(k\)-th user, after applying a beamforming vector \(\mathbf{w}(t) \in \mathbb{C}^{N_t \times 1}\) at the BS, is given by \[y_k(t) = \mathbf{h}_k^\mathrm{H}(t) \mathbf{w}(t) s(t) + n_k(t),\] where \(s(t)\) denotes the transmitted symbol with unit power, and \(n_k(t) \sim \mathcal{CN}(0, \sigma^2)\) is the Additive White Gaussian Noise (AWGN). Consequently, the achievable channel capacity for user \(k\) at time \(t\) is represented as [12] \[C_k(t) = \log_2 \left(1 + \frac{|\mathbf{h}_k^\mathrm{H}(t) \mathbf{w}(t)|^2}{\sigma^2}\right).\]
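The capacity expression can be checked numerically with a short sketch. The maximum-ratio transmission (MRT) beamformer used here is only an illustrative choice and is not stated in the text as the beamformer of the proposed system; all function names are hypothetical.

```python
import math


def inner_h(h, w):
    """Compute h^H w for two equal-length sequences of complex numbers."""
    return sum(hn.conjugate() * wn for hn, wn in zip(h, w))


def mrt_beamformer(h, power):
    """Illustrative MRT beamformer: w = sqrt(P) * h / ||h||_2, so ||w||_2^2 = P."""
    norm = math.sqrt(sum(abs(hn) ** 2 for hn in h))
    return [math.sqrt(power) * hn / norm for hn in h]


def capacity(h, w, noise_power):
    """C = log2(1 + |h^H w|^2 / sigma^2), in bps/Hz."""
    return math.log2(1.0 + abs(inner_h(h, w)) ** 2 / noise_power)
```

With MRT, \(|\mathbf{h}^\mathrm{H}\mathbf{w}|^2 = P\,\|\mathbf{h}\|_2^2\), so the capacity depends on the antenna positions only through the channel norm; any other beamformer can be passed to `capacity` in the same way.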
Conventional MA positioning strategies are predominantly designed for instantaneous optimization, wherein antenna configurations are computed and deployed based on the current CSI. However, such approaches suffer from a fundamental temporal mismatch in practical deployments: iterative optimization algorithms incur substantial computational latency (ranging from hundreds of milliseconds to several seconds), while the physical movement of MA elements introduces a further non-negligible actuation delay. By the time an “optimal” MA configuration is attained, the underlying CSI may have already evolved significantly due to user mobility. This issue is particularly critical in PLS contexts, where both Bob and Eve experience time-varying channels. To overcome this limitation, we introduce a continuous-time MA positioning prediction framework, in which one Bob and one Eve move continuously over an extended temporal horizon. The objective shifts from optimizing performance at discrete instants to sustaining enhanced secrecy performance throughout the entire communication duration. This reformulation naturally motivates a predictive optimization paradigm: instead of reacting to outdated CSI, we seek to infer future optimal MA configurations from historical channel observations, thereby absorbing both computational and mechanical latencies into the prediction horizon.
Evaluating such continuous-time MA-enabled PLS systems requires simultaneously assessing three distinct performance dimensions that cannot be captured by any single conventional metric. To this end, we adopt the following three performance indicators: (i) ASR quantifies the time-averaged capacity advantage of Bob over Eve, measuring security throughput across the entire horizon; (ii) SPSC measures the probability of maintaining positive secrecy rates, reflecting reliability across varying conditions; (iii) NMSE assesses MA’s geometric positioning accuracy relative to theoretical optima. Mathematically, the corresponding expressions are presented as follows.
ASR [30]: In scenarios involving both desired and undesired users, the ASR can quantify the differential channel capacities between desired and undesired users. Over a continuous period spanning \(F\) time steps, it is defined as \[\tilde{R} = \frac{1}{F} \sum_{t=1}^{F} [C_b(t) - C_e(t)]^+, \label{qjuemtrz}\tag{1}\] where \(C_b(t)\) and \(C_e(t)\) denote the instantaneous channel capacities of Bob and Eve at time \(t\), respectively, and \([\cdot]^+\) denotes \(\max\{\cdot, 0\}\). A higher ASR value indicates that the system successfully maximizes the capacity advantage of the desired user over the undesired user.
SPSC [31]: The SPSC measures the reliability of maintaining positive differential capacity throughout the observation period. It can be defined as \[\text{SPSC} = \Pr\left[\tilde{R} > 0\right].\] A higher SPSC value indicates that the system can more consistently maintain favorable channel conditions for the desired user relative to the undesired user across varying channel states and mobility patterns.
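The two security metrics can be sketched as follows. The ASR function follows Eq. (1) directly. For SPSC, the estimator below is an assumption of this example: it approximates the probability empirically as the fraction of time steps whose instantaneous secrecy rate is strictly positive. Function names are hypothetical.

```python
def average_secrecy_rate(c_bob, c_eve):
    """ASR: time average of [C_b(t) - C_e(t)]^+ over F time steps."""
    if len(c_bob) != len(c_eve) or not c_bob:
        raise ValueError("capacity sequences must be non-empty and of equal length")
    return sum(max(cb - ce, 0.0) for cb, ce in zip(c_bob, c_eve)) / len(c_bob)


def empirical_spsc(c_bob, c_eve):
    """Assumed estimator: share of time steps where Bob's capacity exceeds Eve's."""
    if len(c_bob) != len(c_eve) or not c_bob:
        raise ValueError("capacity sequences must be non-empty and of equal length")
    return sum(1 for cb, ce in zip(c_bob, c_eve) if cb > ce) / len(c_bob)
```

For example, with Bob capacities `[2.0, 1.0, 3.0, 0.5]` and Eve capacities `[1.0, 1.5, 2.5, 0.5]`, the per-step positive parts are 1.0, 0, 0.5, and 0, giving an ASR of 0.375 bps/Hz and an empirical SPSC of 0.5.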
NMSE [32]: The NMSE is a standard metric for quantifying the gap between the actual and the optimal antenna positions. For a given MA position configuration \(\hat{\mathcal{P}}_F\) and its optimal counterpart \(\mathcal{P}_F\) over a time horizon \(F\), the NMSE is defined as \[\text{NMSE} = \mathbb{E}\left[ \frac{\|\mathcal{P}_F - \hat{\mathcal{P}}_F \|_F^2}{\|\mathcal{P}_F\|_F^2} \right], \label{svxbwatz}\tag{2}\] where \(\|\cdot\|_F\) denotes the Frobenius norm (the subscript on the norm is unrelated to the horizon length \(F\)). A lower NMSE value indicates that the antenna positioning is closer to the optimal configuration.
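A minimal sketch of Eq. (2) is given below, assuming each sample is stored as a nested list of shape \(F \times 3N_t\) and that the expectation is approximated by the sample mean. The function names are hypothetical.

```python
def squared_frobenius(matrix):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in matrix for x in row)


def nmse(optimal_batch, predicted_batch):
    """Sample mean of ||P - P_hat||_F^2 / ||P||_F^2 over a batch of trajectories."""
    if len(optimal_batch) != len(predicted_batch) or not optimal_batch:
        raise ValueError("batches must be non-empty and of equal length")
    ratios = []
    for p_opt, p_hat in zip(optimal_batch, predicted_batch):
        diff = [[a - b for a, b in zip(row_opt, row_hat)]
                for row_opt, row_hat in zip(p_opt, p_hat)]
        ratios.append(squared_frobenius(diff) / squared_frobenius(p_opt))
    return sum(ratios) / len(ratios)
```

A perfect prediction yields 0, while predicting all zeros yields exactly 1, which makes the metric easy to sanity-check before training.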
Together, these three complementary metrics assess the system’s secrecy throughput (via ASR), its transmission reliability (via SPSC), and the geometric precision of its antenna positioning (via NMSE), thereby providing a holistic evaluation of the proposed MA-enabled PLS system.
Based on the continuous-time MA positioning prediction framework and the evaluation metrics established above, we formulate the secure transmission problem. The objective is to maximize the secrecy rate by jointly designing MA positions and beamforming vectors throughout the time horizon, subject to constraints on the physical spacing among antennas, the transmit power, and the mechanical latency.
\[\label{nzbjtsre} \begin{align} \boldsymbol{P1:} \quad \max_{\{\mathbf{p}_n(t)\}, \mathbf{w}(t)} & \quad \tilde{R} \\ \textrm{s.t.} & \quad \text{C1: }\|\mathbf{p}_n(t)-\mathbf{p}_{n'}(t)\|_2 \geq \lambda/2, \; n \neq n', \notag \\ & \quad \quad \quad \mathbf{p}_n(t) \in D, n \in [1, N_t]; \\ & \quad \text{C2: }||\mathbf{w}(t)||_2^2\leq P_\text{max}; \\ & \quad \text{C3: }\Delta t \leq \tau_\text{max}, \end{align}\tag{3}\] where C1 enforces a minimum spacing of \(\lambda/2\) between any two distinct antennas and confines antenna movement within the designated region \(D\); C2 limits the transmit power; and C3 constrains the antenna repositioning time to account for mechanical movement limitations.
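The feasibility conditions C1 and C2 can be checked with a short sketch. It assumes that the aperture \(D\) is a \(12\lambda \times 12\lambda\) square centered on the array center in the \(y\)-\(z\) plane, which the text implies but does not state explicitly; the per-element \(4\lambda\) sub-regions and the timing constraint C3 are not checked here. Function names are hypothetical.

```python
import itertools
import math


def satisfies_c1(positions, wavelength, aperture_in_wavelengths=12.0):
    """Check minimum spacing (lambda/2) and aperture confinement for all antennas.

    positions is a sequence of (x, y, z) tuples in metres.
    Assumption: the aperture is centered at the origin of the local y-z plane.
    """
    half_side = 0.5 * aperture_in_wavelengths * wavelength
    inside = all(abs(y) <= half_side and abs(z) <= half_side for _, y, z in positions)
    spaced = all(
        math.dist(p, q) >= 0.5 * wavelength
        for p, q in itertools.combinations(positions, 2)
    )
    return inside and spaced


def satisfies_c2(w, p_max):
    """Check the transmit power constraint ||w||_2^2 <= P_max."""
    return sum(abs(wn) ** 2 for wn in w) <= p_max
```

A check of this kind can also serve as a post-processing filter on predicted positions before they are sent to the actuators.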
Solving Problem P1 directly in real time is intractable due to its non-convex nature, which arises from the non-linear channel model and the coupled optimization variables. Standard iterative algorithms (e.g., alternating optimization, successive convex approximation) not only suffer from high computational complexity, which is prohibitive under stringent latency requirements, but also risk converging to poor local optima. More critically, as discussed in Section 3.3, the temporal mismatch between optimization/movement delays and rapid channel dynamics renders any solution based on current CSI obsolete upon deployment.
To circumvent these fundamental limitations, we reformulate Problem P1 as a supervised predictive learning task. Specifically, we leverage historical observations over a time window \(T\), including user positions \[(\mathbf{U}_{\text{Bob}}[i:i+T], \mathbf{U}_{\text{Eve}}[i:i+T]),\] their corresponding CSI \[(\mathbf{H}_{\text{Bob}}[i:i+T], \mathbf{H}_{\text{Eve}}[i:i+T]),\] and previously optimized MA positions \(\mathcal{P}_T\), to predict optimal MA positions for a future time horizon \(F\). This predictive paradigm shifts the computational burden from real-time online optimization to offline model training, enabling millisecond-level inference.
To align the learning objective with communication performance rather than mere geometric accuracy, we design a composite loss function that integrates multiple considerations: \[\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{NMSE}} + \beta \cdot \mathcal{L}_{\tilde{R}} + \gamma \cdot \mathcal{L}_{\text{st}}, \label{evnzmfxg}\tag{4}\] where \(\mathcal{L}_{\text{NMSE}}\) measures geometric positioning accuracy as defined in Eq. (2); \(\mathcal{L}_{\tilde{R}}\) is a differentiable surrogate loss that encourages configurations yielding high secrecy rates; and \(\mathcal{L}_{\text{st}}\) penalizes violations of physical constraints (e.g., minimum inter-antenna spacing in C1). The weights \(\alpha\), \(\beta\), and \(\gamma\) are dynamically adjusted during training to emphasize different objectives at various stages: geometric accuracy is prioritized initially for rapid convergence to feasible regions, and security performance is then progressively emphasized.
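The staged weighting in Eq. (4) can be illustrated with a small sketch. The text does not specify the adjustment rule, so the linear schedule and all default values below are purely hypothetical placeholders chosen to show the idea of shifting emphasis from \(\mathcal{L}_{\text{NMSE}}\) to \(\mathcal{L}_{\tilde{R}}\); they are not the authors' settings.

```python
def scheduled_weights(epoch, total_epochs,
                      alpha_range=(1.0, 0.2), beta_range=(0.1, 1.0), gamma=0.5):
    """Hypothetical linear schedule for (alpha, beta, gamma).

    alpha decays and beta grows as training progresses; gamma stays fixed.
    All ranges are illustrative assumptions, not values from the paper.
    """
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    alpha = alpha_range[0] + progress * (alpha_range[1] - alpha_range[0])
    beta = beta_range[0] + progress * (beta_range[1] - beta_range[0])
    return alpha, beta, gamma


def total_loss(l_nmse, l_secrecy, l_constraint, weights):
    """Composite loss: alpha * L_NMSE + beta * L_R + gamma * L_st."""
    alpha, beta, gamma = weights
    return alpha * l_nmse + beta * l_secrecy + gamma * l_constraint
```

In a training loop, `scheduled_weights(epoch, total_epochs)` would be evaluated once per epoch and passed to `total_loss` together with the three loss terms computed on the current batch.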
Problem P1 is thus reformulated as \[\tag{5} \begin{align} \boldsymbol{P2:} \quad \min_{\Omega} & \quad \mathbb{E}[\mathcal{L}_{\text{total}}] \\ \textrm{s.t.} & \quad \hat{\mathcal{P}}_F = f_{\Omega}(\mathbf{U}_{\text{Bob}}[1:T], \mathbf{U}_{\text{Eve}}[1:T], \notag \\ & \quad \quad \mathbf{H}_{\text{Bob}}[1:T], \mathbf{H}_{\text{Eve}}[1:T], \mathcal{P}_T), \tag{6} \end{align}\] where \(\Omega\) denotes the parameters of a neural network \(f_{\Omega}\) that learns the mapping from historical observations to future optimal positions. Solving Problem P2 yields a predictive model capable of inferring near-optimal MA positions in real time, naturally absorbing computational and mechanical delays into the prediction horizon. The architecture of \(f_{\Omega}\), which is specifically designed to integrate communication domain knowledge, is detailed in the next section.
To address the predictive optimization problem formulated in Problem P2, we propose RoleAware-MAPP, a novel deep learning framework built on the Transformer architecture, which is well known for its ability to capture long-range dependencies in sequential data. To tailor the model to the unique physical characteristics of wireless channels, we incorporate several domain-specific innovations. The framework follows a structured dataflow comprising four key components: a data preprocessor, a communication-aware embedding module, a Transformer backbone, and an output projection layer. The overall architecture is depicted in Fig. 2, and the design and rationale of each component are detailed in the following subsections.
The input data first undergo a preprocessing stage to harmonize their multi-modal nature. The model takes as input a concatenated tensor \(\mathbf{X}_{\text{in}} \in \mathbb{R}^{T \times d_{\text{in}}}\), where \(d_{\text{in}}\) is the total input feature dimension. To address the statistical heterogeneity across the five distinct data streams, we apply a grouped normalization strategy, in which each stream \(\mathbf{X}_i\) is independently normalized using z-score standardization, namely
\[\bar{\mathbf{X}}_i = (\mathbf{X}_i - \mu_i) / \sigma_i,\] where \(\mu_i\) and \(\sigma_i\) denote the mean and standard deviation of the \(i\)-th stream, respectively. The parameters \((\mu_{\mathcal{P}_T}, \sigma_{\mathcal{P}_T})\) for the MA positions are retained for the de-normalization stage.
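The grouped normalization and the later de-normalization can be sketched as follows. The example assumes that each stream uses a single scalar mean and standard deviation computed over all of its entries; per-feature statistics would be an equally plausible reading of the text. Stream names and function names are hypothetical.

```python
import math


def fit_stream_stats(stream):
    """Return (mean, std) over all entries of one stream (list of rows)."""
    values = [x for row in stream for x in row]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    std = math.sqrt(var)
    # Guard against constant streams to avoid division by zero.
    return mean, (std if std > 0 else 1.0)


def normalize(stream, mean, std):
    """Z-score standardization: (X - mu) / sigma."""
    return [[(x - mean) / std for x in row] for row in stream]


def denormalize(stream, mean, std):
    """Inverse mapping back to physical units: X * sigma + mu."""
    return [[x * std + mean for x in row] for row in stream]


def grouped_normalize(streams):
    """Normalize each named stream independently and keep its statistics."""
    stats = {name: fit_stream_stats(data) for name, data in streams.items()}
    normalized = {name: normalize(data, *stats[name]) for name, data in streams.items()}
    return normalized, stats
```

The statistics stored for the MA position stream are the ones reused at the output, where `denormalize` maps the predicted normalized positions back to physical coordinates.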
This module is designed to bridge the gap between generic sequence modeling and communication-domain specificity by integrating physical-layer semantics into the representation learning process. Its architecture comprises four specialized components, each serving a distinct purpose in capturing the unique characteristics of secure MA systems: 1) role-aware feature extraction that asymmetrically models legitimate and malicious users, 2) physics-informed semantic extraction that encodes channel propagation characteristics, 3) cross-role interaction modeling that captures the adversarial relationship between communication entities, and 4) adaptive feature fusion that dynamically balances different semantic representations. This comprehensive embedding strategy enables the model to learn representations that are not only geometrically meaningful but also relevant from a communication-theoretic perspective, providing a solid foundation for subsequent position prediction.
To reflect the asymmetric roles of Bob and Eve in physical layer security, we employ differentiated embedding pathways: Bob’s data (position and CSI) are processed through deeper two-layer MLPs to capture fine-grained security-critical features, while Eve’s data use simpler single-layer projections. For example, Bob’s position embedding is generated as: \[\mathbf{E}_{\text{Bob-pos}} = \text{LayerNorm}(\text{Linear}_2(\text{ReLU}(\text{Linear}_1(\bar{\mathbf{U}}_{\text{Bob}})))).\] This asymmetric design allocates more model capacity to Bob, aligning with the security objective. Similarly, we generate \(\mathbf{E}_{\text{Bob-CSI}}\), \(\mathbf{E}_{\text{Eve-pos}}\), \(\mathbf{E}_{\text{Eve-CSI}}\), and \(\mathbf{E}_{\text{MA-pos}}\) from respective input streams, all in \(\mathbb{R}^{T \times d_{\text{model}}}\).
This component extracts physics-informed features directly from normalized inputs, including spatial features (user-to-BS distances, angles and velocity) and channel features (channel capacity and instantaneous secrecy rate). These features are projected into a semantic embedding \(\mathbf{E}_{\text{sem}} \in \mathbb{R}^{T \times d_{\text{model}}}\) via a two-layer MLP, providing strong inductive biases grounded in communication theory.
To model the adversarial relationship between Bob and Eve, we apply cross-attention where Bob’s CSI queries attend to Eve’s CSI keys/values: \[\mathbf{E}_{\text{Bob-enh}} = \text{LayerNorm}\left(\mathbf{E}_{\text{Bob-CSI}} + \text{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^\mathrm{T}}{\sqrt{d_k}}\right)\mathbf{V}\right),\] where \(\mathbf{Q} = \mathbf{E}_{\text{Bob-CSI}}\mathbf{W}_Q\), \(\mathbf{K} = \mathbf{E}_{\text{Eve-CSI}}\mathbf{W}_K\), \(\mathbf{V} = \mathbf{E}_{\text{Eve-CSI}}\mathbf{W}_V\), and \(\mathbf{W}_Q, \mathbf{W}_K, \mathbf{W}_V \in \mathbb{R}^{d_{\text{model}} \times d_k}\) are learned projection matrices.
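The cross-role attention can be illustrated with a single-head sketch in plain Python. Two simplifications are assumptions of this example: \(d_k = d_{\text{model}}\), so that the residual addition with \(\mathbf{E}_{\text{Bob-CSI}}\) is dimensionally consistent, and a layer normalization without learnable scale and shift. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def layer_norm(x, eps=1e-5):
    """Per-row normalization without learnable affine parameters (simplification)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def cross_attention(e_bob, e_eve, w_q, w_k, w_v):
    """Bob's CSI embeddings query Eve's CSI embeddings; returns (output, weights)."""
    q = matmul(e_bob, w_q)
    k = matmul(e_eve, w_k)
    v = matmul(e_eve, w_v)
    d_k = len(w_k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)
    context = matmul(attn, v)
    # Residual connection: requires d_k == d_model in this sketch.
    residual = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(e_bob, context)]
    return layer_norm(residual), attn
```

Each row of the returned attention matrix sums to one and indicates which of Eve's time steps most influence the enhanced representation of Bob at a given time step.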
The six embeddings (\(\mathbf{E}_{\text{Bob-pos}}\), \(\mathbf{E}_{\text{Eve-pos}}\), \(\mathbf{E}_{\text{MA-pos}}\), \(\mathbf{E}_{\text{Bob-enh}}\), \(\mathbf{E}_{\text{Eve-CSI}}\), \(\mathbf{E}_{\text{sem}}\)) are aggregated via learnable weighted summation: \[\mathbf{E}_{\text{fused}} = \sum_{i=1}^{6} w_i \mathbf{E}_i \in \mathbb{R}^{T \times d_{\text{model}}},\] where \(\{w_i\}\) are trainable parameters that automatically balance each embedding’s contribution.
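The fusion step reduces to a weighted sum over a stacked tensor. In the sketch below the embeddings are constant placeholders and the weights are initialized uniformly; both choices are assumptions for illustration, since the source states only that the weights are trainable.

```python
import numpy as np


def fuse_embeddings(embeddings, weights):
    """Weighted sum of equally shaped embeddings: sum_i w_i * E_i."""
    stacked = np.stack(embeddings)                 # (6, T, d_model)
    return np.tensordot(weights, stacked, axes=1)  # (T, d_model)


T, D_MODEL = 16, 128
names = ["bob_pos", "eve_pos", "ma_pos", "bob_enh", "eve_csi", "sem"]
# Placeholder embeddings filled with 1.0, 2.0, ..., 6.0 for easy checking.
embeddings = [np.full((T, D_MODEL), float(i)) for i in range(1, len(names) + 1)]
weights = np.full(len(names), 1.0 / len(names))  # uniform start is an assumption
e_fused = fuse_embeddings(embeddings, weights)   # every entry equals 3.5
```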
The core of our model is a standard Transformer architecture with \(N_{\text{enc}}\) encoder layers and \(N_{\text{dec}}\) decoder layers. The encoder receives \(\mathbf{E}_{\text{fused}}\) and generates a latent contextual representation \(\mathbf{Z} \in \mathbb{R}^{T \times d_{\text{model}}}\). The decoder employs a non-autoregressive prediction paradigm, predicting the entire future sequence of length \(F\) in a single forward pass: \[\mathbf{O} = \text{Transformer}(\mathbf{Z}) \in \mathbb{R}^{F \times 3N_t},\] where Transformer(·) denotes the backbone network. This parallel decoding avoids the \(F\) sequential decoding steps that an autoregressive decoder would require and thereby reduces inference time, which is critical for real-time applications.
This component translates the decoder’s abstract feature representation \(\mathbf{O} \in \mathbb{R}^{F \times 3N_t}\) into physical antenna positions. A fully connected layer first projects the feature dimension to the physical dimension: \[\hat{\mathcal{P}}_{F, \text{norm}} = \text{Linear}(\mathbf{O}) \in \mathbb{R}^{F \times 3N_t}.\] Subsequently, a de-normalization step converts the predictions back to physical coordinates: \[\hat{\mathcal{P}}_F = \hat{\mathcal{P}}_{F, \text{norm}} \cdot \sigma_{\mathcal{P}_T} + \mu_{\mathcal{P}_T}.\] This ensures that the final output is directly interpretable for controlling the MA hardware.
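The de-normalization step is the inverse of a standard z-score transform, as the following NumPy sketch shows. The synthetic positions and the per-coordinate statistics are assumptions; the source uses the statistics \(\mu_{\mathcal{P}_T}\) and \(\sigma_{\mathcal{P}_T}\) of the historical positions.

```python
import numpy as np


def denormalize(p_norm, mean, std):
    """Map normalized predictions back to physical coordinates."""
    return p_norm * std + mean


rng = np.random.default_rng(0)
F, N_T = 4, 9  # prediction horizon and 3 x 3 array size from the setup
p_true = rng.uniform(-0.05, 0.05, size=(F, 3 * N_T))  # synthetic positions (m)
mu = p_true.mean(axis=0)
sigma = p_true.std(axis=0) + 1e-8  # small offset avoids division by zero
p_norm = (p_true - mu) / sigma     # what the network would be trained on
p_hat = denormalize(p_norm, mu, sigma)
```

A round trip through normalization and de-normalization recovers the original coordinates, so any residual error in \(\hat{\mathcal{P}}_F\) comes from the prediction itself.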
In this section, we conduct extensive simulations to evaluate the performance of our proposed RoleAware-MAPP framework. We first detail the experimental setup, and subsequently present a comprehensive analysis of the simulation results.
The training, validation, and testing datasets are constructed through large-scale simulations that emulate the dynamic vehicular communication scenario outlined in Section 3. We utilize QuaDRiGa, a widely adopted 3GPP-compliant channel generator that implements a geometry-based stochastic channel model with high realism. A principal advantage of QuaDRiGa in our context is its antenna-agnostic design, which supports the integration of arbitrary antenna configurations. This feature is essential for simulating our MA system, as it allows us to dynamically update the position of each antenna element in the array at every time snapshot, thereby faithfully capturing channel variations resulting from reconfigurable geometry [33].
The simulation operates at a millimeter-wave carrier frequency of 28 GHz under the 3GPP Urban Macro (UMa) NLoS scenario. The BS is equipped with a \(3 \times 3\) MA array, while both Bob and Eve are assumed to employ a single omnidirectional antenna. To capture a diverse range of realistic user mobility patterns, we simulate multiple motion trajectories with user velocities uniformly sampled between 10 and 100 km/h. Each data sample comprises a sequence of 20 consecutive snapshots spaced at 0.1 s intervals, split into an input sequence of length \(T=16\) and a prediction horizon of \(F=4\).
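The sample layout can be made concrete with a short sketch. The helper name `split_window` and the use of timestamps as stand-ins for snapshots are illustrative assumptions.

```python
SNAPSHOT_INTERVAL_S = 0.1


def split_window(snapshots, history_len=16, horizon=4):
    """Split one sample into the observed history and the future targets."""
    if len(snapshots) != history_len + horizon:
        raise ValueError("sample length must equal history_len + horizon")
    return snapshots[:history_len], snapshots[history_len:]


# Timestamps stand in for the per-snapshot data (positions, CSI, labels).
sample = [round(k * SNAPSHOT_INTERVAL_S, 1) for k in range(20)]
history, future = split_window(sample)

history_span_s = len(history) * SNAPSHOT_INTERVAL_S  # about 1.6 s observed
horizon_span_s = len(future) * SNAPSHOT_INTERVAL_S   # about 0.4 s predicted
max_step_m = 100 / 3.6 * SNAPSHOT_INTERVAL_S         # metres per step at 100 km/h
```

At the top speed of 100 km/h a user moves roughly 2.8 m between consecutive snapshots, which gives a sense of how quickly the geometry changes within one prediction horizon.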
A crucial step in dataset generation is the construction of ground truth labels for supervised learning. These labels approximate the optimal MA positions at each time instant. To obtain them, we treat every snapshot as an independent static optimization problem and perform an intensive offline search. Specifically, for each snapshot, we relax the real-world latency constraint (C3 in Problem [prob:P1]) and solve the secrecy rate maximization problem using a Particle Swarm Optimization (PSO) algorithm [34]. Because the problem is non-convex, this computationally intensive procedure yields a high-quality, non-causal solution that is not guaranteed to be globally optimal, and it serves as an approximate upper-bound performance target for that specific instant. By repeating this process across all snapshots, we construct the ground truth trajectory sequences \(\mathcal{P}_T\). The final dataset consists of 43,200 unique samples, partitioned into training (70%), validation (15%), and testing (15%) subsets. Key simulation parameters are summarized in Table 2.
| Parameter | Value |
|---|---|
| Carrier Frequency | 28 GHz[5] |
| Channel Model | Urban Macro (UMa), NLoS[33] |
| BS Antenna Array (\(N_h \times N_v\)) | \(3 \times 3\)[5] |
| User Velocity | Uniformly in [10, 100] km/h |
| Input Sequence Length (\(T\)) | 16[12] |
| Prediction Sequence Length (\(F\)) | 4[12] |
| Total Samples | 43,200 |
| Dataset Split (Train/Valid/Test) | 70% / 15% / 15% |
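The label-generation step relies on PSO, whose basic loop is sketched below on a toy objective. The quadratic `toy_objective`, the swarm size, and the inertia and acceleration coefficients are illustrative assumptions; the actual labels maximize the secrecy rate of Problem [prob:P1] over antenna positions, which is not reproduced here.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic global-best PSO that maximizes `objective` inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best_pos = [p[:] for p in pos]
    p_best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=p_best_val.__getitem__)
    g_pos, g_val = p_best_pos[g][:], p_best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (p_best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the feasible box after the position update.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > p_best_val[i]:
                p_best_pos[i], p_best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_objective(p):
    """Hypothetical stand-in for the secrecy rate; maximum 0 at (0.3, -0.2)."""
    x, y = p
    return -((x - 0.3) ** 2) - (y + 0.2) ** 2


search_box = [(-1.0, 1.0), (-1.0, 1.0)]
best_pos, best_val = pso_maximize(toy_objective, bounds=search_box)
```

Because each snapshot requires many objective evaluations (particles times iterations), this kind of search suits offline labelling far better than real-time control.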
The proposed model is implemented using the PyTorch framework, with its core architecture based on a Transformer model containing 3 encoder and 3 decoder layers. The key hyperparameters for the model’s structure, such as the embedding dimension, number of attention heads, and dropout rate, are detailed in Table 3. These values were selected to provide a robust balance between model capacity and computational efficiency.
For the training process, we employed the Adam optimizer, a standard choice for deep learning tasks. A warm-up and cosine annealing schedule was used to dynamically adjust the learning rate, ensuring stable convergence. The model was trained for 100 epochs with a batch size of 256, using the specific parameters listed in Table 3.
| Parameter | Value |
|---|---|
| Embedding Dimension (\(d_{\text{model}}\)) | 128 |
| Encoder/Decoder Layers (\(N_{\text{enc}}, N_{\text{dec}}\)) | 3 |
| Attention Heads | 8 |
| Feed-Forward Dimension (\(d_{\text{ff}}\)) | 256 |
| Dropout Rate | 0.1 |
| Optimizer | Adam |
| Learning Rate (Max) | \(1 \times 10^{-4}\) |
| Batch Size | 256 |
| Epochs | 100 |
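A warm-up plus cosine annealing schedule of the kind described above can be written as a small function. The 10-epoch linear warm-up, the zero floor, and per-epoch (rather than per-step) updates are assumptions; the source specifies only the schedule type, the maximum learning rate, and the number of epochs.

```python
import math


def lr_at_epoch(epoch, max_lr=1e-4, warmup_epochs=10, total_epochs=100, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        return max_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(e) for e in range(100)]
```

The rate climbs linearly to \(1 \times 10^{-4}\) over the warm-up epochs and then decays smoothly toward the floor, which avoids large early updates while the embeddings are still poorly initialized.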
The performance of the proposed RoleAware-MAPP is benchmarked against several representative methods spanning from naive approaches to state-of-the-art deep learning architectures:
PSO [34]: It is a population-based metaheuristic algorithm that represents the traditional iterative optimization paradigm for solving non-convex problems.
RNN [35]: A vanilla RNN processes the sequence of MA positions by maintaining and updating a hidden state, thereby providing a fundamental capability for modeling temporal channel variations.
LSTM [28]: It utilizes gating mechanisms to mitigate the vanishing gradient problem, thereby enabling more effective modeling of long-range temporal dependencies in channel state evolution.
GRU [36]: It simplifies LSTM architecture while maintaining comparable performance, offering a balance between computational efficiency and temporal modeling capability.
CNN-LSTM [37]: It combines CNN for spatial feature extraction with LSTM for temporal sequence modeling, leveraging complementary strengths of both approaches to capture spatio-temporal channel characteristics.
Transformer [32]: It leverages self-attention mechanisms to process the entire input sequence in parallel. This design circumvents the sequential processing bottleneck inherent in RNN-based models and effectively captures global dependencies across the sequence.
All deep learning baselines are trained under identical conditions, with the same loss function (NMSE), optimizer (Adam), and training hyperparameters, to ensure a fair comparison.
Fig. 3 depicts the progression of individual loss components throughout the 100-epoch training process of RoleAware-MAPP. As observed in Fig. 3 (a), the total loss \(\mathcal{L}_{\text{total}}\) follows a steady decline from an initial value of 0.1 to a final value of -0.28, indicating smooth and stable convergence. The emergence of a negative total loss is attributed to the predominance of the secrecy rate loss term \(\mathcal{L}_{\tilde{R}}\), which inherently assumes negative values when the model effectively enhances the secrecy rate, consistent with its design objective.
The training process employs a two-phase strategy with dynamic weight adjustment to guide model optimization effectively. During the initial warm-up phase (epochs 1–10), the framework prioritizes geometric accuracy by assigning a higher weight to the position prediction loss \(\mathcal{L}_{\text{NMSE}}\), encouraging the model to rapidly converge toward regions near the ground truth labels. As shown in Fig. 3 (b), \(\mathcal{L}_{\text{NMSE}}\) decreases from 0.31 to 0.25 within the first four epochs, corresponding to a 19.4% reduction. Subsequently, the influence of the secrecy rate loss \(\mathcal{L}_{\tilde{R}}\) is progressively strengthened. From epoch 44 to 60, \(\mathcal{L}_{\text{NMSE}}\) rises markedly from 0.29 to 0.375 (a 29.3% increase) before stabilizing between 0.33 and 0.35, indicating that the model is approaching a balanced trade-off between geometric accuracy and the secrecy objective.
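The two-phase weighting can be sketched as follows. The specific weight values and the linear ramp are hypothetical, chosen only to show how a secrecy term that takes negative values can pull the weighted total below zero once its weight grows; they are not the schedule used in the experiments.

```python
def loss_weights(epoch, warmup_epochs=10, total_epochs=100):
    """Hypothetical weights for a 1-indexed epoch: NMSE first, secrecy later."""
    if epoch <= warmup_epochs:
        return {"nmse": 1.0, "secrecy": 0.1, "spacing": 1.0}
    ramp = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return {"nmse": 1.0 - 0.5 * ramp, "secrecy": 0.1 + 0.9 * ramp, "spacing": 1.0}


def total_loss(l_nmse, l_secrecy, l_spacing, epoch):
    """Weighted sum of the three loss components at the given epoch."""
    w = loss_weights(epoch)
    return w["nmse"] * l_nmse + w["secrecy"] * l_secrecy + w["spacing"] * l_spacing


# Illustrative component values of the magnitudes reported for Fig. 3.
early = total_loss(0.31, -0.32, 1e-4, epoch=1)
late = total_loss(0.34, -0.43, 2e-5, epoch=100)
```

With these placeholder weights the early total is positive and the late total is negative, mirroring the sign change discussed for \(\mathcal{L}_{\text{total}}\).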
As illustrated in Fig. 3 (c), the secrecy rate loss \(\mathcal{L}_{\tilde{R}}\) shows consistent improvement throughout training, declining monotonically from -0.32 to -0.43, which reflects a 34.4% enhancement in the secrecy performance. This trend confirms the efficacy of the role-aware embedding mechanism in learning to maximize the capacity difference between Bob and Eve. Meanwhile, as shown in Fig. 3 (d), the physical constraint loss \(\mathcal{L}_{\text{st}}\) converges rapidly to values below \(2 \times 10^{-5}\) within the first 20 epochs, demonstrating the model’s capability to adhere to the minimum inter-antenna spacing requirements without compromising other learning objectives.
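One common way to encode a minimum-spacing requirement as a differentiable penalty is a squared hinge on pairwise distances. The sketch below shows that form; it is an assumed form, and both the function name and the half-wavelength threshold are illustrative rather than the paper’s definition of \(\mathcal{L}_{\text{st}}\).

```python
import itertools
import math


def spacing_penalty(positions, d_min):
    """Mean squared hinge on pairwise distances that fall below d_min."""
    total, pairs = 0.0, 0
    for a, b in itertools.combinations(positions, 2):
        total += max(0.0, d_min - math.dist(a, b)) ** 2
        pairs += 1
    return total / pairs


WAVELENGTH_M = 299_792_458 / 28e9  # about 10.7 mm at 28 GHz
D_MIN = WAVELENGTH_M / 2           # half-wavelength spacing is an assumption

# A 3 x 3 grid with 1 cm pitch satisfies the constraint; a close pair does not.
grid = [(0.01 * i, 0.01 * j, 0.0) for i in range(3) for j in range(3)]
feasible = spacing_penalty(grid, D_MIN)
violating = spacing_penalty([(0.0, 0.0, 0.0), (0.003, 0.0, 0.0)], 0.005)
```

The penalty is exactly zero for any feasible layout, so it only shapes the gradient when two antennas are predicted too close together.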




Figure 3: Composite loss in training.
Next, Fig. 4 illustrates the influence of the two-phase training strategy on communication performance metrics. During the initial warm-up phase (epochs 1–10), the SPSC increases from 76.76% to 79.10%, a gain of 2.34 percentage points, as the model rapidly learns to predict antenna configurations near the ground-truth optima. In the subsequent regularization phase (epochs 11–100), where loss weights are dynamically adjusted to balance all objectives, SPSC further improves to 82.33%, corresponding to an additional increase of 3.23 percentage points. The ASR exhibits a similar trend, rising from 0.28 bps/Hz to 0.30 bps/Hz during warm-up and ultimately reaching 0.36 bps/Hz, a total improvement of 28.6%.
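For reference, the two metrics can be computed from per-snapshot capacities as sketched below. This assumes ASR is the mean of the non-negative secrecy rate and SPSC is the fraction of snapshots with a strictly positive rate, which are the standard definitions; the capacity values are made up for illustration.

```python
def secrecy_metrics(cap_bob, cap_eve):
    """Return (ASR in bps/Hz, SPSC as a fraction) over paired capacity samples."""
    rates = [max(b - e, 0.0) for b, e in zip(cap_bob, cap_eve)]
    asr = sum(rates) / len(rates)
    spsc = sum(r > 0.0 for r in rates) / len(rates)
    return asr, spsc


# Hypothetical capacities (bps/Hz) for four snapshots.
cap_bob = [2.0, 1.0, 3.0, 0.5]
cap_eve = [1.5, 1.2, 2.0, 0.5]
asr, spsc = secrecy_metrics(cap_bob, cap_eve)
```

Here the secrecy rates are 0.5, 0, 1.0, and 0 bps/Hz, so the ASR is 0.375 bps/Hz and the SPSC is 50%. The two metrics can move independently because SPSC counts only the sign of the capacity gap while ASR also depends on its size.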
The synchronized improvement of both metrics without significant oscillations after epoch 60 confirms the stability of the training process. The superior convergence characteristics of RoleAware-MAPP derive from three critical design elements: 1) the two-phase training strategy with warm-up ensures rapid initial convergence to feasible regions before fine-tuning for security objectives; 2) the role-aware embedding mechanism allocates differentiated model capacity between Bob and Eve, accelerating the learning of security-critical patterns; and 3) the communication semantic extractor provides physics-informed inductive bias, reducing the effective search space and preventing convergence to physically infeasible solutions.
Table 4 presents a comprehensive comparison of RoleAware-MAPP against six baseline methods in terms of secrecy performance (ASR and SPSC), prediction accuracy (NMSE), and computational cost (parameters, model size, FLOPs, inference time, and deployment difficulty). The results demonstrate that RoleAware-MAPP achieves substantial improvements in communication-critical metrics while maintaining acceptable computational overhead.
| Model | ASR (bps/Hz) | SPSC | NMSE | Parameters | Size | FLOPs | Inference time (ms) | Deployment |
|---|---|---|---|---|---|---|---|---|
| PSO [34] | 0.2755 | 77.89% | / | / | / | 1.73G | 3649.29 | Easy |
| RNN [35] | 0.2351 | 75.39% | 0.1491 | 79.6K | tiny | 727.58M | 0.59 | Easy |
| GRU [36] | 0.2332 | 74.60% | 0.1505 | 310.8K | small | 3.12G | 1.51 | Moderate |
| LSTM [28] | 0.2318 | 74.25% | 0.1568 | 277.8K | small | 2.77G | 1.16 | Moderate |
| CNN-LSTM [37] | 0.2318 | 74.25% | 0.1568 | 308.2K | small | 3.03G | 2.13 | Moderate |
| Transformer [32] | 0.2405 | 76.13% | 0.1475 | 673.7K | medium | 1.41G | 2.57 | Moderate |
| RoleAware-MAPP | 0.3569 | 81.52% | 0.2614 | 1.32M | medium | 4.10G | 6.94 | Complex |
In terms of ASR, the proposed RoleAware-MAPP achieves 0.3569 bps/Hz, representing a 48.4% improvement over the best learning-based baseline (the Transformer at 0.2405 bps/Hz), a 54.0% improvement over the weakest baselines (LSTM and CNN-LSTM at 0.2318 bps/Hz), and a 29.5% improvement over the PSO solver (0.2755 bps/Hz). Similarly, for SPSC, RoleAware-MAPP attains 81.52%, surpassing the Transformer baseline by 5.39 percentage points and outperforming the recurrent baselines (RNN, GRU, and LSTM) by 6.13 to 7.27 percentage points. These significant gains in security-oriented metrics validate the effectiveness of our role-aware design philosophy.
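The relative gains can be recomputed directly from the ASR and SPSC columns of Table 4, which also makes clear which baseline each figure refers to. The dictionary layout is purely illustrative.

```python
# ASR (bps/Hz) and SPSC (%) values copied from Table 4.
asr = {"PSO": 0.2755, "RNN": 0.2351, "GRU": 0.2332,
       "LSTM": 0.2318, "CNN-LSTM": 0.2318, "Transformer": 0.2405}
spsc = {"PSO": 77.89, "RNN": 75.39, "GRU": 74.60,
        "LSTM": 74.25, "CNN-LSTM": 74.25, "Transformer": 76.13}
OURS_ASR, OURS_SPSC = 0.3569, 81.52

# Relative ASR gain in percent and absolute SPSC gain in percentage points.
asr_gain = {name: 100.0 * (OURS_ASR - v) / v for name, v in asr.items()}
spsc_gain = {name: OURS_SPSC - v for name, v in spsc.items()}

for name in asr:
    print(f"{name:12s} ASR +{asr_gain[name]:.1f}%  SPSC +{spsc_gain[name]:.2f} pp")
```

The ASR gain is 48.4% relative to the Transformer, 54.0% relative to LSTM and CNN-LSTM, and 29.5% relative to PSO.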
From the perspective of NMSE, the Transformer baseline achieves 0.1475 while RoleAware-MAPP records 0.2614. This apparent disadvantage is a deliberate design choice reflecting our model’s specialization. While conventional baselines optimize for geometric prediction accuracy as general-purpose sequence predictors, RoleAware-MAPP prioritizes the relationship between antenna positions, channel states, and communication performance. The model therefore trades geometric precision for better alignment with security-critical metrics, functioning as a domain-expert system rather than a generic predictor. This specialization underlies the 48.4% ASR improvement over the Transformer baseline, which justifies the loss in geometric accuracy.
Computational complexity analysis indicates that the proposed RoleAware-MAPP requires 1.32M parameters and incurs 4.10G FLOPs [38] (floating point operations), with an average inference time of 6.94 ms per sample. These computational demands exceed those of the learning-based baselines: the parameter count is 1.96× that of the standard Transformer and 4.75× that of the LSTM baseline. This investment is justified by the 48.4% ASR gain over the Transformer baseline. Notably, the 6.94 ms inference latency remains practically feasible for MA positioning applications, where mechanical movement operates on timescales orders of magnitude larger. The additional resources thus translate directly into enhanced physical layer security, which represents a favorable trade-off.
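The complexity ratios and the latency budget can be checked against Table 4 and the 0.1 s snapshot interval of the dataset. Treating one snapshot interval as the available latency budget is an assumption made here for illustration.

```python
# Parameter counts and inference times copied from Table 4.
params = {"Transformer": 673.7e3, "LSTM": 277.8e3, "RoleAware-MAPP": 1.32e6}
latency_ms = {"PSO": 3649.29, "Transformer": 2.57, "RoleAware-MAPP": 6.94}
SNAPSHOT_INTERVAL_MS = 100.0  # 0.1 s between snapshots

ratio_vs_transformer = params["RoleAware-MAPP"] / params["Transformer"]
ratio_vs_lstm = params["RoleAware-MAPP"] / params["LSTM"]

# Share of one snapshot interval consumed by a single inference.
budget_fraction = latency_ms["RoleAware-MAPP"] / SNAPSHOT_INTERVAL_MS
speedup_vs_pso = latency_ms["PSO"] / latency_ms["RoleAware-MAPP"]
pso_fits_interval = latency_ms["PSO"] <= SNAPSHOT_INTERVAL_MS
```

The parameter ratios evaluate to about 1.96 (Transformer) and 4.75 (LSTM). A single inference takes under 7% of a snapshot interval, whereas the PSO solver needs roughly 36 intervals per solution, which illustrates the temporal mismatch that motivates the predictive formulation.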
Fig. 5 further visualizes this performance trade-off through a radar chart, clearly illustrating RoleAware-MAPP’s dominance in ASR and SPSC dimensions despite higher NMSE. The chart confirms that the proposed model successfully reallocates optimization focus from pure geometric accuracy to communication security objectives.
Fig. 6 and Fig. 7 evaluate model robustness across different user velocities ranging from 10 to 100 km/h. For ASR performance, as given in Fig. 6, RoleAware-MAPP maintains superiority across all velocity ranges, achieving peak performance of 0.82 bps/Hz at 30 km/h while sustaining above 0.08 bps/Hz even at 80 km/h. In contrast, baseline methods exhibit more severe degradation, with Transformer dropping from 0.55 bps/Hz to below 0.1 bps/Hz. The performance gap between RoleAware-MAPP and baselines widens at higher velocities, demonstrating enhanced robustness to channel dynamics.
As shown in Fig. 7, the SPSC performance demonstrates that the proposed RoleAware-MAPP maintains a success rate exceeding 70% even at 100 km/h, whereas all baseline methods drop below 65%. Across the evaluated speed range of [10, 100] km/h, the proposed framework sustains a performance advantage of over 3 percentage points, peaking at a margin of 7.40 percentage points at 60 km/h. This robustness is attributed to two key factors: the communication-aware embedding module effectively encodes velocity-sensitive channel features, while the cross-role interaction mechanism dynamically adapts to the evolving relationship between Bob and Eve. The consistent superiority across mobility scenarios confirms that RoleAware-MAPP is not only optimized for static configurations but also generalizes robustly to highly dynamic settings, underscoring its practical viability in real-world vehicular communication systems.
To evaluate the robustness of RoleAware-MAPP under varying noise conditions, we examine the model performance across different background noise power levels ranging from 0 dB (ideal condition) to 25 dB (severe noise interference). This analysis simulates realistic scenarios where environmental noise, interference, and channel impairments degrade communication quality.
Fig. 8 illustrates the ASR performance degradation as noise power increases. Under ideal conditions (0 dB), RoleAware-MAPP achieves 0.77 bps/Hz, significantly outperforming the Transformer baseline at 0.47 bps/Hz and RNN-based methods at approximately 0.45 bps/Hz. As noise power increases to 25 dB, all methods experience performance degradation, but RoleAware-MAPP demonstrates superior noise resilience. At 25 dB noise power, RoleAware-MAPP maintains 0.055 bps/Hz while baseline methods approach near-zero rates, with Transformer at 0.042 bps/Hz.
Fig. 9 demonstrates notable consistency in SPSC performance across varying noise levels. RoleAware-MAPP sustains an SPSC of approximately 81.5% throughout the evaluated noise power range, with fluctuations constrained within 0.3 percentage points. Similarly, baseline methods exhibit minimal performance degradation with increasing noise power: Transformer stabilizes around 76.5%, followed by RNN (75.4%), GRU (74.6%), and CNN-LSTM (74.3%). The most pronounced variation is observed in LSTM between 20 dB and 25 dB, though this deviation remains limited to 0.48 percentage points. This consistent performance across all models indicates their inherent capability to maintain reliable secure communication links under diverse noise conditions, with RoleAware-MAPP consistently operating at a 5–7 percentage point advantage over the competing approaches.
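One way to read this insensitivity is the following worked relation. It rests on simplifying assumptions that are ours rather than the source’s: Bob and Eve see the same noise power \(\sigma^2\), and each rate has the form \(\log_2(1 + g/\sigma^2)\) with effective channel gains \(g_B\) and \(g_E\). Under these assumptions, \[R_s = \left[\log_2\left(1 + \frac{g_B}{\sigma^2}\right) - \log_2\left(1 + \frac{g_E}{\sigma^2}\right)\right]^+ > 0 \iff g_B > g_E \quad \text{for any } \sigma^2 > 0.\] The sign of the capacity gap therefore does not depend on \(\sigma^2\), so the fraction of snapshots with strictly positive secrecy stays roughly constant as noise grows. The magnitude of \(R_s\), by contrast, shrinks toward zero as \(\sigma^2 \to \infty\), which is consistent with the ASR degradation in Fig. 8 alongside a nearly flat SPSC in Fig. 9.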
To validate the contribution of each proposed component, we conduct comprehensive ablation experiments by systematically removing key modules from RoleAware-MAPP. Table 5 presents the performance of three variants compared to the complete model.
| Configuration | ASR (bps/Hz) | SPSC | NMSE |
|---|---|---|---|
| RoleAware-MAPP* | 0.3654 | 83.53% | 0.3100 |
| w/o Role-Awareness | 0.3510 | 83.28% | 0.3487 |
| w/o Semantic Extractor | 0.3570 | 82.14% | 0.2940 |
| w/o composite loss function | 0.2460 | 77.08% | 0.1498 |
The most critical component is the composite loss function. When replaced with standard NMSE loss alone (w/o composite loss function), ASR dramatically decreases from 0.3654 to 0.2460 bps/Hz, a 32.7% reduction, while SPSC drops from 83.53% to 77.08%, losing 6.45 percentage points. Interestingly, this variant achieves the best NMSE of 0.1498, improving 51.7% over the full model’s 0.3100. This stark contrast validates our design philosophy: optimizing solely for geometric accuracy (i.e., NMSE) fails to capture the complex relationship between antenna positions and communication security. The composite loss function successfully redirects optimization focus toward security-critical objectives, justifying the geometric accuracy trade-off for substantial gains in ASR and SPSC.
Removing the communication semantic extractor (w/o Semantic Extractor) results in ASR declining to 0.3570 bps/Hz (2.3% reduction) and SPSC dropping to 82.14% (1.39 percentage points loss). The NMSE improves slightly to 0.2940, indicating that without physics-informed features, the model relies more heavily on geometric patterns. The semantic extractor’s contribution lies in providing domain-specific inductive bias (spatial relationships, channel quality indicators, and physical constraints) that guides the model toward communication-optimal solutions rather than geometrically accurate but communication-suboptimal configurations.
The role-aware embedding mechanism (w/o Role-Awareness) contributes to performance through asymmetric feature processing. Its removal causes ASR to decrease to 0.3510 bps/Hz (3.9% reduction) and SPSC to 83.28% (0.25 percentage points loss), while NMSE deteriorates to 0.3487. Without role differentiation, the model treats Bob and Eve symmetrically, losing the ability to explicitly maximize their capacity difference. The degraded NMSE suggests that role-aware processing not only improves security metrics, but also helps the model better understand the overall prediction task by providing clearer optimization objectives.
The ablation results reveal a clear hierarchy of component importance: the composite loss function is fundamental (32.7% ASR impact), followed by the role-aware embedding (3.9% impact), and the semantic extractor (2.3% impact). The components also appear to be complementary. The full model achieves 48.5% higher ASR than the variant without the composite loss (0.3654 vs. 0.2460 bps/Hz), which suggests that the role-aware embedding and semantic extractor are most effective when trained under the composite loss function.
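The ablation percentages follow from Table 5 by simple arithmetic, as the short script below reproduces. Variable names are illustrative.

```python
# (ASR in bps/Hz, SPSC in %, NMSE) copied from Table 5.
full = (0.3654, 83.53, 0.3100)
variants = {
    "w/o Role-Awareness": (0.3510, 83.28, 0.3487),
    "w/o Semantic Extractor": (0.3570, 82.14, 0.2940),
    "w/o composite loss function": (0.2460, 77.08, 0.1498),
}

# Relative ASR drop (%) and absolute SPSC drop (percentage points) per variant.
asr_drop = {k: 100.0 * (full[0] - v[0]) / full[0] for k, v in variants.items()}
spsc_drop = {k: full[1] - v[1] for k, v in variants.items()}

no_loss = variants["w/o composite loss function"]
asr_gain_over_no_loss = 100.0 * (full[0] - no_loss[0]) / no_loss[0]
nmse_gain_of_no_loss = 100.0 * (full[2] - no_loss[2]) / full[2]
```

The 32.7% drop and the 48.5% gain describe the same 0.1194 bps/Hz gap, expressed relative to the full model and to the ablated variant, respectively.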
To evaluate the generalization ability of RoleAware-MAPP beyond the training configuration, we test all models with different temporal horizons: reducing the input sequence length to \(T=10\) while extending the prediction horizon to \(F=10\). This configuration presents a more challenging scenario—shorter historical context but longer future prediction—testing whether the learned representations transfer effectively to different temporal scales.
Fig. 10 visualizes the comprehensive performance comparison, revealing that RoleAware-MAPP maintains its dominant position in ASR and SPSC metrics despite the configuration change. The performance hierarchy among methods remains consistent: RoleAware-MAPP leads significantly, followed by Transformer, with RNN-based methods trailing. However, the performance gaps widen under the new configuration, indicating that RoleAware-MAPP’s specialized architecture better handles temporal scale variations. The robust generalization validates that RoleAware-MAPP has learned transferable representations rather than overfitting to training settings, demonstrating its practical applicability when prediction requirements vary dynamically based on operational constraints.
In this paper, we tackled the critical challenges of real-time optimization and temporal mismatch in movable antenna systems by reformulating antenna positioning as a predictive learning task. To this end, we introduced RoleAware-MAPP, a novel Transformer-based framework that incorporates communication-domain knowledge through role-aware embeddings and a security-driven composite loss function, effectively prioritizing secrecy performance over geometric precision. Extensive simulations under realistic 3GPP scenarios demonstrate that the proposed framework achieves an Average Secrecy Rate of 0.3569 bps/Hz and a Strictly Positive Secrecy Capacity of 81.52%, outperforming the best learning-based baseline by 48.4% and 5.39 percentage points, respectively. These results confirm the robustness and generalization capability of RoleAware-MAPP across diverse mobility and noise conditions, underscoring its practical relevance in dynamic wireless environments. Looking forward, we plan to explore hybrid learning strategies, such as reinforcement learning, to alleviate the reliance on computationally intensive offline labels. Furthermore, implementation and validation on a physical MA testbed will be essential to assess the framework’s performance under real-world impairments and hardware constraints.
This work is supported by the National Natural Science Foundation of China with Grant 62301076, Fundamental Research Funds for the Central Universities with Grant 24820232023YQTD01, National Natural Science Foundation of China with Grants 62341101 and 62321001, Beijing Municipal Natural Science Foundation with Grant L232003, and National Key Research and Development Program of China with Grant 2022YFB4300403. (Corresponding author: Kan Yu)
W. Wang, X. Liu and K. Li are with the School of Computer Science, Qufu Normal University, Rizhao, P.R. China. E-mail: {wangwx@qfnu.edu.cn, liuxw@qfnu.edu.cn, lkx0311@126.com}.
W. Gong is with Inspur Computing Technology Pty Ltd. and Shandong Key Laboratory of Advanced Computing, Jinan, P. R. China. E-mail: gongwei@inspur.com.
K. Yu, Y. Zhao, Q. Zhang and Z. Feng are with the Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, 100876, P.R. China. E-mail: {kanyu1108@126.com, yjzhao0318@126.com, zhangqixun@bupt.edu.cn, fengzy@bupt.edu.cn}.
The historical optimal positions are pre-calculated offline for training purposes using a particle swarm optimization algorithm to solve Problem [prob:P1] without latency constraints.