Abstract

Nonlinear effects in high-speed optical fiber systems fundamentally limit channel capacity. While traditional Digital Backward Propagation (DBP) with adaptive filters addresses these effects, its computational complexity remains impractical. Data-driven solutions like Filtered DBP (FDBP) reduce complexity but critically lack inherent generalization: Their nonlinear compensation capability cannot be naturally extended to new transmission rates or WDM channel counts without retraining on newly collected data. We propose Meta-DSP, a novel signal processing pipeline combining: (1) Meta-DBP, a meta-learning-based DBP model that generalizes across transmission parameters without retraining, and (2) XPM-ADF, a carefully engineered adaptive filter designed to address multi-channel nonlinear distortions. The system demonstrates strong generalization, learning from 40 Gbaud single-channel data and successfully applying this knowledge to higher rates (80/160 Gbaud) and multi-channel configurations (up to 21 channels). Experimental results show Meta-DSP improves Q-factor by 0.55 dB over CDC in challenging scenarios while reducing computational complexity 10\(\times\) versus DBP. This work provides a scalable solution for nonlinear compensation in dynamic optical networks, balancing performance with practical computational constraints.

Nonlinear compensation , Digital back propagation , Adaptive filter , Meta learning, Hyper-networks.

1 Introduction↩︎

Optical communication networks carry the most traffic in communication networks. In recent years, with the surge in network traffic, optical communication networks have been transitioning towards higher speeds, greater distances, and larger capacities. Wavelength division multiplexing (WDM) technology is a crucial technique for expanding optical fiber capacity. WDM modulates the information of multiple channels onto different optical frequency carriers and integrates them onto the same fiber for propagation, greatly enhancing the propagation capacity of commercial systems. However, WDM systems face a fundamental trade-off: while increasing signal power improves the signal-to-noise ratio (SNR), it simultaneously induces severe nonlinear noise through optical fiber Kerr effects. These nonlinear impairments take two primary forms: self-phase modulation (SPM), caused by intra-channel nonlinear effects, and cross-phase modulation (XPM), resulting from inter-channel nonlinear interactions.

Digital Signal Processing (DSP) techniques have been developed to mitigate various interferences and noise in optical communication. The DSP module in the receiver addresses transmission distortions such as chromatic dispersion, polarization mode dispersion, carrier phase noise, and nonlinear Kerr effects. This paper investigates an advanced DSP architecture for nonlinear compensation, featuring a cascaded structure that combines DBP for dispersion and SPM compensation with an Adaptive Filter (ADF) for XPM mitigation.

DBP is widely recognized as one of the most effective algorithms for compensating SPM [1]. This technique compensates for both linear and nonlinear interference by solving the inverse transmission equation of light waves. However, the highly heterogeneous nature of DBP makes its implementation challenging. The number of backward steps required in high-speed, long-distance systems can be daunting [2]. While perturbation-based compensation (PBC) [3] offers a computationally simpler alternative, its nonlinearity mitigation efficiency falls significantly short of DBP. With the emergence of deep learning technologies in recent years, an increasing number of studies have leveraged data-driven methods in optical communications, where model parameters are learned directly from data to significantly reduce computational complexity. A significant amount of works have related the structure of DBP to deep neural networks[4],[5], [6],[7], considerably reducing the number of steps required for DBP. However, existing studies about learnable DBP have only considered a single system parameter setting: fixed transmission rates and WDM channel numbers. Their scalability under these system parameters is poor, requiring model retraining for different settings. For the same optical channel, its nonlinear physical properties should remain constant. Therefore, an effective learnable DBP scheme should be able to adapt to different system parameters without requiring multiple retrainings. This is one of the key motivations behind our approach.

ADF is a dynamic signal processing technique widely employed in optical communication systems, with typical applications including Frequency Offset Estimation (FOE), Carrier Phase Estimation (CPE), and Multiple-Input Multiple-Output (MIMO) processing. By dynamically adjusting filter parameters, ADF effectively mitigates noise and enhances system performance. Recent studies have shown that XPM, due to its inherent temporal correlation, can also be partially captured by adaptive filters [8],[9],[10]. Significant progress has been made in this research direction. For instance, Recursive Least Square (RLS) algorithm has been employed for XPM compensation in [11]–[13]. The work in [7] proposed using Least Mean Square (LMS) algorithm to compensate for the nonlinear phase noise that DBP fails to address. Considering the phase-dominant nature of XPM noise [14], classical phase estimation algorithms such as Blind Phase Search (BPS) and Viterbi-Viterbi (V&V) algorithm have also been investigated [15]. Nevertheless, these methods primarily repurpose existing algorithms without structural modifications tailored to XPM’s unique spatiotemporal properties, highlighting the need for specialized XPM compensation architectures.

In this paper, we propose Meta-DSP, a meta-learning-inspired DSP model composed of Meta-DBP and XPM-ADF, which is capable of concurrently handling multimodal data from optical fibers. Here, multimodal refers to the variety of optical signal data collected from the same optical fiber link, ranging from low-speed to high-speed, low-power to high-power, and with varying numbers of WDM channels. Meta-DBP is a learnable DBP module capable of automatically transferring nonlinear compensation knowledge across different system parameters to mitigate nonlinear interference, specifically SPM, within the optical channel. To achieve this, we employ a technique called a hypernetwork [16]. A hypernetwork \(f_{\phi}(z)\) is a neural network that generates the weights for another network, typically referred to as the target network \(g_{\theta}(x)\) as shown in Figure 1. By learning the mapping from task conditions to network weights \(\theta = f_{\varphi}(z)\), hypernetworks enable dynamic weight generation, model adaptability, and parameter efficiency, which are especially useful in scenarios involving diverse tasks or resource constraints. XPM-ADF is an adaptive filter designed to efficiently capture nonlinear interference between channels. It integrates a finite impulse response (FIR) filter and a single-tap phase estimator, using the Normalized Least Mean Square (NLMS) algorithm for updates. Simulation results show that this module effectively captures XPM interference.

We train Meta-DSP with supervised learning on simulated data 40 Gbaud and single channel. The properly trained Meta-DSP shows an increase of 0.55 dB in the signal’s Q factor compared to CDC on the most challenging scenario of 160 Gbaud, 21 channels. Meta-DSP also offers a tenfold reduction in computational complexity compared to DBP while achieving a similar Q factor level. The architecture of Meta-DSP holds universal significance in the algorithmic design of communication systems. It can be integrated into various stages of a comprehensive communication system, bolstering its adaptability and generalization capabilities. Our research underscores the potential of the meta-learning-based digital signal processing module across pivotal parameters in optical communication frameworks.

The rest of the paper is organized as follows: Section 2 introduces Meta-DSP and analyzes the training and testing details of the model. Section 3 introduce our simulated transmission system model. Section 4 reports the results of our numerical simulation results. Finally, Section 5 discusses future directions for our compensation algorithm.

Figure 1: Hypernetwork is a neural network that generates the weights of another network, typically referred to as the target network. By learning the mapping from task encoding to network weights, hypernetworks enable dynamic weight generation, model adaptivity, and parameter efficiency, which are particularly beneficial in scenarios with diverse tasks or resource constraints.

2 Proposed method: Meta-dsp↩︎

In this section, we propose Meta-DSP, a meta-learning based digital signal processing model. Classical Learnable DBP (LDBP) have poor adaptivity to multi-modal fiber data due to their inherent static structure. In our work, we propose improving classical DSP with insights and techniques from meta-learning. The core idea of meta-learning is training a model that can quickly adapt to new tasks [17]. In the context of optical communication, this new task involves varying transmission rates, power levels, number of channels. We aim to design a nonlinear compensation model that can learn sufficient nonlinear compensation knowledge from fiber data under a single setting. This would enable the model to perform well across a wide range of conditions, including different transmission rates, power levels, and channel configurations. Specifically, Meta-DSP consists of two modules: Meta-DBP and XPM-ADF. The whole structure of Meta-DSP is shown in Fig 2. The function of Meta-DBP is to compensate for the deterministic nonlinear effect SPM within the channel, while the function of XPM-ADF is to compensate for the stochastic nonlinear effect XPM between channels.

The first module, Meta-DBP, is a meta-learning version of LDBP. Past works on LDBP [4]–[7], [18], [19] have all adopted the approach of inserting a learnable low-pass filter into the nonlinear operator to replace the traditional nonlinear phase rotation. This method is also referred to as Filtered DBP (FDBP) in the literature. These models learn to compensate for the nonlinear effects in the channel by learning appropriate filter coefficients. However, the filter coefficients in these models are fixed, and once the transmission scenario changes, such as altering the transmission rate, the performance of these models drops sharply, necessitating retraining. To obtain an efficient DBP model that adapts to different parameter settings, we enhance the FDBP model using hypernetwork technology. We use the FDBP model as the target network and employ a hypernetwork to generate the weights in the FDBP based on task-specific information, thereby enhancing the model’s generalization capability.

Figure 2: Schematic of the Meta-DSP. The system comprises the Meta-DBP and XPM-ADF. Meta-DBP compensates for dispersion and SPM at a sample rate of 2 samples per symbol under the task-specific information. XPM-ADF capture the temporal statistical characteristics of XPM noise using a FIR filter \mathbf{w} and a phase estimator \mathbf{v}. \mathbf{u}: input signal. e: task information include symbol rate R_s, length of N-filter N_f. \hat{y}: output signal. — Figure 2: Schematic of the Meta-DSP. The system comprises the Meta-DBP and XPM-ADF. Meta-DBP compensates for dispersion and SPM at a sample rate of 2 samples per symbol under the task-specific information. XPM-ADF capture the temporal statistical characteristics of XPM noise using a FIR filter \(\mathbf{w}\) and a phase estimator \(\mathbf{v}\). \(\mathbf{u}\): input signal. \(e\): task information include symbol rate \(R_s\), length of N-filter \(N_f\). \(\hat{y}\): output signal.

The second module, XPM-ADF, is a specialized design of ADF to compensate for XPM noise. XPM noise arises from the interference of optical power from other channels on the target channel, making it a random nonlinear noise. Fortunately, previous studies [8], [9], [10] have identified temporal correlations in XPM noise. As a result, although XPM cannot be fully compensated due to information loss, its statistical properties can be effectively captured by a well-designed adaptive filter, which can help mitigate the interference caused by XPM noise. In our study, we use a FIR filter and a phase estimator to capture XPM noise. The inclusion of the phase estimator allows our model to track the phase variations of XPM noise more quickly, resulting in more accurate and stable error signals for updating the FIR filter.

In this section, we first introduce the FDBP algorithm, which is the foundation of our Meta-DBP. Then, we introduce the Meta-DBP and XPM-ADF in detail. Finally, we present the training and testing process of Meta-DSP.

Figure 3: Meta-DBP use a hyper-network to inference N-filter while FDBP get N-filter using SGD traning each time. FDBP requires collecting a new dataset for different symbol rates and training with SGD for an extended period (several minutes) to obtain the parameters. In contrast, Meta-DBP uses a hyper-network to learn the complete nonlinear compensation knowledge from 40 Gbaud data and transfers this knowledge to 80 Gbaud and 160 Gbaud. Once the hypernetwork is properly trained, the N-filter can be inferred at any symbol rate with ultra-fast speed(less than 0.1 seconds).

2.1 Filterd DBP↩︎

DBP is a algorithm based on the Nonlinear Schrödinger equation (NLSE) for single polarizarion system. \[\label{NLSE} \begin{align} &\frac{\partial u(z, t)}{\partial z}=-\frac{\alpha}{2}u(z,t) + \frac{i\beta_2}{2} \frac{\partial^2 u(z,t)}{\partial t^2} \\ & - i \gamma \left|u(z, t)\right|^2 u(z, t) + n(z,t). \end{align}\tag{1}\] While the dual-polarization form, commonly referred to as the Manakov equation, is given as follows: \[\begin{align} &\frac{\partial u_{x/y}(z, t)}{\partial z}=-\frac{\alpha}{2}u_{x/y}(z,t) + \frac{i\beta_2}{2} \frac{\partial^2 u_{x/y}(z,t)}{\partial t^2} \\ & - \frac{i 8\gamma}{9} \left(\left|u_{x}(z, t)\right|^2 + \left|u_{y}(z, t)\right|^2 \right)u_{x/y}(z, t). \end{align}\] Here, \(u(z,t)\) represents the optical pulse in a fiber at position \(z\) and time \(t\), and \(n(z, t)\) denotes the distributed amplified spontaneous emission (ASE) noise from inline amplifiers. The noise in 1 can be modeled as additive white Gaussian noise (AWGN) with zero mean and zero auto-correlation. The parameters \(\alpha\), \(\beta_2\), and \(\gamma\) represent attenuation, group velocity dispersion, and the fiber nonlinearity coefficient, respectively. The electric field of a signal at the transmitter is represented by \(u(0,t)\), while \(u(L,t)\) denotes the received signal at the receiver, neglecting the carrier frequency offset and carrier noise.

To compensate for distortions induced by fiber propagation, the DBP method aims to reverse the propagation process by performing the inverse of the split-step Fourier method. It iteratively processes the signal through multiple backpropagation steps, each consisting of a dispersion compensation module and a nonlinear phase correction module. Dispersion compensation module can be implemented in the time domain use convolution operation \(\mathbf{u} * D-filter\), the dispersion kernel "D-filter" is given as: \[\begin{align} \label{D-eq-time-dis} & D-filter = ifft\left(\exp\left(-\frac{i\beta_2 \omega^2}{2} \cdot dz\right)\right) \nonumber \\ & \omega = 2\pi F_s \cdot \frac{1}{N_d} \left[0,1,\ldots,-\frac{N_d+1}{2},-\frac{N_d-1}{2},\ldots,-1\right], \end{align}\tag{2}\] Where \(F_s = R_s \times SpS\) is the sample rate of signal, \(SpS\) is samples per symbol, and \(dz\) is the step size of the backpropagation, \(N_d\) denotes the length of the D-kernel and should not be less than the sample distance affected by dispersion. An empirical expression for this has been provided in numerous literature [7]. \[N_d = [2 \pi dz \beta_2 F_s^2].\] Nonlinear compensation module is implemented as a phase rotation operation: \[\label{N-eq-dis} \exp(-i\gamma dz |\mathbf{u}|^2) \mathbf{u},\tag{3}\] To enhance DBP’s performance, FDBP has been proposed, which assumes a low-pass filter for the signal power to achieve an improved phase rotation at the nonlinear step [20], [21], [7], [5]. FDBP expands the nonlinear phase rotation in 3 as follows: \[\label{FDBP} \exp(i\gamma dz |\mathbf{u}|^2 * N-filter) \mathbf{u},\tag{4}\] Where \(N-filter\) refers to the model parameters that can be optimized during the model’s training phase, and \(*\) in 4 denotes a central convolution. In the case of single polarization, the N-filter is a convolution kernel with shape \([N_f]\). For dual polarization, the N-filter becomes a convolution kernel of shape \([2, 2, N_f]\), represented as \(N-filter = [[N_{xx}, N_{xy}], [N_{yx}, N_{yy}]]\). Here, the \(*\) denotes a convolution operation where both the input and output channels are 2. It has been validated in the literature that FDBP can reduce the computational cost of DBP by an order of magnitude.

2.2 Meta-DBP↩︎

Meta-DBP uses a hypernetwork \(f_{\varphi}\) to generate the N-filter for nonlinear phase rotation in FDBP, as shown in Figure 3. Unlike FDBP, which obtains the N-filter through stochastic gradient descent on different modal datasets, Meta-DBP directly infers the N-filter using a hypernetwork \(f_{\varphi}\) based on task-specific information (symbol rate \(R_s\)), making the process more direct and efficient. To enable the hypernetwork \(f_{\varphi}\) to effectively transfer nonlinear compensation knowledge across different modalities, we investigate the relationships between N-filters in different modalities through both theoretical derivation and simulation experiments. We find that N-filters exhibit parallel transferability across different WDM channel counts and power levels. Additionally, N-filters under varying transmission baud rates are approximately related by a scaling transformation. Finally, we incorporate these findings into the design of our hypernetwork.

Figure 4: Meta-DBP layer. Meta-DBP use a Multi-layer Perceptrons (MLP) to inference N-filter for nonlinear phase rotation, while FDBP regard N-filter as a static parameter. * is convolution operation, \cdot is point-wise multiplication. — Figure 4: Meta-DBP layer. Meta-DBP use a Multi-layer Perceptrons (MLP) to inference N-filter for nonlinear phase rotation, while FDBP regard N-filter as a static parameter. \(*\) is convolution operation, \(\cdot\) is point-wise multiplication.

We first analyze the relationship between the N-filter and task-specific information from a theoretical perspective. In simple terms, the N-filter is essentially equivalent to the nonlinear coefficient \(C_{m,0}\) of the intral-channel XPM term in the Perturbation Based Compensation (PBC)[22]. To better illustrate this relationship, we first review the PBC: \[\begin{align} a_{p,x} + i\sum_{m,n} C_{m,n} \left( a_{p+m,x} a_{p+m+n,x}^* + a_{p+m,y} a_{p+m+n,y}^* \right) a_{p+n,x}, \nonumber \\ a_{p,y} + i\sum_{m,n} C_{m,n} \left( a_{p+m,x} a_{p+m+n,x}^* + a_{p+m,y} a_{p+m+n,y}^* \right) a_{p+n,y}, \end{align}\] where \(a_{p,x}\) and \(a_{p,y}\) represent the received symbols in the two polarization directions, \(C_{m,n}\) is the nonlinear coefficient which can be approximated by the following formula: \[\begin{align} C_{m,n} &= \frac{1}{T} \int_0^z dz \, \gamma f(z) \int dt \, \Big[ g^*(z,t) g(z,t-mT) \nonumber \\ & \quad \times g(z,t-nT) g^*(z,t-(m+n)T) \Big], \end{align}\] where \(g(z,t)\) is the pulse function after dispersion evolution over distance \(z\). If we retain only the terms about intral-channel XPM term on the right-hand side, the PBC method can be simplified as: \[\begin{align} a_{p,x} + i\sum_{m \neq 0} C_{m,0} \left( 2|a_{p+m,x}|^2 + |a_{p+m,y}|^2 \right) a_{p,x} \nonumber \\ a_{p,y} + i\sum_{m \neq 0} C_{m,0} \left( 2|a_{p+m,y}|^2 + |a_{p+m,x}|^2 \right) a_{p,y}. \end{align}\] These terms can be more accurately modeled using phase rotation [23]: \[\begin{align} \label{PBC-DBP} a_{p,x} \exp\left( i\sum_{m\neq 0} C_{m,0} \left( 2|a_{p+m,x}|^2 + |a_{p+m,y}|^2 \right) \right) a_{p,x} \nonumber \\ a_{p,y} \exp\left( i\sum_{m\neq 0} C_{m,0} \left( 2|a_{p+m,y}|^2 + |a_{p+m,x}|^2 \right) \right) a_{p,y}. \end{align}\tag{5}\]

As we can see, 5 is equivalent to the nonlinear step in FDBP. The required N-filter is, in fact, equivalent to \(N_{xx} = N_{yy} = 2N_{xy} = 2N_{yx} = 2[C_{-m,0}, C_{-m+1,0}, \dots, C_{m-1,0}, C_{m,0}]\). With Gaussian pulse assumption, the PBC coefficient related to intral-channel XPM is real numbers which can be expressed as [3]: \[\label{iXPM95coeff} C_{m, 0}= \frac{8}{9} \frac{\gamma \tau^2}{\sqrt{3}\left|\beta_2\right|} \frac{1}{2} E_1\left(\frac{m^2 T^2 \tau^2}{3\left|\beta_2\right|^2 L^2}\right),\tag{6}\] where \(L\) is transmission distance, \(T\) is symbol time (also inverse of the symbol rate), \(\tau\) is pulse width, \(\beta_2\) is dispersion coefficient, \(E_1\) is exponential integral function. We find that this relationship reveals the connection between N-filters at different symbol rates. Since pulse width \(\tau \approx T = \frac{1}{R_s}\), \[C_{m,0} \approx T^2 f(T^2 m), \;f(x) = \frac{8}{9} \frac{i\gamma}{\sqrt{3}\left|\beta_2\right|} \frac{1}{2} E_1\left(\frac{x^2}{3\left|\beta_2\right|^2 L^2}\right).\] From this expression, we find that \(C_{m,0}\) is a symmetric real kernel which fundamentally dependent only on the transmission baud rate \(R_s = \frac{1}{T}\), and is independent of the WDM channel count \(N_{ch}\) and signal power \(P_{ch}\). Moreover, when \(R_s\) changes, \(C_{m,0}\) can be interpreted as a scaling transformation of the same functional form, with the scaling factor being \(T^2\).

In fact, the waveform we use is not exactly Gaussian pulse, and in practical applications, many parameters in the system cannot be accurately estimated. Additionally, there are other amplifier noises and nonlinear effects in the system. Therefore, the expression of \(f(x)\) is not precise. Consequently, we choose to use a neural network \(f_{\varphi}\) to learn this function. For single-polarization signals, \(f_{\varphi}\) is a function mapping from \(R^1\) to \(R^1\). For dual-polarization signals, \(f_{\varphi}\) maps from \(R^1\) to a \(R^4\) output represent \([N_{xx}, N_{xy}, N_{yx}, N_{yy}]\), as shown in the Fig 5. The N-filter is given as: \[\label{N-filter} N-filter =T^2 f_{\varphi}(T^2 |m|).\tag{7}\] Note that in 7 , we employ symmetric positional encoding \(|m|\) to apply \(f_{\varphi}\) element-wise for generating the N-filter: \[|m| = \left[\frac{N_f-1}{2}, \frac{N_f-3}{2}, \ldots, -1, 0, 1, \ldots, \frac{N_f-1}{2}\right].\] This naturally introduces symmetry into the N-filter and effectively reduces the number of model parameters.

Figure 5: The N-filter generated by the hyper-network f_{\varphi} for single-polarizarion or dual-polarization signals. — Figure 5: The N-filter generated by the hyper-network \(f_{\varphi}\) for single-polarizarion or dual-polarization signals.

In the Meta-DBP layer, see Fig 4, the input includes not only the signal \(u_j\) but also a task-specific information \(e=(R_s, N_f)\) including symbol rate \(R_s\) and the length of the nonlinear filter \(N_f\). Despite our dataset simulating different WDM channel counts \(N_{ch}\), different signal powers \(P_{ch}\), and different transmission baud rates \(R_s\), our comparative simulations revealed that FDBP generalizes well over \(N_{ch}\) and \(P_{ch}\) but poorly over \(R_s\). Therefore, we chose the symbol rate \(R_s\) as the task-specific information. Additionally, due to the significant impact of the nonlinear filter length \(N_f\) on the model’s performance [5] [7], we also included \(N_f\) as task-specific information for the model. The optimal value of \(N_f\) is determined by increasing its value until the performance converges, balancing performance gains and computational efficiency (in our experiments, we set \(N_f = 401\) for DBP step=5, \(N_f=201\) for DBP step=25). The algorithm of Meta-DBP can be roughly divided into two steps. The first step is to generate the D-filter and N-filter using the task-specific information \(e\), which are used to compensate for dispersion and nonlinear phase rotation, respectively. The second step is to perform dispersion compensation and nonlinear compensation using the D-filter and N-filter, respectively, as shown in Figure 4. The generation of the D-filter is done using Equation 2 . The N-filter is generated by inputting the position encoding plus the scale transformation of the symbol time squared into a neural network, as shown in Equation 7 .

2.3 XPM-ADF↩︎

In this subsection, we present the methodology for selecting an appropriate adaptive filter to mitigate XPM. Previous studies have thoroughly investigated XPM-induced nonlinear interference [9]–[11], [14]. Building upon these works, XPM noise can be effectively characterized as a time-varying Inter-Symbol Interference (ISI) process [8], [10], [24]. The mathematical formulation of the XPM noise model is given by: \[\label{ch2:XPM95model} \Delta a^{\text{XPM}}_{n} = \sum_{l} H_l^{(n)} a_{n+l}\tag{8}\] where \(a_n = [a_{n,x}, a_{n,y}]^T\) denotes the transmitted symbol vector, and \(H_l^{(n)}\) represents time-varying \(2\times 2\) coefficient matrices. It is worth noting that accurate computation of \(H_l^{(n)}\) requires complete knowledge of the interfering channel’s transmitted symbols [8], which is generally unavailable in practical systems. However, empirical studies have revealed that \(H_l^{(n)}\) exhibits specific statistical properties [14], with the zeroth-order term \(H_{0}^{(n)}\) dominating the XPM effect as phase noise.

To address the phase-dominant characteristic of XPM noise, we propose the XPM-ADF, which combines a FIR with a single-tap phase estimator. While architecturally similar to [25], our design specifically targets XPM compensation. The phase estimator rapidly tracks the dominant XPM phase noise (\(l = 0\)), while the FIR filter captures higher-order XPM effects (\(l>0\)) using oversampled data. FIR filter and phase estimator are jointly updated via the Normalized Least Mean Squares (NLMS) algorithm. The XPM-ADF operates as follows as shown in Fig 2. First, the output signal is computed: \[\begin{align} & \mathbf{u'}[n] = \mathbf{w}[n] \mathbf{u}_n= \left[\begin{array}{ll} \mathbf{w}_{x x}[n] & \mathbf{w}_{x y}[n] \\ \mathbf{w}_{y x}[n] & \mathbf{w}_{y y}[n] \end{array}\right]\left[\begin{array}{l} \mathbf{u}_{x}[n] \\ \mathbf{u}_{y}[n] \end{array}\right] \\ & \hat{\mathbf{y}}[n] = \mathbf{v}[n] \circ \mathbf{u'}[n] = \left[\begin{array}{l} v_x[n] \cdot \mathbf{u'}_{x}[n] \\ v_y[n] \cdot \mathbf{u'}_{y}[n] \end{array}\right], \\ & \mathbf{y}_D[n] = \mathcal{D}(\hat{\mathbf{y}}[n]) \end{align}\] where \(\mathbf{w}[n] \in \mathbb{C}^{2\times 2\times d}\) is the FIR filter, \(\mathbf{v}[n] \in \mathbb{C}^2\) is the phase estimator, \(\mathbf{u}[n] \in \mathbb{C}^{2\times d}\) contains the input samples, \(d\) is the filter length, \(\circ\) is element-wise multiplication. \(\mathcal{D}\) is the hard decision for 16QAM. Then, the phase estimator can be updated as follows: \[\begin{align} & \mathbf{e}_v[n] = \mathbf{y}_D[n] - \hat{\mathbf{y}}[n] \\ & \mathbf{v}[n+1] = \mathbf{v}[n] + \frac{\mu_v}{|\mathbf{u'}[n]|^2 + \epsilon} \mathbf{e}_v[n] \mathbf{u'}[n]^* \end{align}\] For the FIR filter, the error signal and update rule are: \[\begin{align} & \mathbf{e}_w[n] = \mathbf{y}_D[n] \frac{|\mathbf{v}[n]|}{v[n]} - \mathbf{u'}[n] \\ & \mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\mu_w}{|\mathbf{u}[n]|^2 + \epsilon} \mathbf{e}_w[n] \mathbf{u}[n]^H \end{align}\] where \(\mu_v, \mu_w\) us the step size for phase estimator and FIR filter, \(\epsilon\) are constants to avoid division by zero.

2.4 Training phase↩︎

In this subsection, we present the training scheme for Meta-DBP. The structure of Meta-DSP includes two modules: Meta-DBP and XPM-ADF. The former is a highly parallelizable convolutional structure with many trainable parameters, while the latter is a recursive filter, a typical serial computation structure with no trainable parameters. One option is to treat Meta-DBP and XPM-ADF as a single entity and perform end-to-end training, as done in [5]. However, in our attempts, we found that this training method is not stable. The training results are sensitive to hyperparameters such as batch size, and it is difficult to fully utilize the highly parallel computing capabilities of the GPU. Ultimately, we chose to train the Meta-DBP module separately, as shown in Figure 6. To ensure consistency between the model’s performance during the training and testing phases, we used a static convolutional layer combined with an direct phase estimator to replace XPM-ADF. This approach is similar to the one proposed by [18].

During the training stage, the input optical signal samples \(\mathbf{u}\) are at a rate of 2 samples per symbol with shape \([B, 2W, N_{modes}]\). \(B\) is the batch size, \(W\) is the truncated signal length, \(N_{modes}\) is the number of polarization modes of the optical signal (1 or 2). The task information \(e\) has a shape of \([B, 2]\), where the two dimensions represent the symbol rate \(R_s\) and the nonlinear filter length \(N_f\). The shape of the output signal from Meta-DBP is \([B, 2W - 2O, N_{modes}]\), where \(2O\) is the length reduction caused by the valid convolution operation in Meta-DBP. At this stage, the signal still maintains a rate of 2 samples per symbol. The static convolutional layer then downsamples the output signal from Meta-DBP to 1 sample per symbol, denoted as \(\hat{y}\) with shape \([B, W-O, N_{modes}]\). The predicted signal \(\hat{y}\) is then compared to the true signal \(y\) from the transmitter for loss function calculation: \[loss(\hat{y}, y) = \left|\hat{y} \exp(-i\delta) - y\right|^2, \delta = \angle \left(\hat{y}^H y\right).\] Where \(\delta\) is the average phase error with shape \([B, 1, N_{modes}]\), the dot operation between \(\hat{y}^H, y\) is carried on the time axis. Finally, the loss function is utilized to perform backpropagation to update the parameters of the hypernetwork \(f_{\varphi}\). During the testing phase, we should replace the static convolutional layer with the XPM-ADF and remove the direct phase estimator.

Finally, we outline the details of our training process. The hypernetwork MLP used a three hidden layers fully connected network with neurons in each layer set to [1, 200, 50, 2, 4], and ReLU was used as the activation function. The input dimension is 1, representing any element of the position code \(|m|\). The output dimension is 4, corresponding to 4 N-filters in the dual-polarization case. Hidden layers are configured as 200, 50, and 2 units. The 200 and 50-unit layers enhance representation capacity, while the 2-unit layer exploits the symmetry of the 4 N-filters, allowing them to be represented by 2 parameters. Each dual polarization signal contained \(5 \times 10^5\) symbols. During the training of FDBP and Meta-DSP, we used 40 Gbaud, 1-channel signals with four different energy levels. For testing, each mode used \(1 \times 10^5\) symbols. We employed the Adam optimizer with a learning rate of 0.003 for Meta-DBP and learing rate 0.03 for Conv layer, a batch size of 10, and a signal truncation length of 2000, training for 20 epochs.

Figure 6: Schematic of the Meta-DSP training phase. In traning phase, we employs a 1D convolutional layer to substitute XPM-ADF and incorporates a custom loss function combining a average phase estimator with MSE, which we empirically demonstrate to yield more stable and faster convergence compared to end-to-end training. The hypernetwork f_{\varphi} and convolutional layer are optimized via backpropagation with learning rates of 0.003 and 0.03, respectively. — Figure 6: Schematic of the Meta-DSP training phase. In traning phase, we employs a 1D convolutional layer to substitute XPM-ADF and incorporates a custom loss function combining a average phase estimator with MSE, which we empirically demonstrate to yield more stable and faster convergence compared to end-to-end training. The hypernetwork \(f_{\varphi}\) and convolutional layer are optimized via backpropagation with learning rates of 0.003 and 0.03, respectively.

2.5 Complexity analysis↩︎

In this subsection, we investigate the computational complexity of all models. We adopt the real multiplications per symbol (RMPS) metric, a widely used measure in optical communication systems to quantify computational load during the design phase. We focus on several baseline models: CDC, DBP, and FDBP. To mitigate residual phase errors, a XPM-ADF is incorporated into each of these models. In the XPM-ADF, the filter function \(h\) is computed, and the gradient \(g_k\) is obtained. This process requires nearly \(2(\text{taps}+1)\) complex multiplications. Thus, the RMPS of XPM-ADF can be expressed as \[\label{CC95DDLMS} C_{\text{XPM-ADF}} = 4\times 2 (\text{taps}+1),\tag{9}\] where the factor of 4 signifies that one complex multiplication can be represented through four real multiplications.

CDC stands out as a widely adopted commercial technology. It solely employs a filtering operation with a filter length \(N_d\), resulting in the lowest RMPS [26]: \[\label{EDC95CC95formula} C_{\text{CDC}} = \min_{N} \frac{8N(\log_2(N) + 1)}{N - N_{d} + 1},\tag{10}\] This formula originates from the overlap and save method applied when utilizing FFT for convolution. The FFT size is denoted by \(N\), and we have optimized this size to minimize computational complexity.

For DBP, its complexity is largely influenced by the steps per span \(N_{stps}\). A higher \(N_{stps}\) value indicates improved compensation and increased RMPS. The DBP requires \(N_{span}N_{stps}\) convolution operations, where each filter has a length of approximately \(N_d/(N_{span}N_{stps})\). In addition to this, performing non-linear rotations also requires two complex multiplications (SpS = 2). Therefore, the RMPS for DBP can be articulated as: \[\label{DBP95CC95formula} \begin{align} C_{\text{DBP}} = \min_{N} 4N_{span}N_{stps}\left(\frac{2N(\log _2 N+1)}{N-N_d/(N_{span}N_{stps})+1}+2\right), \end{align}\tag{11}\] where \(N_{span}\) denotes the total number of spans and \(N_{stps}\) indicates the propagation steps for each span.

The FDBP algorithm, outlined in section 2, is configured in our study with a back-propagation step count of 5 (\(N_{stps}=0.2, N_{span}=25\)), aligning with the non-linear kernel length \(N_f=401\) in the Meta-DSP model. The RMPS of FDBP adds a convolution cost with a filter of length \(N_f\) to the DBP method at each step: \[\label{FDBP95CC95formula} \begin{align} & C_{\text{FDBP}} = \min_{N} 4N_{span}N_{stps} \cdot \\ & \left(\frac{2N(\log _2 N+1)}{N-N_d/(N_{span}N_{stps})+1} + \frac{2N(\log _2 N+1)}{N-N_f +1} \right), \end{align}\tag{12}\]

The architecture of Meta-DBP parallels that of the FDBP model. The only added computation stems from the hyper-network \(f_{\varphi}\). As this computation is executed only once and its load is distributed across all transmitted symbols, its impact is minimal. Thus, the RMPS of Meta-DSP matches that of FDBP: \[\label{Meta95DBP95formula} \begin{align} & C_{\text{Meta-DBP}} = \min_{N} 4N_{span}N_{stps}\cdot \\ & \left(\frac{2N(\log _2 N+1)}{N-N_d/(N_{span}N_{stps})+1} + \frac{2N(\log _2 N+1)}{N-N_f +1} \right). \end{align}\tag{13}\]

3 Transmission system model↩︎

In this section, we introduce our simulated transmission system model. The simulated transmission link is depicted in Fig. 7. The transmitter multiplexes \(N_{ch}=2M+1\) WDM channels, and the transmission channel consists of 25 spans of Standard Single-Mode Fiber (SSMF), each followed by an EDFA to amplify the optical signal, introducing amplified spontaneous emission (ASE). At the receiver end, a demultiplexer splits the optical signal into \(2M+1\) channels, which then enter their respective coherent receivers for sampling. Finally, DSP is performed separately for each signal.

Figure 7: Scheme of simulated transmission link. N_{ch}=2M+1 WDM channels are multiplexed (i.e., the MUX module) in transmitter and transmission in fiber link. The transmission channel is composed of 25 segments of Standard Single-Mode Fiber (SSMF), each followed by an EDFA to amplify the optical signal, which simultaneously introduces ASE. At the receiving end, a demultiplexer (i.e., the DEMUX module) is used to split the optical signal into 2M+1 channel signals, which then enter their respective coherent receivers for sampling. Finally, DSP is performed separately for each signal. — Figure 7: Scheme of simulated transmission link. \(N_{ch}=2M+1\) WDM channels are multiplexed (i.e., the MUX module) in transmitter and transmission in fiber link. The transmission channel is composed of 25 segments of Standard Single-Mode Fiber (SSMF), each followed by an EDFA to amplify the optical signal, which simultaneously introduces ASE. At the receiving end, a demultiplexer (i.e., the DEMUX module) is used to split the optical signal into \(2M+1\) channel signals, which then enter their respective coherent receivers for sampling. Finally, DSP is performed separately for each signal.

3.1 Transmitter and Optical Fiber Link↩︎

We studied a point-to-point WDM system employing dual-polarization 16QAM signals modulated with a raised-cosine function (roll-off factor = 0.1). To comprehensively evaluate the model’s generalization capability across diverse scenarios, we generated multiple transmission configurations by varying three key parameters: WDM channel counts (\(N_{ch}\)) of 1, 3, 5, and 21 channels to examine XPM noise intensity effects; baud rates (\(R_s\)) of 40GBaud, 80GBaud, and 160GBaud, covering current and future high-speed systems; and channel power levels (\(P_{ch}\)) ranging from -8 to 8 dBm to assess nonlinearity impacts. The channel spacing was fixed at 0.2 times the symbol rate with a central wavelength of \(\lambda\) = 1550 nm. The transmitted signal can be described as: \[\label{WDM32time32domain} u(0,t) = \sum_{k=-M}^{M} A_k(0,t) \exp\left(i (\omega_k - \omega_0) t\right),\tag{14}\] where \(\omega_k\) represents the frequency of the \(k\)th channel. Each channel’s symbols are represented as: \[\label{Each32channel32symbols} A_k(0,t) = \sqrt{P_k}\sum_{n=1}^{N} x^k_n g(t - nT),\tag{15}\] where \(P_k=P_{ch}\) for all k denotes the launch power of the \(k\)th channel, N is number of symbols, and \(x_n^k \in \mathcal{X}\) is a complex symbol in the IQ plane, with \(\mathcal{X}\) representing the 16QAM modulation format constellation. The pulse shape function \(g(t)\) is typically a raised-cosine function with a roll-off parameter of 0.1. For dual-polarization signals, \(u\) and \(A_k\) become vector signals with two dimensions representing the x and y polarization directions, retaining the same form as equations 14 and 15 .

The generated signal is then transmitted through a simulated fiber link composed of 25 spans of 80km single-mode fiber, totaling 2000km. A standard EDFA with a 4.5 dB noise figure compensates for the losses of each span. Signal propagation is modeled by the NLSE in the single-polarization system and by the Manakov equations in the dual-polarization system. These equations are solved using a standard second-order symmetrical split-step Fourier method. All system parameters are summarized in Table 1.

Table 1: Transmission model parameters
	0.2 dB/km
	16.5 ps/nm/km
	1.31 /W/km
	\(25 \times 80\) km
	4.5 dB
	16QAM
	1550 \(\mathrm{~nm}\)
	0.1
	160/80/40 \(\mathrm{GBaud}\)
	21/11/9/7/5/3/1

3.2 Receiver and DSP Model↩︎

The model integrates a total of \(N_{ch}\) modulated optical signals at the transmitter end, which are then propagated through the optical fiber. At the receiver end, these signals are de-multiplexed into \(N_{ch}\) individual channels. Each channel signal is independently received and sampled at 2 samples per symbol. The sampled electrical signals then undergo DSP, the DSP workflow consists of two main stages: DBP and ADF. DBP compensates for both linear and nonlinear effects by solving the \(z\)-reversed NLSE in the digital domain, while the adaptive filter corrects the phase error caused by XPM and downsamples the signal to 1 sample per symbol.

Since the focus of this paper is on compensating for nonlinear noise in optical communication systems, we make the following simplifications when simulating the receiver: we do not introduce additional receiver noise and frequency offset, thus eliminating the need for a frequency offset estimator (FOE) and carrier phase estimator (CPE). All compensation simulations are conducted on the central channel, ensuring natural alignment of transmitted and received symbols, and obviating the need for a clock recovery module.

For comparison purposes, we also consider nonlinear equalizers based on the DBP method. After downsampling to SpS=2, Split-Step Fourier Method (SSFM) based DBP is applied to the central channel for SPM compensation. It should be noted that for the DBP method, we numerically optimized the nonlinear parameter, as its value depends on the dispersion coefficient, the number of propagation steps, and the launch power. Next, we compensate for the remaining nonlinear phase shift of all symbols using XPM-ADF. After nonlinear equalization, we apply demodulation and calculate the bit error rate (BER) for the central channel of interest (COI).

4 Numerical Results↩︎

In this section, we validate the effectiveness of Meta-DSP through numerical results. First, we explore the performance of XPM-ADF, showing that it captures XPM noise more effectively than other adaptive filters. Consequently, XPM-ADF is used in all subsequent simulations. Next, we conduct a series of simulations to test the generalization performance of FDBP. Our results demonstrate that the FDBP+XPM-ADF structure generalizes well across two system parameters: the number of WDM channels (\(N_{ch}\)) and the power per channel (\(P_{ch}\)). However, we find that FDBP struggles to generalize when the transmission baud rate (\(R_s\)) changes, which motivates our Meta-DBP design. Finally, we present numerical simulations that highlight the superior generalization performance of Meta-DSP.

4.1 Comparison simulations of XPM-ADF↩︎

This subsection evaluates the XPM compensation performance of six distinct filtering approaches: BPS, DD-LMS, DD-RLS, V&V, our proposed XPM-ADF, and a static filter as baseline. We conducted numerical simulations on dual-polarization 80 Gbaud systems with 21 channels at 2 dBm launch power (near system optimum). The optical signals first undergo digital backpropagation (DBP) with 64 steps per span for SPM compensation, followed by XPM compensation using the selected filter module.

All algorithms were carefully optimized through detailed parameter tuning. For BPS [27], we found the best performance with a 40-symbol window length and 161 phase candidates, as further increases did not improve the Q-factor. The V&V algorithm, implemented as in [28], worked best with a 121-symbol window length. Using a thorough grid search, we identified the optimal settings for other methods: DD-LMS performed best with a 32-tap filter and step size \(\mu=3\times10^{-4}\), while DD-RLS used an 8-tap filter (to limit complexity) with a forgetting factor of 0.996 and \(\delta=0.01\). Our XPM-ADF used a 32-tap filter with \(\mu_v=\mu_w=2^{-6}\), and a static 32-tap filter served as a reference for dynamic noise.

Figure 8 compares Q-factor and runtime of five algorithms implemented on a single CPU. Among them, XPM-ADF achieves the highest Q-factor improvement while maintaining relatively low computational latency, demonstrating its superior efficiency. DD-LMS and DD-RLS follow in performance, though DD-RLS suffers from computational complexity constraints that limit its filter length, resulting in inferior XPM compensation capability compared to DD-LMS. The remaining algorithms—Static Filter, BPS, and V&V—exhibit notably poor performance. The Static Filter’s weak results indicate strong dynamic noise in the signal, while the inadequacy of BPS and V&V suggests that phase-only modeling is insufficient for effective XPM noise compensation.

Figure 8: Performance comparison of various compensation methods (80 Gbaud, 21 channels, dual-polarization). XPM-ADF demonstrates optimal balance between Q-factor gain and computational cost.

Figure 9 demonstrates that DBP corrects the optical signal to a regular shape, leaving only residual phase errors caused by XPM noise. By comparing DBP-processed symbols with the transmitted symbols, we extract the XPM noise, as illustrated in the lower left plot. The XPM phase noise exhibits strong temporal correlation with slow dynamics, making it trackable by the ADF. The lower left plot of Figure 9 further shows that the phase estimator \(v[n]\) successfully tracks the XPM phase noise over time, enabling XPM-ADF to outperform other algorithms in terms of Q factor.

To validate the effectiveness of XPM-ADF in capturing XPM noise, we conducted additional tests using a single-channel dataset, where XPM noise is inherently absent. The results of this evaluation are presented in Figure 10. In the single-channel scenario, the XPM noise level remains negligible, indicating minimal interference. In contrast, the phase noise observed in the 21-channel case (as shown in Fig. 9) exhibits significant random fluctuations over time, highlighting the presence of XPM-induced disturbances. The Q-factor analysis further supports these findings. In the single-channel case, the performance of XPM-ADF, DD-LMS, and the Static filter algorithms is nearly identical, as expected due to the absence of XPM noise. However, in the 21-channel scenario (Fig. 10), the Q-factor achieved by XPM-ADF is significantly higher than that of the other two algorithms. This improvement confirms the ability of XPM-ADF to effectively capture and mitigate XPM noise, thereby enhancing system performance in multi-channel environments.

Figure 9: XPM compensation (80 Gbaud, 21 channels, dual polarization). The upper plot shows the DSP pipeline. We design a DBP module with 64 steps per span to compensate for SPM, followed by different adaptive filters to compensate for XPM. The lower left plot shows the XPM phase noise variation over time and the phase estimator \mathbf{v}[n] tracking results of XPM-ADF under WDM 21 channels. By plotting the XPM phase noise and the phase estssimator of XPM-ADF, we can observe that the XPM noise can be effectively tracked by the phase estimator of XPM-ADF under WDM 21 channels. The lower right plot compares the performance of BPS, DD-LMS, XPM-ADF, static filter in terms of Q factor. It is evident that XPM-ADF outperforms the other algorithms in capturing XPM noise. — Figure 9: XPM compensation (80 Gbaud, 21 channels, dual polarization). The upper plot shows the DSP pipeline. We design a DBP module with 64 steps per span to compensate for SPM, followed by different adaptive filters to compensate for XPM. The lower left plot shows the XPM phase noise variation over time and the phase estimator \(\mathbf{v}[n]\) tracking results of XPM-ADF under WDM 21 channels. By plotting the XPM phase noise and the phase estssimator of XPM-ADF, we can observe that the XPM noise can be effectively tracked by the phase estimator of XPM-ADF under WDM 21 channels. The lower right plot compares the performance of BPS, DD-LMS, XPM-ADF, static filter in terms of Q factor. It is evident that XPM-ADF outperforms the other algorithms in capturing XPM noise.

Figure 10: XPM compensation (80 Gbaud, 1 channel, dual polarization). In the single-channel signal case, without XPM noise, the phase tracker in XPM-ADF shows almost no dynamic change over time. However, in the case of 21 channels, the phase noise tracked by XPM-ADF becomes clearly noticeable. The Q-factor results for XPM-ADF, DD-LMS, and static filter are almost identical in the single-channel case, but show significant differences in the 21-channel case (Fig 9). This indicates that XPM-ADF is effective in capturing XPM noise.

4.2 Transferability of the FDBP Model↩︎

In this subsection, we test the transferability of the FDBP model on multimodal optical fiber data by revisiting our data generation process. The optical fiber signal data has three important system parameters: the number of WDM channels \(N_{ch}\), the symbol rate \(R_s\), and the average power per channel \(P_{ch}\). To test the model’s ability to transfer across different modes, we designed three sets of simulations using a controlled variable approach. In each setting, we fix two parameters and vary the other to create different training datasets for comparison. The performance of models trained on different training sets is then tested on a unified test set as shown in Table 2.

Table 2: Simulation setup for testing the transferability of the FDBP model.
Simulation	Training Set	Test Set
1	Dual polarization 16QAM. Rs=80 Gbaud, Nch=21, Pch=-2 or 2 or 6	Dual polarization 16QAM. Rs=80 Gbaud, Nch=21, Pch=-3,-2,…,6
2	Dual polarization 16QAM. Rs=80 Gbaud, Pch=[-2,0,2,4], Nch=1 or 5 or 21	Dual polarization 16QAM. Rs=80 Gbaud, Nch=21, Pch=-3,-2,…,6
3	Dual polarization 16QAM. Nch=21, Pch=[-2,0,2,4], Rs=40,80, 160 Gbaud	Dual polarization 16QAM. Rs=80 Gbaud, Nch=21, Pch=-3,-2,…,6

Figure 11: Learned N-filters for dual polarization. The figure above shows the four results of the N-filter for dual polarization: \(N_{xx}, N_{xy}, N_{yx}, N_{yy}\). The figure below shows the values of \(N_{xx}\) and \(N_{yy}\) divided by 2, revealing significant overlap. This leads to the conclusion that \(N_{xx} = N_{yy} = 2N_{xy} = 2N_{yx}\)..

Figure 12: FDBP N-filter comparison for different system parameters: The N-filter can be transferred across different N_{ch} and P_{ch} values, but a scaling transformation is needed for different R_s. For example, by compressing the N-filter learned at 80 Gbaud along the x-axis by 4 times and stretching it along the y-axis by 4 times, it closely matches the N-filter learned at 40 Gbaud. — Figure 12: FDBP N-filter comparison for different system parameters: The N-filter can be transferred across different \(N_{ch}\) and \(P_{ch}\) values, but a scaling transformation is needed for different \(R_s\). For example, by compressing the N-filter learned at 80 Gbaud along the x-axis by 4 times and stretching it along the y-axis by 4 times, it closely matches the N-filter learned at 40 Gbaud.

We first present a set of dual-polarization N-filter results in Fig. 11. The first plot shows the N-filter learned at 80 Gbaud with single channel. Here, \(N_{xx}\), \(N_{xy}\), \(N_{yx}\), and \(N_{yy}\) represent 4 N-filters of the interaction between the two polarization states. From the figure, it can be observed that the data-driven FDBP learns an approximate relationship: \(N_{xx} = N_{yy} = 2N_{xy} = 2N_{yx}\). This result aligns with the theoretical analysis presented in Section 2.B. For simplicity, in the following figures comparing the three simulations, we will only show the results for \(N_{xx}\).

In Fig. 12, we present N-filter (\(N_{xx}\)) for Simulations 1 to 3. The data reveal that the N-filter demonstrates strong consistency with variations in \(N_{ch}\) and \(P_{ch}\), but exhibits distinct shapes when the transmission rate \(R_{s}\) changes. In the bottom panel of Fig. 12, we observe that the N-filters for different transmission rates appear to follow the same functional form, differing only by a scale factor. This observation aligns with the analysis presented in Section 2.B, which underpins the core concept of Meta-DBP. We will now discuss the reasons behind the results of these three simulations.

First, \(N_{ch}\) only influences the intensity of XPM noise, whereas the primary goal of FDBP is to capture the SPM noise within the channel. As a result, FDBP models trained on data with different \(N_{ch}\) values are essentially similar. Second, the form of FDBP is derived from the NLSE and Manakov equations, and its design inherently generalizes the \(P_{ch}\) parameter. Finally, different transmission rates fundamentally affect the temporal statistical properties of SPM noise. Higher rates introduce longer time-memory effects of SPM at the symbol scale, which is why models trained at different rates in simulation 3 are not compatible.

4.3 Meta-DSP Generalization Performance↩︎

Based on the analysis from the previous subsection on the FDBP comparison numerical simulations, we chose the task inputs for Meta-DBP as \(e=(R_s, N_f)\). The parameters \(N_{ch}\) and \(P_{ch}\) were not included in this input. This design allows Meta-DBP to naturally inherit the generalization performance of FDBP in terms of \(N_{ch}\) and \(P_{ch}\). Additionally, since we encoded the scaling relationship of N-filters under different transmission rates into the network structure itself, the final Meta-DSP can achieve excellent generalization performance across all modes. To train Meta-DSP effectively, we only need to collect a dataset of low-rate, single-channel, low-power data. Once the model is trained, it can be directly applied to any scenario including high-rate, multi-channel optical fiber signals.

In this subsection, we conduct numerical experiments to evaluate the generalization capability of Meta-DSP. Specifically, we train Meta-DSP models using a dataset configured with a symbol rate of 40 Gbaud and a single-channel setup. Subsequently, we test their Q-factor performance across a wide range of scenarios: symbol rates ranging from 40 Gbaud to 160 Gbaud and channel counts varying from 1 to 21. To assess the impact of model complexity, we implement two variants of Meta-DSP with different capacity levels (Step=5 and Step=25). For benchmarking purposes, we also train and tests FDBP under the same configuration. Furthermore, to examine the extent to which Meta-DSP achieves nonlinear compensation at 160 Gbaud, we additionally train an FDBP model with Step=25 using 160 Gbaud data as the upper bound for nonlinear compensation capability at 160 Gbaud.

As shown in Fig. 13, which presents the Q-factor performance at 160 Gbaud with 21 channels, the following results are observed. As baselines, CDC achieves the optimal Q factor of 7.53 dB at 4 dBm power, while DBP with steps=4 and steps=8 achieve optimal Q factors of 7.71 dB and 7.85 dB, respectively, at the same power level. Meta-DSP attains the optimal Q factor of 8.08 dB at 4 dBm power. In comparison, FDBP trained on 40 Gbaud data achieves an optimal Q factor of 7.24 dB at 3 dBm power, whereas FDBP trained on 160 Gbaud data reaches an optimal Q factor of 8.18 dB at 4 dBm power. It is evident that Meta-DSP, trained solely on 40 Gbaud data, exhibits remarkable Q-factor performance at 160 Gbaud with 21 channels, achieving a 0.55 dB gain over CDC and surpassing both DBP with steps=4 and steps=8. Furthermore, Meta-DSP’s performance is nearly on par with that of FDBP trained on 160 Gbaud data, with only a marginal 0.1 dB difference. This suggests that Meta-DSP has effectively learned the majority of the nonlinear compensation capability required for 160 Gbaud from the low-speed 40 Gbaud data, highlighting its robust generalization ability. In contrast, FDBP trained on 40 Gbaud data underperforms compared to CDC at 160 Gbaud with 21 channels and dual polarization, demonstrating its inability to generalize to high-speed scenarios.

Figure 13: Q-factor vs. launch power for 160 Gbaud, 21 channels, and dual polarization. Meta-DSP, trained on 40 Gbaud data, achieves a Q-factor of 8.08 dB at 4 dBm, closely matching FDBP trained on 160 Gbaud data (8.18 dB) and outperforming CDC (7.53 dB) and DBP (7.71 dB and 7.85 dB). This demonstrates Meta-DSP’s strong generalization and nonlinear compensation capabilities across symbol rates.

To provide a comprehensive overview of Meta-DSP’s generalization capabilities, we present the optimal Q-factors with respect to the launch power \(P_{ch}\) for various configurations of symbol rate \(R_s\) and channel count \(N_{ch}\) in Fig. 14. As shown, Meta-DSP consistently outperforms FDBP in scenarios involving higher symbol rates. Notably, even in the most challenging case—dual-polarization at 160 Gbaud with 21 channels—Meta-DBP achieves a performance gain of 0.55 dB. While FDBP exhibits limited generalization when transferring from low to high symbol rates. This superior performance of Meta-DSP can be attributed to its ability to learn not just an N-filter specific to the training rate (e.g., 40 Gbaud), but a generalized functional form of the N-filter that applies across different symbol rates. When combined with a scale transformation described in Eq. 7 , this design enables the nonlinear compensation capability learned at 40 Gbaud to be effectively generalized to 80 Gbaud and 160 Gbaud.

Figure 14: Optimal Q factor w.r.t power for compensation methods at different R_s, N_{ch}. We show Meta-DSP (step=25) and FDBP (step=25) trained on 40 Gbaud, 1 channel, dual polarization. Meta-DSP demonstrates strong knowledge transfer capabilities, significantly outperforming CDC across all settings. However, FDBP trained on a 40 Gbaud single channel does not possess the ability to transfer knowledge to high-speed environments. — Figure 14: Optimal Q factor w.r.t power for compensation methods at different \(R_s\), \(N_{ch}\). We show Meta-DSP (step=25) and FDBP (step=25) trained on 40 Gbaud, 1 channel, dual polarization. Meta-DSP demonstrates strong knowledge transfer capabilities, significantly outperforming CDC across all settings. However, FDBP trained on a 40 Gbaud single channel does not possess the ability to transfer knowledge to high-speed environments.

Table 3: Comparison of Methods for Q Gain over CDC and Time Consumption. The test scenario is 160G with 21 channels. Meta-DSP uses a Hypernetwork trained on 40 Gbaud to generate parameters, which results in a much faster processing speed. In contrast, FDBP requires training with SGD and the collection of a new dataset when faced with a new scenario, leading to significantly higher time consumption. For both step=5 and step=25, Meta-DSP achieves performance close to that of FDBP trained on the new dataset.
Method	CDC	Meta-DBP step=5	FDBP step=5	Meta-DBP step=25	FDBP step=25
Q Gain	7.53	7.76	7.79	8.08	8.18
Q Gain (over CDC)	0.00%	3.06%	3.46%	7.30%	8.63%
Time Consumption	-	< 0.1s (Hypernetwork)	>120s (SGD on new dataset)	< 0.1s (Hypernetwork)	>120s (SGD on new dataset)

In comparison to traditional DBP methods, we evaluated the computational cost of various approaches relative to their Q factor gains, as illustrated in Figure 15. The results demonstrate that Meta-DSP reduces the computational cost to one-tenth of that of DBP while achieving equivalent Q factor performance. When compared to FDBP, which serves as the target network for Meta-DSP, Meta-DSP does not exceed FDBP in terms of ultimate performance. However, it exhibits a distinct advantage: despite being trained exclusively on low-speed data, Meta-DSP achieves performance comparable to FDBP in high-speed scenarios. This highlights its exceptional generalization capability, a feature fundamentally absent in FDBP. In our implementation, the hypernetwork is structured as a multilayer perceptron (MLP) with only 3 hidden layers. By leveraging highly optimized matrix multiplication on modern hardware, the MLP operates with high computational efficiency and minimal storage requirements. For fiber signal scenarios, the hypernetwork takes scenario-related metadata as input and rapidly generates model parameters, bypassing the need for complex training processes. As shown in Table 3, the inference speed of Meta-DSP’s hypernetwork exceeds that of FDBP’s time-consuming SGD training process by a factor of over 1000, making it a highly efficient and scalable solution for practical applications.

Figure 15: Comparison of computational cost and Q factor gains for various methods. We show results of 160 Gbaud, 21 channels, dual polarization.

5 Conclusions↩︎

In this study, we introduced Meta-DSP, a novel multi-modal nonlinear compensation algorithm for optical fiber receivers inspired by meta-learning. Meta-DSP demonstrates robust capabilities in processing optical fiber data under a myriad of conditions, including long-distance transmissions, varying symbol rates, multiple channels, and a range of power settings. Building on the concept of hyper-networks, we devised Meta-DBP to learn how to generate N-filter for nonlinear compensation, and developed XPM-ADF to learn how to compensate XPM.

Our numerical simulations evaluated Meta-DSP’s performance on dual polarization 16-QAM signals. We tested transmission distances of 2000 km and explored symbol rates of 40 Gbaud, 80 Gbaud, and 160 Gbaud, along with WDM channel counts of 1, 3, 5 and 21. The transmission powers ranged from -8 dBm to 7 dBm. The results were encouraging: Meta-DSP trained at 40 Gbaud demonstrated strong generalization, successfully compensating for nonlinear effects even at higher rates and channel counts. The highest Q-factor improvements were 0.55 dB over CDC at 160 Gbaud with 21 channels.

Moreover, Meta-DSP achieved these gains while reducing computational complexity to one-tenth of that of DBP. These findings underscore the strong adaptability of Meta-DSP across various multi-modal scenarios and highlight the significant potential of data-driven paradigms in advancing nonlinear compensation algorithms for optical fibers.

Future research will focus on two key aspects. First, we will investigate the performance of Meta-DSP under more complex transmission models, such as the Manakov equation model with PMD. Second, we will explore the feasibility of Meta-DSP as a compensation algorithm under stronger optical power conditions. With the support of deep learning and data-driven modeling, our approach may pave the way to exploring the theoretical capacity limits of nonlinear channels.

Acknowledgement↩︎

Bin Dong is supported by the New Cornerstone Investigator Program and the National Natural Science Foundation of China (No. 12288101).

References↩︎

[1]

G. Liga, T. Xu, A. Alvarado, R. I. Killey, and P. Bayvel, “On the performance of multichannel digital backpropagation in high-capacity long-haul optical transmission,” Optics Express, vol. 22, no. 24, pp. 30 053–30 062, 2014.

[2]

P. Poggiolini, G. Bosco, A. Carena, V. Curri, V. Miot, and F. Forghieri, “Performance dependence on channel baud-rate of pm-qpsk systems over uncompensated links,” IEEE Photonics Technology Letters, vol. 23, no. 1, pp. 15–17, 2010.

[3]

Z. Tao, L. Dou, W. Yan, L. Li, T. Hoshida, and J. C. Rasmussen, “Multiplier-free intrachannel nonlinearity compensating algorithm operating at symbol rate,” Journal of Lightwave Technology, vol. 29, no. 17, pp. 2570–2576, 2011.

[4]

Q. Fan, G. Zhou, T. Gui, C. Lu, and A. P. T. Lau, “Advancing theoretical understanding and practical performance of signal processing for nonlinear optical communications through machine learning,” Nature Communications, vol. 11, no. 1, pp. 1–11, 2020.

[5]

Q. Fan, C. Lu, and A. P. T. Lau, “Combined neural network and adaptive dsp training for long-haul optical communications,” Journal of Lightwave Technology, vol. 39, no. 22, pp. 7083–7091, 2021.

[6]

C. Häger and H. D. Pfister, “Nonlinear interference mitigation via deep neural networks,” in Optical fiber communication conference.Optical Society of America, 2018, pp. W3A–4.

[7]

O. Sidelnikov, A. Redyuk, S. Sygletos, M. Fedoruk, and S. Turitsyn, “Advanced convolutional neural networks for nonlinearity mitigation in long-haul wdm transmission systems,” Journal of Lightwave Technology, vol. 39, no. 8, pp. 2397–2406, 2021.

[8]

O. Golani, R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Modeling the bit-error-rate performance of nonlinear fiber-optic systems,” Journal of Lightwave Technology, vol. 34, no. 15, pp. 3482–3489, 2016.

[9]

R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long, dispersion-uncompensated fiber links,” Optics Express, vol. 21, no. 22, pp. 25 685–25 699, 2013.

[10]

M. Secondini and E. Forestieri, “Analytical fiber-optic channel model in the presence of cross-phase modulation,” IEEE Photonics Technology Letters, vol. 24, no. 22, pp. 2016–2019, 2012.

[11]

R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Inter-channel nonlinear interference noise in wdm systems: Modeling and mitigation,” Journal of Lightwave Technology, vol. 33, no. 5, pp. 1044–1053, 2014.

[12]

E. Nassaji, D. Truhachev, H. Najafi, H. Ebrahimzad, J. Mitra, and R. Janicek, “Xpm estimation and compensation in the wdm system based on a low-complexity inner fec,” Optics Letters, vol. 48, no. 21, pp. 5663–5666, 2023.

[13]

W. Lian, Q. Zheng, W. Li, and Y. Wang, “Xpm mitigation in wdm systems enabled by split nlc and modified dd-rls based nlpn tracking,” Optics Communications, vol. 474, p. 126184, 2020.

[14]

R. Dar and P. J. Winzer, “Nonlinear interference mitigation: Methods and potential gain,” Journal of Lightwave Technology, vol. 35, no. 4, pp. 903–930, 2017.

[15]

Q. Zheng, W. Li, R. Yan, Q. Feng, Y. Xie, and Y. Wang, “Xpm mitigation in wdm systems using split nonlinearity compensation,” IEEE Photonics Journal, vol. 11, no. 6, pp. 1–11, 2019.

[16]

D. Ha, A. Dai, and Q. V. Le, “Hypernetworks,” arXiv preprint arXiv:1609.09106, 2016.

[17]

J. Vanschoren, “Meta-learning: A survey,” arXiv preprint arXiv:1810.03548, 2018.

[18]

C. Häger and H. D. Pfister, “Physics-based deep learning for fiber-optic communication systems,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 280–294, 2020.

[19]

X. Lin, S. Luo, S. K. O. Soman, O. A. Dobre, L. Lampe, D. Chang, and C. Li, “Perturbation theory-aided learned digital back-propagation scheme for optical fiber nonlinearity compensation,” Journal of Lightwave Technology, vol. 40, no. 7, pp. 1981–1988, 2021.

[20]

L. B. Du and A. J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Optics express, vol. 18, no. 16, pp. 17 075–17 088, 2010.

[21]

M. Secondini, D. Marsella, and E. Forestieri, “Enhanced split-step fourier method for digital backpropagation,” in 2014 The European Conference on Optical Communication (ECOC).IEEE, 2014, pp. 1–3.

[22]

[23]

S. Luo, S. K. O. Soman, L. Lampe, and J. Mitra, “Deep learning-aided perturbation model-based fiber nonlinearity compensation,” Journal of Lightwave Technology, vol. 41, no. 12, p. 3976–3985, Jun. 2023. [Online]. Available: http://dx.doi.org/10.1109/JLT.2023.3279449.

[24]

R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Time varying isi model for nonlinear interference noise,” in Optical Fiber Communication Conference.Optica Publishing Group, 2014, pp. W2A–62.

[25]

Y. Mori, C. Zhang, and K. Kikuchi, “Novel configuration of finite-impulse-response filters tolerant to carrier-phase fluctuations in digital coherent optical receivers for higher-order quadrature amplitude modulation signals,” Optics Express, vol. 20, no. 24, pp. 26 236–26 251, 2012.

[26]

A. Napoli, Z. Maalej, V. A. Sleiffer, M. Kuschnerov, D. Rafique, E. Timmers, B. Spinnler, T. Rahman, L. D. Coelho, and N. Hanik, “Reduced complexity digital back-propagation methods for optical communication systems,” Journal of lightwave technology, vol. 32, no. 7, pp. 1351–1362, 2014.

[27]

T. Pfau, S. Hoffmann, and R. Noé, “Hardware-efficient coherent digital receiver concept with feedforward carrier recovery for \(m\)-qam constellations,” Journal of Lightwave Technology, vol. 27, no. 8, pp. 989–999, 2009.

[28]

I. Fatadin, D. Ives, and S. J. Savory, “Laser linewidth tolerance for 16-qam coherent optical systems using qpsk partitioning,” IEEE Photonics Technology Letters, vol. 22, no. 9, pp. 631–633, 2010.

Xinyu Xiao, School of Mathematical Science, Peking University. xiao-xin-yu@pku.edu.cn↩︎
Zhennan Zhou, Corresponding author. Institute for Theoretical Sciences, Westlake University, Hangzhou, 310030, China (zhouzhennan@westlake.edu.cn).↩︎
Bin Dong, Corresponding author. Beijing International Center for Mathematical Research, Peking University; Center for Machine Learning Research, Peking University; National Biomedical Imaging Center, Peking University. dongbin@math.pku.edu.cn↩︎
Dingjiong Ma, Theory Lab, Central Research Institute, 2012 Labs, Huawei Technology Co. Ltd., Hong Kong, China. ma.dingjiong2@huawei.com↩︎
Li Zhou, Theory Lab, Central Research Institute, 2012 Labs, Huawei Technology Co. Ltd., Shanghai,P.R. China. zhouli107@huawei.com↩︎
Jie Sun, Theory Lab, Central Research Institute, 2012 Labs, Huawei Technology Co. Ltd., Hong Kong, China. j.sun@huawei.com↩︎

Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems