October 15, 2025
Abbreviations: AI, Artificial Intelligence; SMF, Single-Mode Fibre; PCI, Percutaneous Coronary Intervention; ASD, Average Surface Distance; CNN, Convolutional Neural Network; CPU, Central Processing Unit; ECDF, Empirical Cumulative Distribution Function; GPU, Graphics Processing Unit; HD, Hausdorff Distance; DSC, Dice Similarity Coefficient; NDFF, National Dark Fibre Facility; OCiC, Optical Computation-in-Communication; ORCU, Optical Remote Computing Unit; MNIST, Modified National Institute of Standards and Technology; 2D, two-dimensional; TOPS, tera-operations per second; EO, Electro-Optic; MZM, Mach-Zehnder Modulator; OSSM, Optical Spectral Shaping Module; DSP, Digital Signal Processing; EDFA, Erbium-Doped Fibre Amplifier; ROC, Receiver Operating Characteristic; AUROC, Area Under the ROC Curve; AWGN, Additive White Gaussian Noise; CW, Continuous-Wave; LMICs, Low- and Middle-Income Countries.
Telesurgery offers a compelling strategy to reduce global disparities in surgical care, addressing an unmet need for the billions of people who lack timely, life-saving interventions [1], [2]. This challenge is most acute in underserved and remote regions, where establishing and sustaining a qualified surgical workforce remains prohibitively slow and costly [3]. Low- and middle-income countries (LMICs) exemplify this crisis, experiencing severe shortages of surgical specialists, with densities as low as 0.7 per 100,000 people, compared to 56.9 per 100,000 in high-income regions [4]. As a result, up to 94% of individuals in these areas lack access to safe surgical care [5]. Remote surgery thus represents a scalable solution to this global healthcare gap [6], as initially demonstrated by pioneering achievements in telesurgery [7], [8]. However, wider adoption is limited by the lack of immediate, on-site sensory feedback [6], which complicates intraoperative decision-making and can compromise safety in high-precision procedures.
AI is emerging as a transformative tool, demonstrating significant potential to improve intraoperative perception, decision-making, surgical precision, and operative workflows [9]. These capabilities span diverse applications, including reconstructing 3D tissue structures from laparoscopic views [10], real-time tracking and localisation of surgical instruments [11], and intraoperative planning assistance for vascular procedures [12]. Despite these advances, current AI-assisted robotic surgical systems predominantly rely on local electronic inference, introducing computational latency that frequently reaches hundreds of milliseconds [13]. When combined with inherent communication delays in telesurgery, total latency often surpasses the clinically acceptable 100 ms round-trip threshold [6], compromising the real-time control accuracy essential for intraoperative safety. This challenge is especially acute in latency-sensitive, high-precision procedures such as percutaneous coronary interventions [14], retinal microsurgery [15], and neurosurgery [16]. Consequently, compounded latency significantly impairs real-time surgical judgment and control [17], jeopardising patient safety and limiting the widespread adoption of robotic surgery in remote or resource-limited settings.
Here, we introduce an OCiC framework that addresses latency constraints by fusing deep-learning inference directly into the optical communication pathway. The system comprises a cascade of geographically distributed ORCUs, each hierarchically implementing a distinct convolutional layer through 2D photonic computation, as illustrated in Fig. 1b. This architecture intrinsically merges inference latency with optical transmission, substantially reducing total end-to-end delay, as shown in Fig. 1a. Beyond latency reduction, each ORCU achieves high spectral efficiency and computational fidelity through ultra-stable kernels that maintain GPU-level accuracy under moderate optical noise. This design inherently mitigates cumulative error propagation [18], a longstanding challenge for scaling deep optical networks. Crucially, this electronic-level fidelity is achieved exclusively through architectural design, eliminating the need for auxiliary hardware [19] or additional model retraining [20]–[22]. Unlike conventional photonic processors constrained to centralised, on-site deployment [23]–[29], the OCiC framework is inherently compatible with existing optical network infrastructure, providing a scalable, in-network photonic AI solution optimally tailored to latency-sensitive applications.
We demonstrate experimentally that a single ORCU can process up to 192 programmable kernels in parallel on one channel, which serves as the computational foundation for deep in-network optical inference. This configuration achieves a peak throughput of 69 TOPS per channel, while maintaining inference fidelity within 0.1% of the CPU baseline on the MNIST benchmark. To evaluate scalability and clinical applicability, we implemented a simplified OCiC setup comprising cascaded ORCUs for coronary angiography segmentation, replacing several convolutional layers of the network. The system maintained inference fidelity within 0.1% of GPU-based results under both controlled laboratory conditions and across a 38.9 km fibre link of the UK’s National Dark Fibre Facility (NDFF), despite variable real-world environmental conditions. These results demonstrate the OCiC framework’s potential for robust, real-time medical AI inference, paving the way for globally scalable telesurgery that meets clinical standards of precision and responsiveness.
To achieve practical scalability of the OCiC framework, we developed the ORCU architecture to address three critical deployment challenges: inference fidelity, spectral efficiency, and environmental robustness. As illustrated in Fig. 2, ORCU integrates optical communication and computation within a single fibre via wavelength-division multiplexing. Each computational channel executes massively parallel 2D photonic convolutions directly during optical transmission. Experimental validation confirms ORCU achieves CPU/GPU-level inference fidelity, delivering a peak throughput of 69 TOPS per channel and a spectral efficiency of 197.1 TOPS/THz. Moreover, robust performance demonstrated in outdoor fibre deployments highlights ORCU’s suitability as a scalable, geographically distributed optical platform. These findings position ORCU as a promising solution for real-time, in-network photonic AI, particularly optimised for latency-sensitive applications.
Optical processors typically perform well on shallow benchmarks such as MNIST, yet commonly exhibit accuracy gaps of approximately 1–2% relative to their electronic counterparts [24]–[26], [28], [29]. This discrepancy intensifies in deeper networks primarily due to cumulative kernel noise, progressively degrading inference quality and potentially causing model collapse [18]. Unlike transient feature noise suppressed by residual connections and nonlinear activations, kernel noise persists and accumulates throughout the network. To address this, ORCU employs an ultra-stable 2D optical comb source specifically engineered to preserve kernel fidelity. Experimental validation confirms comb-line power variations consistently remain below 0.1 dB under laboratory conditions and below 0.2 dB during outdoor deployment (Supplementary Video). This stability surpasses that of widely used microring-based comb sources [23], [24], enabling ORCU to maintain CPU/GPU-level inference fidelity even under moderate optical noise conditions.
Another key advantage of ORCU is its exceptionally high computational spectral efficiency, critical for remote computing. Conventional fibre-based photonic accelerators rely on 1D convolutions to approximate inherently 2D operations. In contrast, ORCU directly implements parallel 2D convolutions (Fig. 2c). Using a coupler array and receiver-side post-modulation (Fig. 2d), a single 350 GHz channel can execute up to 192 programmable 3×3 kernels simultaneously, with a peak throughput of 69 TOPS. The benchmark system in [24], one of the pioneering demonstrations of fibre-type photonic convolution accelerators, requires 3.6 THz of optical bandwidth to reach 11.3 TOPS in 1D convolutions, equivalent to 3.8 TOPS for 3×3 kernels. By contrast, ORCU attains a spectral efficiency of 197.1 TOPS/THz, over 186 times higher than the 1.06 TOPS/THz reported in [24]. Compared with the representative optical cloud-computing approach described in [30], ORCU markedly improves spectral efficiency while avoiding the latency, overhead and synchronisation penalties of intermediate signal demodulation.
In addition to inference, ORCU supports a computation-as-communication paradigm, replacing dedicated optical interconnections by directly embedding computation into the optical data path. This approach significantly reduces intermediate data transmission between nodes during collaborative inference, thereby improving spectral efficiency in geographically distributed learning systems. The resulting communication spectral efficiencies reach 87.8 bit/s/Hz for 4-bit resolution inputs and 153.6 bit/s/Hz for 7-bit inputs, surpassing existing optical transmission records of 10.7 bit/s/Hz for single-mode fibre (SMF) [31] and 51 bit/s/Hz for multi-core fibre [32]. These advancements position OCiC as particularly valuable for privacy-sensitive, regulation-compliant multilayer split learning, especially in healthcare, finance, and the public sector.
To ensure robust performance under practical conditions, ORCU integrates mechanisms specifically designed to mitigate environmental variability inherent in outdoor fibre deployments. A receiver-side post-modulation kernel assignment scheme enables real-time monitoring and dynamic compensation of power distortions induced by system nonlinearities and external perturbations. Furthermore, remote optical computing schemes may be affected by synchronisation issues arising from fibre-length fluctuations [33], [34]. In contrast, ORCU avoids these problems by performing 2D convolution entirely within the fibre through wavelength-to-time mapping. Collectively, these innovations substantially enhance system stability and reliability, underscoring ORCU’s readiness for real-world deployment.
In summary, the ORCU architecture achieves high-throughput optical inference at CPU/GPU-level fidelity, exceptional spectral efficiency, and robust performance across diverse environmental conditions within standard fibre infrastructure. Taken together, its inherent remote-computing capability positions OCiC as a globally scalable platform for real-time, low-latency AI, particularly suitable for telesurgical applications.
We benchmarked ORCU’s inference fidelity against a CPU baseline using the MNIST classification task. Four pre-trained 3×3 convolution kernels from a CNN were mapped onto optical filters, each randomly assigned to one of ORCU’s 192 programmable parallel-processing kernels.
The confusion matrices (Fig. 3a) and feature maps (Fig. 3c) produced by ORCU closely matched CPU results, demonstrating no discernible qualitative differences. ORCU achieved a classification accuracy of 96.2% on an identical 1,800-image test set, closely matching the CPU benchmark (96.1%). This minimal inference discrepancy is further supported by strong agreement between experimental and theoretical waveforms and their prediction outcomes (see Supplementary Information B.1-3). Additional fidelity indicators include a regression correlation of 0.96 (Fig. 3d), precision–recall and receiver operating characteristic (ROC) curves (Fig. 3e), and metrics such as F1 score, area under the ROC curve (AUROC), and average precision (Fig. 3f).
Collectively, these results confirm ORCU achieves CPU-level inference accuracy while enabling high-throughput parallel processing, surpassing existing benchmark photonic processors (Fig. 3b). This high fidelity allows ORCU to effectively mitigate cumulative errors even under moderate optical noise, providing a robust foundation for scalable, deep-layered OCiC deployments across geographically diverse regions.
A major challenge in telesurgery is the lack of immediate, high-fidelity sensory feedback due to network latency and compression, which complicates procedures requiring precise, real-time decisions. This limitation remains a critical barrier to the broader adoption of telesurgery, particularly in anatomically complex scenarios where even minor errors or delays can have life-threatening consequences. Nonetheless, remote interventions for acute vascular emergencies such as myocardial infarctions and strokes remain urgently needed due to their narrow therapeutic windows [35], [36] and a global shortage of interventional cardiologists and cardiac specialists [5], [37], [38]. By leveraging ORCU’s CPU/GPU-level inference fidelity and its intrinsic remote-computing capability, the OCiC framework delivers real-time, AI-enhanced perception without introducing additional computational latency. This approach will simplify remote surgical interventions and significantly expand timely access to specialised care, especially in underserved regions.
To evaluate OCiC’s clinical inference capabilities under real-world deployment conditions, we implemented key convolutional layers from a pre-trained 161-layer U-DenseNet for coronary artery segmentation in contrast-enhanced X-ray angiograms. During vascular interventions, real-time fluoroscopic imaging provides essential intraoperative guidance for precise localisation and treatment of vascular pathologies. By delivering real-time, AI-enhanced, fluoroscopy-synchronised visualisation, OCiC strengthens visual feedback in the absence of immediate haptic cues, thereby supporting more precise and timely decision-making. While demonstrated here for vascular surgery, this AI-assisted approach has broad applicability and can be seamlessly extended to other surgical domains, such as endoscopic interventions, further enhancing the quality and reliability of remote procedures. To assess system robustness, the ORCU was specifically assigned to compute the first and last convolutional layers. These layers are particularly sensitive to distinct noise types: kernel noise in the initial layer propagates through subsequent layers, while feature noise in the final layer directly impacts segmentation accuracy. Performance was evaluated on 20 unseen angiographic images across three scenarios: (1) GPU-only baseline, (2) OCiC in a controlled laboratory environment, and (3) OCiC deployed over a 38.9 km outdoor dark fibre link (NDFF) to assess robustness under real-world conditions. Comprehensive details of training methods and deployment configurations are provided in Methods.
As illustrated in Fig. 4a, the OCiC framework demonstrated robust and consistent segmentation performance under both controlled laboratory and outdoor NDFF conditions. The system achieved nearly identical average Dice scores of 0.800 (laboratory) and 0.801 (NDFF), closely matching the GPU baseline of 0.801. Using GPU-derived predictions as a reference standard, F1 score analysis yielded values of 0.984 (laboratory) and 0.983 (NDFF), indicating minimal deviation from GPU-based inference. Violin plots of Dice scores, precision, and recall (Fig. 4b) confirmed that distributions were statistically indistinguishable from GPU-based results, highlighting the system’s stability. Empirical cumulative distribution function (ECDF) curves (Fig. 4c) and scatter plots (Fig. 4d) further confirmed minimal output discrepancies, while strong agreement in Average Surface Distance (ASD) and Hausdorff Distance (HD) metrics underscored the preservation of spatial segmentation accuracy. Comprehensive results for all 20 test cases, including vascular segmentation maps, per-image Dice scores, and pixel-level comparisons, are provided in the Supplementary Information B.4-6. Collectively, these results show that OCiC achieves real-time inference with performance comparable to a GPU baseline. Its reliable accuracy under both laboratory and realistic field conditions further demonstrates resilience to environmental perturbations, supporting its suitability for latency-sensitive applications such as telesurgery.
The OCiC framework leverages the ORCU’s capabilities to reduce error accumulation and perform AI inference concurrently with optical data transmission. This approach enables the direct deployment of pre-trained models, specifically tailored to surgical tasks, over existing fibre-optic infrastructure. The architecture provides real-time, high-fidelity perception for surgeons during complex telesurgical procedures, substantially reducing latency-related risks and potentially enhancing patient safety.
First, experimental results demonstrate strong quantitative agreement with electronic inference, underscoring ORCU’s robustness against error accumulation, which is a crucial prerequisite for deploying deep learning networks on optical hardware [18]. To rigorously quantify this robustness, we conducted noise-injection simulations targeting two dominant sources: feature noise, arising from transient optical and electrical fluctuations, and kernel noise, induced by system instabilities (see Extended Data Fig. 9). Using a conservative accuracy margin of 0.1% derived from the MNIST benchmark, we estimated the potential accuracy degradation resulting from full-network deployment of the surgical segmentation model within OCiC. Under these conditions, the projected full-network accuracy ranged from 0.80 (feature noise) to 0.79 (kernel noise), preserving 98.7–99.8% of the baseline accuracy (0.801). These results strongly support the feasibility of directly deploying pre-trained deep neural networks within OCiC, delivering real-time, AI-enhanced perception to assist intraoperative decision-making during telesurgical procedures. Crucially, our simulations indicate that kernel instability poses a significantly greater threat to inference accuracy than feature noise. Even modest degradation observed in relatively simple tasks (e.g., 1.1% on MNIST) can escalate beyond 10% in deeper, more complex networks, severely constraining the scalability of photonic AI systems. This discrepancy arises from a fundamental distinction: kernel noise disrupts learned weights, causing structural perturbations that propagate across layers and progressively degrade feature extraction. In contrast, feature noise is transient and typically attenuates as it propagates, provided it remains moderate and unbiased. Collectively, these findings underscore the critical need for improving stability and precision in photonic components, particularly microring resonators and phase-change materials, thereby charting a clear roadmap towards robust deep-layer inference in future optical AI systems.
Second, the OCiC framework strategically overlaps computational processing with optical transmission, substantially minimising end-to-end latency and extending the geographical reach of real-time, AI-assisted telesurgical interventions. Given a typical propagation speed of approximately \(2\times10^8\) m/s in optical fibre, the clinically acceptable round-trip latency threshold of 100 ms [6] translates into a theoretical operational range of approximately 10,000 km. This operational range comfortably covers key global routes such as London to Tokyo (9,600 km), Los Angeles to Buenos Aires (9,800 km), and Paris to Cape Town (9,300 km), underscoring the OCiC system’s architectural suitability for latency-critical, high-precision interventions across continents. Crucially, the framework consistently maintains latency within clinically acceptable limits, facilitating seamless, real-time collaboration among clinical centres worldwide and integrating distributed global expertise [39]. Experimentally, a single computational channel demonstrated a peak throughput of 69 TOPS. Leveraging the full 11.9 THz optical bandwidth across the C and L bands, each ORCU can concurrently process up to 34 parallel computational channels per SMF, significantly enhancing batch-processing capacity. In principle, a single 10-km bidirectional SMF segment can deliver a peak throughput of up to 4.7 peta-operations per second. Extended to a 10,000-km OCiC network, the theoretical peak throughput scales markedly, potentially reaching 4.9 exa-operations per second using only one fibre per ORCU segment. In practical deployments, standard optical cables containing hundreds of fibres enable orders-of-magnitude greater distributed photonic computing capacity. Such infrastructure effectively transforms conventional long-haul optical links into continental-scale computing fabrics, offering shared, decentralised AI resources accessible to urban, suburban, and rural communities alike. These capabilities lay the foundation for global-scale, AI-assisted telesurgical interventions and facilitate real-time, cross-border experimental collaboration among geographically dispersed clinical, research, and academic institutions. Potential applications include multi-centre clinical trials, remote participation in international physics experiments, and seamless cross-border integration of scientific data and knowledge.
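As a sanity check on these figures, the short calculation below reproduces the stated reach and throughput; the only assumption beyond the stated values is mapping the "10-km" segment to the 9.5 km experimental fibre length, which recovers the quoted ~4.9 exa-operations per second.

```python
# Back-of-envelope check of the latency and throughput budget above.
# Per-channel throughput (69 TOPS) and channel count (34) are as stated;
# using the 9.5 km experimental fibre as the ORCU segment length is our
# assumption (the text rounds this to a "10-km" segment).
v_fibre = 2e8                      # propagation speed in SMF, m/s
rt_limit = 100e-3                  # clinically acceptable round trip, s

reach_km = v_fibre * rt_limit / 2 / 1e3
print(f"One-way reach: {reach_km:,.0f} km")                    # 10,000 km

segment_pops = 69 * 34 * 2 / 1e3   # bidirectional segment, POPS
print(f"Per segment: {segment_pops:.1f} POPS")                 # ~4.7

segments = 10_000 / 9.5            # ORCU segments on a 10,000 km link
print(f"Full link: {segment_pops * segments / 1e3:.1f} EOPS")  # ~4.9
```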
Deploying remote operating rooms along the OCiC network in LMICs, rural communities, and emergency-affected regions can deliver timely, life-saving surgical interventions to underserved populations. This model directly addresses urgent healthcare needs in resource-limited settings without requiring long-term investments in local specialist training, while facilitating effective redistribution of surplus surgical capacity from well-resourced centres to regions with limited access. Ultimately, it presents a cost-effective, scalable, and equitable strategy to reduce global health disparities and accelerate progress towards achieving the Global Surgery Goals [40].
CNNs are a class of deep learning architectures widely applied to visual data processing. At their core, convolutional layers employ small learnable kernels to extract local features such as edges and textures from input images. As each kernel convolves across the image, the resulting feature map encodes distinct spatial patterns through its aggregated responses. Multiple kernels operate in parallel, enabling the network to capture diverse visual cues that underpin its success in image recognition and related tasks.
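For reference, the operation that ORCU realises optically corresponds to the following minimal sketch of a single-kernel 2D convolution (valid mode, stride 1, implemented as cross-correlation, as is conventional in CNN layers); the toy image and kernel are purely illustrative.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2D convolution (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 edge-detection kernel applied to a toy 5x5 image.
image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
print(conv2d(image, sobel_x))   # 3x3 feature map of horizontal gradients
```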
The computational demands of large-scale CNNs are driving the development of photonic tensor accelerators that exploit the high bandwidth of optical systems for parallel convolution operations. In fibre-based designs, two-dimensional images are typically flattened into one-dimensional data streams. These are then transmitted into optical systems via an MZM. However, a stride limitation inherent to most fibre-type photonic accelerators substantially reduces the effective computing speed, achieving only one-third of the nominal rate for 3×3 convolutions, as illustrated in Extended Data Fig. 5. To overcome this, ORCU employs comb sets with an engineered frequency-domain distribution as the light source for each computation channel, enabling direct 2D convolution. For a 3×3 kernel, the system generates an unfolded comb set of three sparsely spaced groups, each corresponding to one row of the kernel. Within each group, three densely spaced comb lines represent the elements of that row, separated by \(\lambda_{\text{pixel}}\). This wavelength spacing corresponds to the one-pixel time delay introduced via wavelength-to-time mapping in the SMF. The comb groups are separated by a larger wavelength interval, \(\lambda_{\text{row}}\), providing temporal separation between adjacent pixel rows. Fig. 2a shows multiple computation and communication channels sharing the same SMF, with a single ultra-broad comb (typically generated by a mode-locked laser or microring resonator) divided among all channels. Experimental measurements indicate that comb stability, particularly in power fluctuation and wavelength spacing, exerts a stronger influence on performance than optical noise. Nevertheless, generating ultra-stable, high-repetition-rate combs with microring resonators or mode-locked lasers remains technically challenging. In this work, we adopt electro-optic comb generation with modulated coupled laser sources to ensure the stability required for high-performance photonic computing.
The preprocessing stage includes flattening 2D images into one-dimensional row-wise sequences (Fig. 2b) and applying predistortion (see Supplementary Information C.2 for details). After preprocessing, the optical signal in each computational channel is modulated by the flattened image at a symbol rate \(R_s\) (symbols per second) and transmitted through an SMF of approximately 10 km. As shown in Fig. 2c, the SMF performs wavelength-to-time mapping, interleaving the different wavelengths of each computational channel with the required time delays. In this configuration, adjacent comb lines correspond to neighbouring pixels, while the spacing between different comb groups represents the time delay between image rows. At the SMF output, the wavelength-time interleaved data streams yield unweighted, wavelength-domain kernels. These kernels directly perform 2D convolution, processing data on a pixel-by-pixel basis over successive symbol durations (lower panel of Fig. 2c). This direct 2D convolution also mitigates the stride problem common in other fibre-based photonic accelerators, a previously described limitation that significantly reduces effective computing speed (Extended Data Fig. 5).
Additionally, the proposed ORCU uses a post-modulation strategy for kernel assignment (Fig. 2d) to enhance its capability for parallel convolutional computation. At the receiver end of the SMF, the unweighted 2D kernels are replicated into multiple copies using a coupler array, with each copy individually shaped by an OSSM. This configuration improves spectral efficiency and increases the practical value of the technique for deep network deployment. Experimental results show that the current ORCU design achieves high computational efficiency while maintaining CPU-level inference accuracy. It supports the parallel computation of up to 192 kernels (\(2^6\) × 3 = 192 kernel paths) on a single computational channel using only 350.7 GHz of bandwidth. This corresponds to a total computing power of 69.12 TOPS (192 × 2 × 9 × 20 GHz) and a computational spectral efficiency of 197.1 TOPS/THz. The simultaneous arrival of convolutional results from multiple kernels at the receiver reduces the required refresh cycles to process the same image, thereby lowering storage pressure across the entire OCiC link and minimising latency. Furthermore, the post-modulation approach enables real-time supervision and adjustment of kernel information, allowing the system to dynamically adapt to environmental fluctuations and transmitter-end variations.
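These throughput and spectral-efficiency figures follow directly from the stated parameters, as the short check below illustrates; each multiply-accumulate is counted as two operations, consistent with the 192 × 2 × 9 × 20 GHz expression above.

```python
# Reproducing the quoted throughput and spectral-efficiency figures.
kernels = 2**6 * 3            # 192 parallel kernel paths
ops = 2 * 9                   # 3x3 kernel: 9 MACs, 2 operations per MAC
rate_hz = 20e9                # 20 GBaud symbol rate
bw_thz = 350.7e9 / 1e12       # occupied bandwidth per channel, THz

tops = kernels * ops * rate_hz / 1e12
print(f"Throughput: {tops:.2f} TOPS")                         # 69.12
print(f"Spectral efficiency: {tops / bw_thz:.1f} TOPS/THz")   # 197.1
```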
This study investigates whether the proposed ORCU can mitigate structural perturbations in convolution kernels. Achieving this capability is essential for deploying optical computing units in deep learning networks compliant with AI scaling laws. Simulations indicate that deep networks are substantially more sensitive to structural perturbations caused by kernel noise than to feature noise affecting the feature map. Stable comb sources and precise OSSMs are therefore critical to preserving kernel fidelity and system robustness. To ensure this stability, EO comb generation using modulated coupled laser sources is employed. Experimental validation shows that EO combs maintain power fluctuations below 0.1 dB in temperature-controlled environments and below 0.2 dB under room-temperature conditions with outdoor dark-fibre links, as shown in the Supplementary Video. In our proof-of-concept system, waveshapers are used as OSSMs to provide accurate and stable kernel modulation, thereby validating high-fidelity optical inference of ORCU and enabling exploration of its practical performance limits. More integrated alternatives, such as microring arrays or arrayed waveguide gratings with variable optical attenuators, offer higher integration potential. However, their instability, arising from thermal drift and control complexity, undermines kernel fidelity and limits their practicality in deep learning deployment. Taken together, these results underscore that continued development of stable, integrated, and cost-effective comb sources and OSSMs is crucial for scalable optical computing systems to meet the escalating demands imposed by AI scaling laws.
We conducted three experiments, each with a tailored setup for specific objectives. The first experiment focused on performance verification. A large-scale fan-out coupler array was constructed using six layers of 1 × 2 couplers and one layer of 1 × 3 couplers. This array uniformly distributed the received unweighted convolutional signals into 192 optical sub-paths with comparable optical quality (power, signal-to-noise and distortion ratio, and distortion characteristics), thereby enabling parallel computation of 192 distinct convolution kernels under consistent conditions. To assess performance, four of the 192 sub-paths were weighted with convolution kernels from a pre-trained CNN for the MNIST classification task. The optical system achieved classification accuracy (96.2%) nearly identical to CPU-based computation (96.1%). These results confirm that ORCU sustains CPU-level inference fidelity under moderate optical noise, provided kernel stability and precision are maintained.
In this MNIST classification experiment, a 9.5 km SMF served as the core of ORCU, providing stable, linear wavelength-to-time mapping. The flattened image sequence was transmitted at 20 GBaud. The comb source was realised by combining three CW lasers (1560.757 nm, 1552.122 nm, and 1543.212 nm, with output powers of 15.5 dBm, 14.5 dBm, and 16.5 dBm) through a 3 × 1 coupler. The unequal power levels were intentionally set to compensate for the nonlinear response of the EDFAs. The combined optical signal then passed through an EO comb generator consisting of two cascaded phase modulators driven by a 38.963 GHz RF sine wave, producing three groups of three-line combs with the required one-pixel wavelength spacing. The resulting nine comb lines were centred at 1542.903 nm, 1543.212 nm, 1543.522 nm, 1551.809 nm, 1552.122 nm, 1552.436 nm, 1560.441 nm, 1560.757 nm, and 1561.074 nm, mapping directly to the nine elements of a 3 × 3 convolution kernel.
To achieve accurate wavelength-to-time interleaving, the one-pixel wavelength spacing \(\lambda_{\text{pixel}}\) is defined as \[\begin{align} \lambda_{\text{pixel}} = \frac{\tau_{\text{pixel}}}{D \cdot L} = \frac{1}{D \cdot L \cdot R_S} \end{align}\] where \(L\) is the SMF length, \(D\) is the fibre dispersion, \(R_{S}\) is the symbol rate of the input data, and the pixel duration is given by \(\tau_{\text{pixel}}=\frac{1}{R_{S}}\).
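As a worked example for the MNIST configuration, the sketch below evaluates this relation and compares it with the sideband spacing set by the 38.963 GHz EO-comb drive; the dispersion coefficient is not stated in the text, so a typical value of 17 ps/(nm·km) for standard SMF at 1550 nm is assumed.

```python
# Numerical check of the pixel-spacing relation for the MNIST experiment.
# D = 17 ps/(nm km) is an assumed, typical value for standard SMF.
C = 2.998e8                       # speed of light, m/s
D = 17e-12 / 1e-9 / 1e3           # dispersion, s per m(wavelength) per m(fibre)
L = 9.5e3                         # fibre length, m
Rs = 20e9                         # symbol rate, Baud

lambda_pixel = 1 / (D * L * Rs)   # metres
print(f"Required spacing: {lambda_pixel * 1e9:.3f} nm")          # ~0.310 nm

# Sideband spacing of the EO comb at the central line near 1543 nm:
lam, f_rf = 1543.212e-9, 38.963e9
print(f"EO sideband spacing: {lam**2 * f_rf / C * 1e9:.3f} nm")  # ~0.310 nm
```

The two spacings agree, consistent with the measured comb lines (1542.903 nm, 1543.212 nm, and 1543.522 nm) being separated by roughly 0.31 nm.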
After propagation through the 9.5 km SMF, the unweighted convolution signal is fed into the fan-out coupler array. Two EDFAs compensate for insertion and transmission losses in the transmitter, fibre, and fan-out stages. At the end of each optical sub-path, a waveshaper acting as the OSSM modulates the power of each comb line to encode the corresponding convolution-kernel weights. Because optical power cannot directly represent negative values, positive and negative kernel components are modulated separately and delivered to the two output ports of the waveshaper. The final convolution result is obtained by subtracting the two outputs, which can be realised with a balanced photodiode, a differential amplifier, or, as implemented in this experiment, DSP. The complete experimental setup, including device specifications, is illustrated in Extended Data Fig. 6.
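The sign-handling step admits a compact summary: because convolution is linear, subtracting the outputs of the two non-negatively weighted paths recovers the signed result exactly. The sketch below verifies this with illustrative arrays.

```python
import numpy as np
from scipy.signal import correlate2d

# Optical power is non-negative, so a signed kernel K is realised as two
# non-negative kernels on the two waveshaper output ports; their results
# are subtracted (here in software, standing in for a balanced photodiode,
# differential amplifier, or DSP stage).
rng = np.random.default_rng(0)
K = rng.normal(size=(3, 3))        # signed kernel from a trained CNN
x = rng.random((8, 8))             # non-negative input image

K_pos, K_neg = np.maximum(K, 0), np.maximum(-K, 0)      # K = K_pos - K_neg
y = (correlate2d(x, K_pos, mode="valid")
     - correlate2d(x, K_neg, mode="valid"))
assert np.allclose(y, correlate2d(x, K, mode="valid"))  # exact by linearity
```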
The second experiment evaluates the capability of the proposed OCiC to perform coronary artery segmentation in support of real-time AI-assisted telestenting surgery. To assess whether the OCiC system can execute this task using a pre-trained model, a 161-layer U-DenseNet previously trained on a GPU was employed. In this setup, ORCU was used to execute the first and last convolutional layers. This choice reflects two considerations. The first layer is most susceptible to error accumulation due to kernel noise, which can propagate through the entire network. The last layer is highly sensitive to feature noise, directly impacting output accuracy. Given the structure of the U-DenseNet, the coupler array was scaled down to five layers, yielding 32 optical sub-paths, each corresponding to one of the 32 convolution kernels within each channel. Additionally, an extra communication channel was included in the experiment to assess the feasibility of residual data transmission through the OCiC system.
To reflect the geographical context of London, the SMF length for convolutional kernel construction and data transmission was extended to 13.5 km in this experiment. This configuration enabled both convolutional layers to be processed over a total distance of 27 km, sufficient to cover central London to its outskirts in a telesurgical scenario. To accommodate the longer fibre length, the data rate was reduced to 16 GBaud, and the comb generation frequency was adjusted to 34.72 GHz to provide the required one-pixel wavelength spacing. The testing dataset comprised 20 coronary artery X-ray images, each 128 × 128 pixels in size. Owing to the limited spectral bandwidth of the C+L band, each image was partitioned into six 24 × 128 sub-images, since the spectral spacing for 256 pixels (two rows) could not be supported simultaneously. Details of the image partitioning process are provided in Supplementary Information C.2. In this experiment, the 3 × 3 comb group was tuned to the following wavelengths: 1545.149 nm, 1545.426 nm, 1545.702 nm, 1551.844 nm, 1552.122 nm, 1552.402 nm, 1558.380 nm, 1558.661 nm, and 1558.943 nm, corresponding to the revised symbol rate, fibre length, and pixel count per row. After each convolutional operation, the sub-images were recombined to reconstruct the complete image. The full experimental setup is shown in Extended Data Fig. 7.
Finally, the same coronary artery segmentation task was carried out over a 38.9 km segment of dark fibre within the UK’s NDFF to assess the robustness of the proposed ORCU under outdoor conditions. This experiment validated the OCiC deployment under real-world conditions, encompassing temperature variations, weather changes, polarisation fluctuations, and physical disturbances from nearby traffic (the temperature and weather variations are summarised in Supplementary Information C.3). Owing to limited physical space at the NDFF laboratory and an unexpectedly high fibre-optic power loss of 14 dB, a simplified configuration using a 1-to-4 coupler array was adopted, which was sufficient to perform the segmentation task. To accommodate the longer fibre, the symbol rate was reduced to 8 GBaud, and the 3 × 3 comb group was tuned to the following wavelengths: 1547.166 nm, 1547.364 nm, 1547.562 nm, 1551.923 nm, 1552.122 nm, 1552.322 nm, 1556.600 nm, 1556.801 nm, and 1557.001 nm. The full experimental setup is illustrated in Extended Data Fig. 8.
Additionally, PAM-16 (16-level Pulse Amplitude Modulation) was selected as the modulation format for the MNIST classification task and the first layer of the coronary artery segmentation task, primarily to reflect practical real-world communication configurations. Widely adopted in experimental and research contexts, PAM-16 offers a balanced trade-off between modulation complexity and implementation feasibility. This choice avoids overly idealised assumptions, thereby enhancing the real-world relevance of both the system model and experimental results. In contrast, the final layer used unconstrained input levels. Both input data and kernel weights retained five significant digits of precision, rather than being quantised to 4-bit representation. This approach helps prevent artificial performance degradation and allows for meaningful comparison with GPU-based results. Moreover, since the input to the final layer includes negative values, it was split into two images: one containing only positive values, and the other the absolute values of negative components. These were processed sequentially using the same convolutional kernels, and the outputs were recombined via a DSP programme.
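The input-splitting step for the final layer relies on the same linearity argument as the kernel splitting described earlier, as the brief sketch below (with illustrative data) confirms.

```python
import numpy as np
from scipy.signal import correlate2d

# A signed feature map F is split into a positive image and the absolute
# values of its negative components, each convolved with the same kernel,
# and the outputs recombined in DSP: conv(F) = conv(F+) - conv(F-).
rng = np.random.default_rng(1)
F = rng.normal(size=(8, 8))        # signed input to the final layer
K = rng.normal(size=(3, 3))

F_pos, F_neg = np.maximum(F, 0), np.maximum(-F, 0)    # F = F_pos - F_neg
y = (correlate2d(F_pos, K, mode="valid")
     - correlate2d(F_neg, K, mode="valid"))
assert np.allclose(y, correlate2d(F, K, mode="valid"))
```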
We experimentally validated the OCiC framework on two representative computer vision tasks: image classification and semantic segmentation. Image classification used the MNIST handwritten-digit dataset, while semantic segmentation used the coronary angiogram dataset of Cervantes et al. Both pipelines followed a unified workflow of model construction, training, and inference, with selected inference stages executed on the OCiC platform.
For the classification task, we constructed a CNN with one convolutional layer (four kernels), a 16-neuron hidden layer, and a 10-neuron output layer corresponding to the MNIST digit classes. The model was trained on 20,000 MNIST images and tested on 1,800 held-out examples. During inference, the convolutional layer was replaced by ORCU, where optically computed features were fed directly into the fully connected layers. The output was a 10-dimensional probability vector, with the highest-probability class selected as the prediction.
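A model consistent with this description is sketched below in PyTorch; the 3×3 kernel size matches the kernels mapped onto ORCU, while the activation functions and the absence of pooling are our assumptions, since the text specifies only the layer widths.

```python
import torch
import torch.nn as nn

class MnistCNN(nn.Module):
    """One convolutional layer (four 3x3 kernels), a 16-neuron hidden
    layer and a 10-neuron output, as described in the text."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)  # replaced by ORCU at inference
        self.fc1 = nn.Linear(4 * 26 * 26, 16)       # 28x28 input -> 26x26 maps
        self.fc2 = nn.Linear(16, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))
        x = torch.relu(self.fc1(x.flatten(1)))
        return torch.softmax(self.fc2(x), dim=1)    # 10-class probabilities

probs = MnistCNN()(torch.randn(1, 1, 28, 28))
print(probs.argmax(dim=1))   # highest-probability class as the prediction
```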
For semantic segmentation, representative of AI-assisted telesurgical scenarios, we developed a deep neural network based on a hybrid DenseNet and U-Net architecture, referred to as U-DenseNet. Dense and transition blocks were arranged in a U-shaped topology to enable high-resolution segmentation of coronary vasculature. The model was trained on 107 contrast-enhanced X-ray angiograms and tested on 20 cases, with no test data used during training. During evaluation, the first and final convolutional layers were executed on the OCiC system. Inference was validated under both controlled laboratory conditions and deployment across a 38.9 km segment of the UK’s NDFF, with GPU-based inference as the baseline.
Both networks were trained on a high-performance computing server at University College London, equipped with 4 × Intel Xeon Gold 6126 CPUs (2.60 GHz), 4 × NVIDIA Tesla V100 GPUs (32 GB each), and 512 GB system RAM. Further architectural details are provided in the Supplementary Information A.4.
To evaluate the noise resilience of our OCiC system, we assessed its robustness under controlled noise injection. Noise arises primarily from two sources: distortions in the convolutional optical filter and AWGN arising throughout the optical and electronic subsystems. The former perturbs convolution kernels, whereas the latter corrupts feature maps at each layer. The results are shown in Extended Data Fig. 9.
For kernel noise, given a 3×3 kernel \(\mathbf{K}\), the perturbed kernel is defined as \[\begin{align} \mathbf{K}_n = \mathbf{K} + \mathbf{N} \end{align}\] where \(\mathbf{N} \sim \mathcal{N}(0, \sigma^2 I)\), \(\sigma\) denotes the kernel noise level, and \(\mathbf{N} \in \mathbb{R}^{3 \times 3}\).
For feature-map noise, given a convolutional feature map F of shape \([b, c, h, w]\), the perturbed feature map is \[\begin{align} \mathbf{F}_n = \mathbf{F} + \hat{\mathbf{N}} \end{align}\] where \(b\) is the batch size, \(c\) the number of channels, and \(h\) and \(w\) are the spatial dimensions. Here \(\hat{\mathbf{N}} \sim \mathcal{N}(0, \hat{\sigma}^2 I)\), \(\hat{\sigma}\) denotes the feature noise level and \(\hat{\mathbf{N}} \in \mathbb{R}^{b \times c \times h \times w}\).
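Both perturbations translate directly into a few lines of code; the sketch below applies them at the noise levels discussed in the evaluation that follows (the tensor shapes are illustrative).

```python
import torch

def perturb_kernel(K: torch.Tensor, sigma: float) -> torch.Tensor:
    """Kernel noise: K_n = K + N, with N ~ Normal(0, sigma^2)."""
    return K + sigma * torch.randn_like(K)

def perturb_features(F: torch.Tensor, sigma_hat: float) -> torch.Tensor:
    """Feature noise: F_n = F + N_hat, applied to a [b, c, h, w] map."""
    return F + sigma_hat * torch.randn_like(F)

K = torch.randn(3, 3)                   # a 3x3 convolution kernel
F = torch.randn(4, 32, 128, 128)        # feature map of shape [b, c, h, w]
K_n = perturb_kernel(K, sigma=0.026)    # segmentation-model tolerance level
F_n = perturb_features(F, sigma_hat=1.0)
```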
Having defined the two noise types, we next evaluated their impact. As shown in Extended Data Fig. 9, both models exhibit similar resilience to feature noise, with the deep segmentation network maintaining at least 95% of its original accuracy until the noise standard deviation reaches 1. For the MNIST classifier, performance drops below 95% of baseline when the feature noise standard deviation approaches 0.95. In contrast, kernel noise has a substantially greater impact across network depths, with the effect amplified in the 161-layer surgical segmentation model compared to the 1-layer MNIST classifier. To preserve 95% accuracy in the segmentation model, the kernel noise standard deviation must remain below 0.026, corresponding to only a 0.7% performance degradation in the MNIST classifier. This discrepancy arises because feature noise is largely independent across layers, whereas kernel noise directly perturbs the convolution process and accumulates progressively throughout the network. Such stringent noise tolerance has seldom been addressed in prior optical computing studies, highlighting the importance of stable and precise kernel assignment for scalable optical computing.
The authors declare that the data underlying this study are provided within the paper and its accompanying supplementary materials.
The MNIST handwritten digits dataset is available at https://git-disl.github.io/GTDLBench/datasets/mnist_datasets/.
The dataset used for segmentation of coronary arteries in X-ray angiograms is available at http://personal.cimat.mx:8181/~ivan.cruz/DB_Angiograms.html.
The supplementary video of the comb stability measurement for the NDFF deployment is available at https://drive.google.com/file/d/1PreiWbMKAI9J2nq43XG8fZdBkIGRxHoM/view?usp=drive_link.
The code for this paper is available from the authors on request.
This work was supported by the National Dark Fibre Facility (NDFF) funded by the Engineering and Physical Sciences Research Council [grant number EP/S028854/1] in the United Kingdom. J.-Q. Z. acknowledges the Kennedy Trust Prize Studentship (AZT00050-AZ04) and the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Science (CIFMS), China (grant number: 2024-I2M-2-001-1).
R.Y. and J.Z. conceived the OCiC architecture. R.Y. led the photonic experiments, including design, simulation, construction and testing, assisted by Y.L. J.H. developed the AI network and performed training, with support from J.Z. and Z.W.; J.Z. also conducted data curation. Q.R. and M.T. optimised the optical system design. J.C. and D.Z. contributed clinical and robotics expertise. J.H. and J.Z. carried out data analysis and visualisation. R.Y., Y.L., Y.Y., J.E.W. and X.L. prepared and deployed the NDFF experiments. R.Y., J.H., J.Z., Y.Y. and J.E.W. prepared figures and drafted the manuscript. All authors reviewed and edited the manuscript. C.L., H.L. and C.M. supervised the project.