Reconstruct Anything Model:
a lightweight foundation model for
computational imaging
March 11, 2025
Most existing learning-based methods for solving imaging inverse problems can be roughly divided into two classes: iterative algorithms, such as plug-and-play and diffusion methods leveraging pretrained denoisers, and unrolled architectures that are trained end-to-end for specific imaging problems. Iterative methods in the first class are computationally costly and often yield suboptimal reconstruction performance, whereas unrolled architectures are generally problem-specific and require expensive training. In this work, we propose a novel non-iterative, lightweight architecture that incorporates knowledge about the forward operator (acquisition physics and noise parameters) without relying on unrolling. Our model is trained to solve a wide range of inverse problems, such as deblurring, magnetic resonance imaging, computed tomography, inpainting, and super-resolution, and handles arbitrary image sizes and channels, such as grayscale, complex, and color data. The proposed model can be easily adapted to unseen inverse problems or datasets with a few fine-tuning steps, using as few as a handful of images, in a self-supervised way, without ground-truth references. Through a series of experiments, we demonstrate state-of-the-art performance from medical imaging to low-photon imaging and microscopy. Our code is available at https://github.com/matthieutrs/ram.
Computational imaging problems are ubiquitous in applications ranging from demosaicing in computational photography to Magnetic Resonance Imaging (MRI). In this work, we focus on linear inverse problems of the form \[\boldsymbol{y}\sim p(\boldsymbol{y}|\boldsymbol{A}\boldsymbol{x}), \label{eq:inv_pb}\tag{1}\] where \(\boldsymbol{x}\in \mathbb{R}^n\) is the image to recover, \(\boldsymbol{y}\in\mathbb{R}^m\) are the measurements, \(\boldsymbol{A}\colon \mathbb{R}^n\to\mathbb{R}^m\) is an operator describing the acquisition physics, usually assumed to be linear, and \(p\) is the noise distribution, which is typically Gaussian, Poisson, or a combination of both. We mostly focus on non-blind problems with known \(\boldsymbol{A}\).
The most common approach to solving the inverse problem 1 is to employ an iterative algorithm to minimize an objective function of the form \[\mathop{\mathrm{arg\,min}}_{\boldsymbol{x}}\, \frac{1}{2}\|\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y}\|_2^2 + g(\boldsymbol{x}), \label{eq:min_pb}\tag{2}\] where \(g\) encodes prior knowledge about the data. Classical approaches rely on handcrafted priors such as total variation [1] or wavelet sparsity [2]. However, recent advances have highlighted the effectiveness of learned priors, particularly deep denoising neural networks, which have been successfully integrated into iterative reconstruction methods. These include Plug-and-Play (PnP) algorithms [3]–[7], and more recent diffusion-model-based methods [8]–[11]. The advantage of these approaches is that a single model trained only for denoising can be used to solve a large set of image restoration tasks. However, these methods come with several major drawbacks: they are often slow, may introduce blur [12]–[14], and depend heavily on the training data, limiting their use in domains with scarce ground-truth references [15].
Another common strategy for solving 1 is to train an end-to-end deep neural network specifically for the given task. In low-level vision applications such as single-image super-resolution and image denoising, architectures are often derived empirically, with UNet-based models emerging as a standard choice [16], [17]. Notably, the UNet has also become a prevalent backbone for denoising tasks in generative modeling [18]–[20]. However, in computational imaging tasks (e.g., MRI, computed tomography, astronomical imaging), unrolled architectures, inspired by optimization algorithms, are typically preferred due to their ability to incorporate the measurement operator, \(\boldsymbol{A}\), and observed data [21]–[23]. Yet, the architecture of unrolled models strongly differs from state-of-the-art UNets used for low level tasks. In particular, empirical results [16], [17], [24], [25] suggest that effective architectures need not strictly adhere to optimization-derived operations. In both cases, however, even minor architectural modifications—such as adjusting the number of input and output channels—often require full model retraining. This reduces adaptability across datasets and inverse problems with similar statistical properties, such as transitioning from grayscale to color images or handling complex multi-channel representations [26].
In this work, we propose a novel neural network architecture for solving 1 , building on the DRUNet convolutional neural network [16]. Our key modification introduces a conditioning mechanism that integrates the acquisition physics \((\boldsymbol{A},p)\) through multigrid Krylov iterations. We train a single backbone model across multiple imaging tasks simultaneously ranging from image deblurring to MRI, handling datasets with grayscale, color, and complex images, corrupted by both Gaussian and Poisson noise. As a result, our architecture supports grayscale (1 channel), complex-valued (2 channels), and color images (3 channels), with only the output heads differing to accommodate specific channel numbers. Beyond its strong zero-shot performance, we demonstrate that the model can be efficiently fine-tuned in a self-supervised way on real measurement data.
Several architectures have established themselves as cornerstones of the low-level vision community. Early breakthroughs include the UNet [27], [28] and deep residual convolutional networks [29]–[31]. More recently, architectures incorporating attention mechanisms have surpassed previous state-of-the-art models [17], [25], [26]. Notable recent developments include transformer-based models, particularly in single-image super-resolution [25], [26], and, more recently, state-space models [32]. Despite these advances, the UNet remains a key backbone for many state-of-the-art methods, see e.g. [17], [25], which incorporate transformer blocks into UNet-like architectures.
To better condition on the acquisition physics, unrolled networks introduce learnable neural modules (e.g., convolutional layers) within the iterations of an optimization algorithm solving 2 . Several architectures have become standard in this framework, notably in natural image processing [31], [33]–[35], medical imaging [21], [23], and astronomical imaging [36]. However, these models have large memory footprints and are typically trained on a single task or a limited set of tasks with little variation. Moreover, there is a significant discrepancy between the overall architecture of these unrolled models and the highly effective state-of-the-art UNet-like architectures discussed above.
In many scientific and medical imaging applications, obtaining ground-truth data for supervised training is often expensive or even impossible [15]. Self-supervised methods enable training directly from measurements alone [37]–[39]. In this work, we show that self-supervision is highly effective for finetuning a foundation model without ground-truth references.
Recently, various papers have proposed image restoration models that can handle multiple tasks using the same underlying weights [40]–[42]. While these works share a similar goal of building foundation models, they differ from this paper in that they focus on (semi) blind image-to-image tasks, such as denoising, deraining, and dehazing. In contrast, the proposed model is designed to handle scientific and medical imaging tasks where the measurements are not necessarily in the image space and the observation model 1 is typically known, such as in MRI (k-space data) or CT (sinograms).
The proposed architecture, illustrated in 1, is derived from the DRUNet [16]. In this section, we detail the proposed architectural modifications to handle general inverse problems. First, we add a proximal estimation module casting the input into a Gaussian-noise-corrupted form. In subsequent layers, the conditioning is performed via Krylov subspace iteration on the feature maps. Finally, we adapt the architecture in order to handle a variety of noise models.
In order to solve 1 for a wide variety of inverse problems, a first step is to map \(\boldsymbol{y}\) onto the appropriate image domain. Several approaches are used in the literature, using either \(\boldsymbol{A}^\dagger \boldsymbol{y}\) or \(\boldsymbol{A}^\top \boldsymbol{y}\) as the input of the network. The first approach has the advantage of inverting the ill-conditioned \(\boldsymbol{A}\), turning the effective problem seen by the subsequent backbone layers into a Gaussian denoising problem. However, the pseudo-inverse operation is extremely sensitive to noise. In noisy settings, the second approach is often chosen, at the cost of a blurred input. Instead, letting \(f(\boldsymbol{x}, \boldsymbol{y}) = \frac{1}{2}\|\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y}\|_2^2\), our first estimation step consists of a proximal step \[\operatorname{prox}_{\lambda f}(\boldsymbol{A}^\top\! \boldsymbol{y}) = \mathop{\mathrm{arg\,min}}_{\boldsymbol{u}} \lambda\|\boldsymbol{A}\boldsymbol{u}-\boldsymbol{y}\|^2 + \|\boldsymbol{u}-\boldsymbol{A}^\top\!\boldsymbol{y}\|^2, \label{eq:prox_input}\tag{3}\] where \(\lambda>0\) is a regularization parameter. Equation 3 can be seen as a middle ground between \(\boldsymbol{A}^\top \boldsymbol{y}\) (when \(\lambda\) is small) and \(\boldsymbol{A}^\dagger \boldsymbol{y}\) (when \(\lambda\) is large). We set \(\lambda\) proportional to the input signal-to-noise ratio (SNR) as \({\lambda = \sigma\eta/\|\boldsymbol{y}\|_1}\), where \(\sigma\) is the Gaussian noise level and \(\eta\) is a learnable parameter.
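For illustration, the proximal step in 3 has the closed form \((\boldsymbol{I}+\lambda\boldsymbol{A}^\top\boldsymbol{A})^{-1}(1+\lambda)\boldsymbol{A}^\top\boldsymbol{y}\), which can be approximated with a few conjugate gradient iterations since the system matrix is symmetric positive definite. The sketch below assumes the operator and its adjoint are available as callables A and At; it is only illustrative and not the released implementation.

```python
import torch

def prox_input(y, A, At, lam, n_iter=20, tol=1e-6):
    """Approximate prox_{lam*f}(A^T y) with f(x) = 1/2 ||Ax - y||^2 by solving
    (I + lam * A^T A) u = (1 + lam) * A^T y with conjugate gradient.
    A and At are callables applying the forward operator and its adjoint."""
    b = (1.0 + lam) * At(y)                # right-hand side of the normal equations
    H = lambda u: u + lam * At(A(u))       # SPD system operator I + lam * A^T A
    u = torch.zeros_like(b)
    r = b - H(u)
    p = r.clone()
    rs = (r * r).sum()
    for _ in range(n_iter):
        Hp = H(p)
        alpha = rs / (p * Hp).sum()
        u = u + alpha * p
        r = r - alpha * Hp
        rs_new = (r * r).sum()
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return u
```

For denoising (\(\boldsymbol{A}=\boldsymbol{I}\)) this reduces to the identity on \(\boldsymbol{y}\), while for ill-conditioned operators a handful of iterations already yields a stable, partially inverted input.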
Given an intermediate estimate of the image \(\boldsymbol{x}^{\ell}\), unrolled network architectures condition on the acquisition physics \(\boldsymbol{A}\) via a gradient step on an \(L^2\) distance data-fidelity term, i.e. \[\boldsymbol{x}^{\ell+1} = \boldsymbol{x}^{\ell} - \gamma \boldsymbol{A}^{\top}(\boldsymbol{A}\boldsymbol{x}^{\ell}-\boldsymbol{y}),\] or via a proximal step, i.e. \[\boldsymbol{x}^{\ell+1} = \mathop{\mathrm{arg\,min}}_{\boldsymbol{u}} \gamma \|\boldsymbol{A}\boldsymbol{u}-\boldsymbol{y}\|^2 + \|\boldsymbol{u}-\boldsymbol{x}^{\ell}\|^2,\] for some stepsize \(\gamma\). The proximal step is commonly solved using \(K\) iterations of conjugate gradient, yielding a solution in the Krylov subspace spanned by \(\{ ( \boldsymbol{I}+\gamma\boldsymbol{A}^{\top}\boldsymbol{A})^{k}(\gamma\boldsymbol{A}^{\top}\boldsymbol{y}+ \boldsymbol{x}^{\ell})\}_{k=0}^{K}\), and both steps can be written as \[\boldsymbol{x}^{\ell+1} = \sum_{k=0}^K \alpha_k (\boldsymbol{A}^{\top}\boldsymbol{A})^{k}\boldsymbol{x}^{\ell} + \beta_k (\boldsymbol{A}^{\top}\boldsymbol{A})^{k}\boldsymbol{A}^{\top}\boldsymbol{y},\] for some scalar coefficients \(\{(\alpha_k,\beta_k)\}_{k=0}^{K}\). Thus, we propose to condition on \(\boldsymbol{A}\) using a Krylov Subspace Module (KSM), which learns the linear combination coefficients \(\{(\alpha_k,\beta_k)\}_{k=0}^{K}\). Given an intermediate latent representation, a decoding module first maps the features onto the image domain using a convolution layer. Then, it stacks \(\{\Big((\boldsymbol{A}^{\top}\boldsymbol{A})^{k}\boldsymbol{x}^{\ell}, (\boldsymbol{A}^{\top}\boldsymbol{A})^{k}\boldsymbol{A}^{\top}\boldsymbol{y}\Big)\}_{k=0}^{K}\) along channels and combines them through a \(3\times 3\) convolution. The output is finally added to the latent space via a convolutional encoding module, see [fig:krylov,fig:ram_arch] for more details.
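The following is a minimal single-scale sketch of such a KSM block in PyTorch, assuming the normal operator \(\boldsymbol{A}^{\top}\boldsymbol{A}\) is given as a callable and \(\boldsymbol{A}^{\top}\boldsymbol{y}\) as a tensor; layer names and sizes are illustrative and do not reproduce the exact released architecture.

```python
import torch
import torch.nn as nn

class KSM(nn.Module):
    """Single-scale Krylov Subspace Module: mixes {(A^T A)^k x, (A^T A)^k A^T y}
    for k = 0..K with learned convolutions and adds the result back to the features."""

    def __init__(self, feat_channels, img_channels, K=3):
        super().__init__()
        self.K = K
        self.decode = nn.Conv2d(feat_channels, img_channels, 3, padding=1, bias=False)
        self.mix = nn.Conv2d(2 * (K + 1) * img_channels, img_channels, 3, padding=1, bias=False)
        self.encode = nn.Conv2d(img_channels, feat_channels, 3, padding=1, bias=False)

    def forward(self, feats, AtA, Aty):
        x = self.decode(feats)                  # map latent features to the image domain
        basis, u, v = [], x, Aty
        for _ in range(self.K + 1):             # build the Krylov images for k = 0..K
            basis += [u, v]
            u, v = AtA(u), AtA(v)
        z = self.mix(torch.cat(basis, dim=1))   # learned linear combination
        return feats + self.encode(z)           # residual update of the latent features
```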
For a fixed number of measurements \(m\), the finer the grid (i.e., larger \(n\)), the more ill-posed the inverse problem becomes, as the dimension of the nullspace of \(\boldsymbol{A}\) is lower bounded by \(n-m\). Thus, inspired by multigrid methods [43], we propose to condition our architecture on forward operators defined on coarse grids as: \[\boldsymbol{A}_s = \boldsymbol{A}\boldsymbol{U}_{s}\] where \(\boldsymbol{U}_s: \mathbb{R}^{\frac{n}{4^{s}}}\to \mathbb{R}^{n}\) is an upsampling operator (by a factor of \(2^s\) along each spatial dimension) with a Kaiser-windowed sinc antialias filter [44]. 2 illustrates that the linear pseudoinverse is unstable on fine grids but remains stable on coarse grids. We normalize all operators to have unit norm at each scale, i.e., \(\|\boldsymbol{A}_s\|_2=1\) for all \(s\). Moreover, for many inverse problems, we can develop efficient implementations of \(\boldsymbol{A}_s^{\top}\boldsymbol{A}_s\in \mathbb{R}^{\frac{n}{4^{s}}\times \frac{n}{4^{s}}}\) and \(\boldsymbol{A}_s^{\top}\boldsymbol{y}\in \mathbb{R}^{\frac{n}{4^{s}}}\) which can be computed fully on the coarse grid, avoiding any expensive fine-scale computations. For example, if \(\boldsymbol{A}\) represents a blur or an inpainting operation, we can simply downscale the kernel or the inpainting mask, respectively.
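As an illustration of the last point, one simple way to obtain coarse-grid surrogates for a blur or an inpainting operator is to downscale the kernel or the mask, as sketched below. The pooling-based downscaling is only an approximation of the exact \(\boldsymbol{A}\boldsymbol{U}_s\) composition, and the function names are ours.

```python
import torch
import torch.nn.functional as F

def coarse_blur_kernel(kernel, s):
    """Approximate coarse-grid PSF: average-pool the kernel by 2^s per dimension
    and renormalize, so that A_s^T A_s can be applied entirely on the coarse grid."""
    k = F.avg_pool2d(kernel[None, None], kernel_size=2 ** s)
    return (k / k.sum())[0, 0]

def coarse_inpainting_mask(mask, s):
    """Coarse inpainting mask: a coarse pixel is observed if any fine pixel below it is."""
    m = mask[None, None].float()
    return (F.max_pool2d(m, kernel_size=2 ** s) > 0).float()[0, 0]
```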
As in the original DRUNet [16], [45], conditioning on the noise level is implemented by concatenating constant feature maps, filled with the noise level, along the channel dimension. While a single noise map was added for Gaussian denoising in [16], we extend this strategy to two maps, corresponding to the parameters \(\sigma\) and \(\gamma\) of the Poisson–Gaussian noise model \[\label{eq:PGmodel} \boldsymbol{y}= \gamma \boldsymbol{z}+ \sigma \boldsymbol{n},\tag{4}\] where \(\gamma \geq 0\) is the gain factor associated with the Poisson noise \(\boldsymbol{z}\sim \mathcal{P}(\boldsymbol{x}/\gamma)\), and \(\sigma \geq 0\) is the standard deviation of the Gaussian component of the noise, with \(\boldsymbol{n}\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})\). All biases are removed from our architecture to ensure scale equivariance with respect to both \(\sigma\) and \(\gamma\) components, enabling better generalization to unseen noise levels [16], [46].
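A minimal sketch of the Poisson-Gaussian measurement noise in 4 and of the corresponding two-map conditioning is given below; the helper names are illustrative.

```python
import torch

def poisson_gaussian(x, gain, sigma):
    """Sample y = gain * z + sigma * n with z ~ Poisson(x / gain) and n ~ N(0, I)."""
    z = torch.poisson(x.clamp(min=0) / gain)
    return gain * z + sigma * torch.randn_like(x)

def concat_noise_maps(x, sigma, gain):
    """Concatenate two constant feature maps (sigma and gain) along the channel dimension."""
    b, _, h, w = x.shape
    sigma_map = torch.full((b, 1, h, w), float(sigma), dtype=x.dtype, device=x.device)
    gain_map = torch.full((b, 1, h, w), float(gain), dtype=x.dtype, device=x.device)
    return torch.cat([x, sigma_map, gain_map], dim=1)
```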
The modifications needed to handle inputs with different channel numbers (color, grayscale, or complex images) involve only a small subset of the model parameters. Specifically, only the first (input) and last (output) convolutional layers of the DRUNet, as well as the encoding and decoding blocks within the KSM modules, are adjusted to account for varying channel numbers. All other weights of the model are shared across modalities.
We train our network simultaneously on \(G\) computational imaging tasks, spanning image restoration (deblurring, inpainting, Poisson-Gaussian denoising in both color and grayscale), single-coil MRI, CT, and others (see 6 for a complete list), leveraging the DeepInverse library [47]. Each task \(g\) is associated with a dataset \(\mathcal{D}_g\) = \(\{ \boldsymbol{x}_{i,g}\}_{i=1}^{N_g}\): LSDIR [48] for tasks involving natural images, LIDC-IDRI for CT images [49], and the fastMRI brain-multicoil dataset for MRI [24].
Furthermore, each task \(g\) can be framed as an inverse problem 1 for some \(\boldsymbol{A}\), and noise following the Poisson-Gaussian distribution in 4 with varying gain \(\gamma\geq 0\) and standard deviation \(\sigma\geq 0\). Our model is trained in a supervised fashion to minimize the following task-wise loss: \[\mathcal{L}_g(\boldsymbol{\theta}, \boldsymbol{x}_{i,g}) = \mathbb{E}_{(\sigma_g, \gamma_g)} \mathbb{E}_{\boldsymbol{y}|\boldsymbol{x}_{i,g}}\,\omega_g\|\operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{y}, \boldsymbol{A}_g, \sigma_g, \gamma_g)-\boldsymbol{x}_{i,g}\|_1\] where \(\operatorname{R}_{\boldsymbol{\theta}}\) is the model to be trained, \(\boldsymbol{A}_g\) is the measurement operator of task \(g\), and \((\sigma_g,\gamma_g)\) are the noise parameters sampled from a distribution \(p(\sigma,\gamma)\). We choose the \(\ell_1\) over the standard \(\ell_2\) loss as it was empirically observed to obtain better test performance [50]. We consider a weighting parameter \(\omega_g = \|\boldsymbol{A}_g^\top\boldsymbol{y}\|_2/\sigma_g\) to ensure the training loss is balanced across different noise levels and tasks. The final training loss is obtained by summing over all tasks: \[\mathcal{L}(\boldsymbol{\theta}) = \sum_{g=1}^G \sum_{i=1}^{N_g} \mathcal{L}_g(\boldsymbol{\theta}, \boldsymbol{x}_{i,g}).\] This formulation allows the network to generalize across multiple imaging modalities by learning from diverse measurement operators and noise distributions.
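A sketch of the per-task loss is given below, assuming a model R_theta(y, A, sigma, gamma) and a physics object exposing the adjoint, in the spirit of the DeepInverse interface; names and signatures are illustrative.

```python
import torch

def task_loss(model, x, y, physics, sigma, gain):
    """Per-task weighted L1 loss with omega_g = ||A_g^T y||_2 / sigma_g.
    `model` and `physics` follow an illustrative DeepInverse-style interface."""
    x_hat = model(y, physics, sigma, gain)                        # R_theta(y, A_g, sigma_g, gamma_g)
    omega = physics.A_adjoint(y).flatten(1).norm(dim=1) / sigma   # one weight per sample
    l1 = (x_hat - x).flatten(1).abs().mean(dim=1)                 # per-sample L1 error
    return (omega * l1).mean()
```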
For each of the training inverse problems, we extract a random image patch \(\boldsymbol{x}_{i,g}\) of size \((C, 128, 128)\) with \(C\in \{1,2,3\}\), to which we apply the measurement operator \(\boldsymbol{A}_g\). We use the pretrained weights of the DRUNet denoiser to initialize our model. The model is trained with a batch size of 16 per inverse problem considered and for a total of 200k steps. We use the Adam optimizer with a learning rate of \(10^{-4}\), which is divided by \(10\) after 180k steps.
Many real computational imaging applications have limited or no ground-truth reference data [15]. In such cases, the RAM model can be finetuned using measurement data alone, leveraging some of the recent advances in self-supervised learning for inverse problems [38], [51]–[54]. In particular, we can finetune our pretrained RAM model with a loss that only requires a dataset of noisy and/or incomplete observations \(\{\boldsymbol{y}_1,\dots,\boldsymbol{y}_N\}\): \[\label{eq:self_sup} \mathcal{L}(\boldsymbol{\theta}) = \sum_{i=1}^{N} \mathcal{L}_{\textrm{MC}}(\boldsymbol{\theta}, \boldsymbol{y}_i) + \omega \, \mathcal{L}_{\textrm{NULL}}(\boldsymbol{\theta}, \boldsymbol{y}_i),\tag{5}\] where \(\omega>0\) is a trade-off parameter. The first term, \(\mathcal{L}_{\textrm{MC}}\), enforces measurement consistency \(\boldsymbol{A}\hat{\boldsymbol{x}}\approx \boldsymbol{A}\boldsymbol{x}\) while accounting for the noise, and is chosen according to the knowledge about the noise distribution of the finetuning dataset, i.e., using SURE [55] for fully-specified noise models, and UNSURE [54] or splitting losses [37] in the case of unknown or misspecified noise models. The second term, \(\mathcal{L}_{\textrm{NULL}}\), is required if \(\boldsymbol{A}\) is non-invertible and handles the lack of information in the nullspace of \(\boldsymbol{A}\), by leveraging information from multiple forward operators [39] and/or enforcing equivariance to a group of transformations [38]. More details are provided in 11.1.
| Method | Params | CBSD68 Easy | CBSD68 Med. | CBSD68 Hard | Urban100 Easy | Urban100 Med. | Urban100 Hard | Div2K Easy | Div2K Med. | Div2K Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| DPIR | 32M | 33.45 | 26.53 | 24.42 | 34.39 | 28.12 | 24.38 | 36.32 | 29.97 | 27.16 |
| DiffPIR | 552M | 31.83 | 25.98 | 23.80 | 32.93 | 27.03 | 23.69 | - | - | - |
| DDRM\(^\text{*}\) | 552M | 33.10 | 27.97 | 25.41 | 34.19 | 29.65 | 26.05 | 35.19 | 30.57 | 27.70 |
| PDNet | 456K | 30.54 | 26.84 | 24.62 | 28.36 | 25.22 | 22.67 | 32.61 | 28.81 | 26.32 |
| Restormer | 26M | 30.96 | 26.99 | 25.00 | 28.21 | 24.75 | 23.13 | 28.56 | 25.75 | 25.00 |
| uDPIR/t | 32M | 33.78 | 28.20 | 25.61 | 33.07 | 28.73 | 25.42 | 35.79 | 30.76 | 27.91 |
| uDPIR/u | 256M | 33.64 | 27.90 | 25.54 | 32.72 | 28.14 | 25.28 | 35.38 | 30.17 | 27.77 |
| RAM | 36M | 34.04 | 28.22 | 25.64 | 33.61 | 28.63 | 25.30 | 36.18 | 30.72 | 27.89 |
| Method | CBSD68 Easy | CBSD68 Med. | CBSD68 Hard | Urban100 Easy | Urban100 Med. | Urban100 Hard | Div2K Easy | Div2K Med. | Div2K Hard |
|---|---|---|---|---|---|---|---|---|---|
| DPIR | 32.10 | 25.70 | 22.73 | 32.53 | 23.78 | 20.38 | 35.23 | 28.14 | 24.71 |
| DiffPIR | 30.93 | 24.96 | 22.51 | 31.17 | 23.63 | 20.32 | - | - | - |
| DDRM\(^\text{*}\) | 31.85 | 25.89 | 23.14 | 32.73 | 24.40 | 20.73 | 34.46 | 28.25 | 24.89 |
| PDNet | 29.88 | 25.26 | 22.55 | 27.85 | 22.81 | 19.90 | 32.01 | 27.29 | 24.03 |
| Restormer | 31.68 | 25.93 | 23.19 | 29.07 | 24.00 | 20.99 | 30.01 | 27.20 | 25.00 |
| uDPIR/t | 31.90 | 26.10 | 23.41 | 30.47 | 24.31 | 21.05 | 34.63 | 28.45 | 25.20 |
| uDPIR/u | 31.86 | 26.03 | 23.38 | 30.26 | 24.32 | 21.04 | 34.47 | 28.37 | 25.23 |
| RAM | 32.59 | 26.19 | 23.42 | 31.78 | 24.65 | 21.12 | 35.23 | 28.56 | 25.16 |
We evaluate our model’s performance alongside iterative reconstruction methods, unrolled models, and end-to-end architectures. First, we consider the Plug-and-Play model DPIR [33], which plugs a DRUNet, trained only for denoising, within an 8-step Half Quadratic Splitting (HQS) algorithm [56], thereby requiring approximately eight times as many FLOPs as our model (see 8 for details). We also compare with unrolled versions of DPIR (referred to as uDPIR [33]), also comprising eight HQS iterations, which we train on the same tasks, with the same loss function and dataset as our model. Specifically, we investigate both the weight-tied variant, where all eight iterations share the same DRUNet parameters, and the weight-untied variant, where each iteration has its own set of parameters, resulting in a model with eight times more parameters and FLOPs than RAM. Additionally, we compare our approach with the PDNet unrolled network [21], trained on the same tasks. Finally, we evaluate Restormer [57], a transformer-based model with a similar computational cost (in terms of FLOPs) to DRUNet, but trained separately on each restoration task. We also compare the proposed model with the diffusion-based DDRM algorithm [58], using the diffusion model backbone from [59]; see 10 for technical details on the adaptation of DDRM to the proposed setting.
We first evaluate the proposed method on tasks that are consistent with the training tasks.
Deblurring We evaluate two types of deblurring: Gaussian deblurring, using fixed Gaussian blur kernels, and motion deblurring, based on random motion blur kernels. For each, we define three difficulty levels (easy, medium, hard), determined by kernel characteristics and the Gaussian noise standard deviation. Details for each setup are provided in 9.2. 3 and 8 report visual and quantitative results. Among non-iterative methods, RAM and uDPIR-tied achieve the best performance, while uDPIR-untied underperforms despite its larger parameter count. Our method generally surpasses other end-to-end approaches, except for motion deblurring on Urban100. Compared to iterative methods, our approach is superior except at low noise levels (\(\sigma=0.01\)) on large images (Urban100, DIV2K).
MRI We evaluate on the fastMRI brain validation set using the single-coil accelerated acquisition procedure from [24] with acceleration factors \(4\) and \(8\). The problem is formulated as \(\boldsymbol{y}= \text{diag}(\boldsymbol{m})\boldsymbol{F}\boldsymbol{x}+ \sigma \boldsymbol{n}\), where \(\boldsymbol{m}\) is a binary mask, \(\boldsymbol{F}\) is the discrete Fourier transform, and \(\sigma=5\cdot 10^{-4}\). Quantitative results are reported in [tab:metrics_mri_ct] and visuals in 6. The proposed RAM model outperforms all baselines.
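For reference, a minimal simulation of this single-coil measurement model, with the complex image stored as a two-channel (real, imaginary) tensor, could read as follows; this is an illustrative sketch, not the fastMRI pipeline.

```python
import torch

def mri_single_coil(x, mask, sigma=5e-4):
    """Simulate y = diag(m) F x + sigma * n for a 2-channel (real, imaginary) image x
    of shape (B, 2, H, W); `mask` is a binary tensor broadcastable to the k-space shape."""
    x_c = torch.complex(x[:, 0], x[:, 1])               # (B, H, W) complex image
    kspace = torch.fft.fft2(x_c, norm="ortho")          # F x
    noise = sigma * (torch.randn_like(kspace.real) + 1j * torch.randn_like(kspace.real))
    return mask * kspace + noise
```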
CT We evaluate on computed tomography with Gaussian noise, modeled as \(\boldsymbol{y}= \boldsymbol{A}\boldsymbol{x}+ \sigma \boldsymbol{n}\), where \(\boldsymbol{A}\) is the Radon transform. The acquisition uses 51 angles, i.e., about \(10\%\) of the LIDC-IDRI image width, with \(\sigma = 10^{-4}\) as in training. Results are reported in the rightmost column of [tab:metrics_mri_ct] and visuals in 7. DPIR results are omitted due to instability on this task. The proposed RAM model produces reconstructions with finer details than uDPIR-tied.
Additional results for in-distribution tasks on Poisson-Gaussian denoising, motion blur, super-resolution, and inpainting are provided in 13.
We consider the Cartesian multi-coil MRI problem. In this setup, and assuming \(L\) coils, the signal measured by each coil \(\ell \in \{1,\cdots, L\}\) is given by \(\boldsymbol{y}_\ell = \text{diag}(\boldsymbol{m})\boldsymbol{F}\text{diag}(\boldsymbol{s}_\ell)\boldsymbol{x}+ \sigma \boldsymbol{n}_\ell\), where \((\boldsymbol{s}_\ell)_{1\leq \ell \leq L}\) are the sensitivity maps (or S-maps). We provide reconstruction metrics for the multi-coil setting in [tab:metrics_mri_ct] and associated visuals in 7. We provide comparisons with the baseline UNet from [24]. Both methods perform similarly, up to mild residual noise in the UNet reconstruction, which yields a lower PSNR despite visually similar results. We stress that since PDNet contains learnable layers acting in the measurement domain, the architecture would need to be retrained specifically for this setting; we therefore do not report its results.
While our model was trained on the CT problem with Gaussian noise, we propose to apply it to a CT problem degraded with Poisson noise only. As before, we consider a problem with 51 angles. Numerical results are provided in [tab:metrics_mri_ct] and visual results in 7. The proposed method is better at recovering textures.
Additional results for out-of-distribution tasks are provided in 13.
Real imaging problems may involve inverse problems or images that significantly differ from those seen at training time. In this case, finetuning with only a handful of measurements (i.e., without ground-truth data), sometimes a single one, substantially improves performance. We present results for three different tasks in 4 and 3. In all cases, finetuning with fewer than \(N=10\) measurements yields performance that significantly surpasses PnP and diffusion baselines. Moreover, finetuning takes a few minutes on a single consumer-grade GPU (see 7).
Compressed sensing (PSNR in dB):

| | DPIR+ | DRUNet rand (Sup.) | DRUNet rand (Self.) | DRUNet (Sup.) | DRUNet (Self.) | RAM (Sup.) | RAM (Self.) |
|---|---|---|---|---|---|---|---|
| zero-shot | 31.29 | 8.07 | | 10.23 | | 22.50 | |
| \(N=1\) | | 20.92 | 21.20 | 19.49 | 19.44 | 30.77 | 30.40 |
| \(N=10\) | | 22.59 | 22.00 | 25.99 | 24.74 | 32.52 | 32.29 |
| \(N=100\) | | 22.73 | 22.24 | 28.38 | 30.58 | 33.40 | 33.57 |

Demosaicing (PSNR in dB):

| | DPIR+ | DDRM | DRUNet rand (Sup.) | DRUNet rand (Self.) | DRUNet (Sup.) | DRUNet (Self.) | RAM (Sup.) | RAM (Self.) |
|---|---|---|---|---|---|---|---|---|
| zero-shot | 32.84 | 30.64 | 8.10 | | 10.73 | | 29.49 | |
| \(N=1\) | | | 23.14 | 21.28 | 25.82 | 25.57 | 34.46 | 33.89 |
| \(N=10\) | | | 24.85 | 21.85 | 33.14 | 31.38 | 35.03 | 34.73 |
| \(N=100\) | | | 25.84 | 23.22 | 34.06 | 34.56 | 35.26 | 35.10 |
Compressed sensing on Sentinel-2 data: on 100 Sentinel-2 RGB images (\(128\times128\), 10m resolution), we set \(\boldsymbol{A}=\boldsymbol{S} \text{diag}(\boldsymbol{m})\) with \(\boldsymbol{m}\) a random \({\pm1}\) mask and \(\boldsymbol{S}\) a subsampled sine transform (factor 4). With Gaussian noise \(\sigma=0.05\), RAM finetuning achieves strong results from a single image, clearly surpassing DRUNet.
Cryo-electron microscopy: we use 5 noisy Cryo-EM micrographs (\(7676 \times 7420\)) from Topaz-EM [60], which have very low SNR. Since the noise distribution is unknown, we finetune with a splitting loss.
Low-photon imaging: we use 9 LinoSPAD images [61] (\(256\times 256\)). While noise is generally assumed to be Poisson in the low-flux case, these images were acquired in high-flux conditions and follow an unknown discrete noise distribution. Due to missing pixel lines, recovery amounts to inpainting and denoising. We finetune with splitting and multi-operator losses (removing lines at random). Further implementation details are provided in 11.2.
Architectural ablations 5 shows training PSNRs obtained on all training tasks after a fixed budget of 10K iterations for various architectural variations. We first note that introducing a proximal input block leads to a substantial improvement (+0.8dB). Conditioning the inner feature maps on the observation \(\boldsymbol{y}\) provides a smaller but consistent improvement (+0.1dB). The largest performance boost (+0.9dB) comes from incorporating our Krylov blocks, which apply \(\boldsymbol{A}^\top \boldsymbol{A}\) operations to the feature maps. This highlights that mimicking the core operations of iterative optimization methods within the architecture yields significant benefits. Interestingly, our model surpasses unrolled architectures that are explicitly derived from optimization algorithms.
Single vs multitask training 5 shows that, given appropriate conditioning on the measurement operator, bias towards specific training tasks is negligible: the model trained on all tasks performs comparably to (though slightly below) models trained on a single task. The proposed reweighting using the SNR of the considered problem limits bias towards harder tasks.
| Configuration | PSNR (dB) | #Params |
|---|---|---|
| base (DRUNet backbone) | 25.83 | 32.6M |
| base + prox | 26.64 | 32.6M |
| base + prox + embed \(y\) | 26.71 | 33.4M |
| base + prox + embed \(y\) + Krylov (RAM) | 27.61 | 35.6M |
| Training tasks | Inpainting | Deblurring | SR (\(\times\)2) |
|---|---|---|---|
| Inpainting only | 30.84 | 23.03 | 26.17 |
| Deblurring only | 14.06 | 25.62 | 28.02 |
| SR\(\times\)2, \(\sigma=0\) only | n/a | 21.25 | 28.60 |
| All three tasks | 30.73 | 25.58 | 29.79 |
Uncertainty quantification We can use existing uncertainty quantification (UQ) algorithms that only require a general reconstruction function. 5 shows estimated errors using the recent equivariant bootstrap technique [62] (see 15) on a noisy image inpainting task (DIV2K validation images with \(\sigma=0.02\) and 50% observed pixels). More details about the bootstrapping algorithm and additional results are included in 14. As shown in 16, our algorithm obtains better-calibrated uncertainty estimates than DDRM, while requiring \(100\times\) fewer network evaluations.
Limitations Training the RAM model requires a fair amount of GPU resources, which might not be available to all practitioners. Nevertheless, our model can be finetuned on a single mid-sized GPU, obtaining competitive results with a few optimization steps on small finetuning datasets. We focus on low-distortion reconstructions, in contrast to the higher perceptual quality and higher distortion of diffusion methods [12]. However, reconstructors such as RAM can be used within sampling algorithms, e.g. [63], [64].
We present a new lightweight foundation model for computational imaging that achieves state-of-the-art performance on a wide range of problems, outperforming PnP methods while matching unrolled networks that require \(8\times\) more compute and parameters. Our results demonstrate competitive performance without following the common practice of unrolling an optimization algorithm, resulting in a faster and lighter reconstruction network.
Our model displays transfer capabilities, as it can be finetuned on new datasets or imaging problems with small datasets (as small as a single image) in a fully self-supervised way, i.e., without any ground-truth references. These results challenge the idea that a reconstruction network has to be specialized to a specific imaging task, showing good reconstructions across a wide variety of imaging modalities with a relatively lightweight network (36M parameters). Our results suggest that imaging tasks that might seem very different (e.g. cryo-EM denoising and demosaicing of satellite images) share significant common structure and can be tackled with the same model.
We believe this work paves the way for a new approach to imaging, where most of the effort can be put into developing strong base models and robust self-supervised finetuning techniques.
We use the LSDIR dataset [48] for training on natural imaging tasks, consisting of 84,991 high-quality images. We split the images into 512\(\times\)512 non-overlapping patches for faster data loading. Beyond this, no additional pre-processing is performed on the dataset.
We perform virtual coil combination on the raw k-space data from the brain-multicoil fastMRI dataset [24]. We use the resulting 70,748 complex images from the training set for training our model, and the 21,842 slices from the validation set are reserved for validation.
We use the LIDC-IDRI dataset [49] as a basis for our computed tomography experiments, containing 244,526 chest slices. We normalize the scans to Hounsfield units (HU) using the rescale slope and intercept provided in the DICOM files. Data is then clipped to the lung window [-1200, 800] HU and rescaled to [0, 1]. We perform random extraction of slices from the dataset to provide a (95%, 4%, 1%) split of (training, validation, test).
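A sketch of this normalization, assuming the slice, rescale slope, and intercept have been read from the DICOM file, is given below (function name is ours):

```python
import numpy as np

def normalize_ct_slice(pixel_array, slope, intercept, window=(-1200.0, 800.0)):
    """Rescale a CT slice to Hounsfield units, clip to the lung window, and map to [0, 1]."""
    hu = pixel_array.astype(np.float32) * slope + intercept
    hu = np.clip(hu, window[0], window[1])
    return (hu - window[0]) / (window[1] - window[0])
```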
We provide in 6 a detailed list of all datasets and inverse problems considered for the training of RAM.
| Channels | Task | Dataset | Configuration | Gaussian \(\sigma_{\min}\) | Gaussian \(\sigma_{\max}\) | Poisson \(\gamma_{\min}\) | Poisson \(\gamma_{\max}\) |
|---|---|---|---|---|---|---|---|
| 1 (grayscale) | Denoising | LSDIR | - | 0.001 | 0.2 | 0.01 | 1.0 |
| 1 (grayscale) | Deblurring | LSDIR | Random motion & Gaussian kernels (31\(\times\)31) | 0.001 | 0.2 | - | - |
| 1 (grayscale) | Inpainting | LSDIR | Bernoulli masks: \(p \sim \mathcal{U}(0.3, 0.9)\) | 0.01 | 0.2 | 0.01 | 1.0 |
| 1 (grayscale) | SR\(\times\)2,4 | LSDIR | Bicubic/bilinear downsampling (\(\times\)2, \(\times\)4) | 0.001 | 0.01 | - | - |
| 1 (grayscale) | Tomography | LIDC-IDRI | Radon transform, 10 projection angles | - | 0.01 | - | - |
| 3 (color) | Denoising | LSDIR | - | 0.001 | 0.2 | 0.01 | 1.0 |
| 3 (color) | Deblurring | LSDIR | Random motion & Gaussian kernels (31\(\times\)31) | 0.01 | 0.2 | - | - |
| 3 (color) | Inpainting | LSDIR | Bernoulli masks: \(p \sim \mathcal{U}(0.3, 0.9)\) | 0.01 | 0.2 | 0.01 | 1.0 |
| 3 (color) | SR\(\times\)2,4 | LSDIR | Bicubic/bilinear downsampling (\(\times\)2, \(\times\)4) | 0.001 | 0.01 | - | - |
| 3 (color) | Pan-sharpening | LSDIR | Flat spectral response | 0.01 | 0.1 | - | - |
| 2 (complex) | MRI | FastMRI | Acceleration masks (\(\times\)4, \(\times\)8 undersampling) | 0.001\(^*\) | 0.1\(^*\) | - | - |
| 2 (complex) | Denoising | FastMRI | - | 0.001\(^*\) | 0.1\(^*\) | - | - |
| 2 (complex) | Inpainting | FastMRI | Bernoulli masks: \(p \sim \mathcal{U}(0.1, 0.9)\) | 0.01\(^*\) | 0.1\(^*\) | - | - |
The deblurring inverse problem is \(\boldsymbol{y}= \boldsymbol{k}*\boldsymbol{x}+\sigma\boldsymbol{n}\), where \(\boldsymbol{k}\) is some blur kernel and \(\boldsymbol{n}\sim\mathcal{N}(\boldsymbol{0}, I)\). We use a “valid” padding strategy, i.e., the image is not padded before convolving with the blur kernel. In each case, we establish 3 types of problems (easy, medium and hard) corresponding to different kernel lengths and noise standard deviations.
We generate random motion blur kernels as random Gaussian processes following [47], [65], of \(31\times 31\) pixels. In this case, the difficulty of the deblurring problem is defined by the length scale of the blur trajectories \(\ell\), the standard deviation of the Gaussian processes \(s\) generating the PSF, and the standard deviation \(\sigma\) of the additive Gaussian noise. The tuple \(\{\ell, s, \sigma\}\) is set to \(\{0.1, 0.1, 0.01\}\), \(\{0.6, 0.5, 0.05\}\), \(\{1.2, 1.0, 0.1\}\) for the easy, medium and hard settings respectively.
We generate fixed blur kernels with a PSF of size \(31\times 31\); the difficulty of the problem is defined by the standard deviation of the (Gaussian) blur kernel \(\sigma_{\text{blur}}\) and the standard deviation \(\sigma\) of the additive Gaussian noise. The tuple \(\{\sigma_{\text{blur}}, \sigma\}\) is set to \(\{1.0, 0.01\}\), \(\{2.0, 0.05\}\), \(\{4.0, 0.1\}\) for the easy, medium and hard settings respectively.
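For completeness, such an isotropic Gaussian PSF can be generated as follows (an illustrative helper, not the training code):

```python
import torch

def gaussian_psf(size=31, std=1.0):
    """Isotropic Gaussian PSF of shape (size, size), normalized to sum to one."""
    ax = torch.arange(size) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * std ** 2))
    return k / k.sum()
```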
The DDRM algorithm builds on a denoising diffusion UNet backbone, following the architecture from [59], as in [58]. Due to its large receptive field, this convolutional UNet requires sufficiently large input sizes, but memory constraints make resolutions beyond 512\(\times\)512 impractical. To address this, we use circular padding to align inputs with the \(2^5\) minimum size constraint, and process larger images by dividing them into non-overlapping 512\(\times\)512 patches.
In this section, we review the possible choices for \(\mathcal{L}_{\textrm{MC}}\) and \(\mathcal{L}_{\textrm{NULL}}\) that can be used in 5 .
\(\mathcal{L}_{\textrm{MC}}\) can be selected based on prior knowledge of the noise distribution [54]. If the noise distribution is known exactly, Stein’s Unbiased Risk Estimate (SURE) [55] is used. For the case of Gaussian noise of level \(\sigma\), SURE is defined as \[\mathcal{L}_{\textrm{SURE}}(\boldsymbol{\theta}, \boldsymbol{y}_i) = \|\boldsymbol{A}_i\operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{y}_i,\boldsymbol{A}_i)-\boldsymbol{y}_i\|_2^2 + 2\sigma_i^2 \text{div} (\boldsymbol{A}_i\circ \operatorname{R}_{\boldsymbol{\theta}})(\boldsymbol{y}_i, \boldsymbol{A}_i),\] where the divergence is approximated using a Monte Carlo method [66]. Extensions of SURE to unknown \(\sigma\) [54] and other noise distributions also exist [67]. If the noise distribution is only assumed to be separable across measurements, we can use a splitting loss \[\mathcal{L}_{\textrm{SPLIT}}(\boldsymbol{\theta}, \boldsymbol{y}_i) = \mathbb{E}_{\boldsymbol{m}}\| (\boldsymbol{I}-\text{diag}(\boldsymbol{m}))\left(\boldsymbol{A}_i\operatorname{R}_{\boldsymbol{\theta}}(\text{diag}(\boldsymbol{m})\boldsymbol{y}_i,\text{diag}(\boldsymbol{m})\boldsymbol{A}_i) - \boldsymbol{y}_i\right) \|_2^2,\] where \(\boldsymbol{m}\) is a splitting mask sampled from a Bernoulli random variable or using problem-specific strategies (e.g., Neighbor2Neighbor [52], SSDU [68], etc.).
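A minimal sketch of the Gaussian SURE loss with a single-probe Monte Carlo estimate of the divergence term is given below; the model and physics interfaces are illustrative placeholders.

```python
import torch

def sure_gaussian(model, y, physics, sigma, eps=1e-3):
    """Gaussian SURE loss with a one-sample Monte Carlo estimate of the divergence term."""
    f = lambda u: physics.A(model(u, physics))                     # u -> A R_theta(u, A)
    fy = f(y)
    mc = ((fy - y) ** 2).flatten(1).sum(dim=1)                     # ||A R(y) - y||^2
    b = torch.randn_like(y)                                        # random probe vector
    div = (b * (f(y + eps * b) - fy) / eps).flatten(1).sum(dim=1)  # div(A o R)(y) estimate
    return (mc + 2 * sigma ** 2 * div).mean()
```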
If the finetuning dataset is observed via a single operator \(\boldsymbol{A}_i=\boldsymbol{A}\) for all \(i=1,\dots,N\), we choose \(\mathcal{L}_{\textrm{NULL}}\) as the Equivariant Imaging (EI) loss [38] which enforces equivariance to a group of transformations \(\{\boldsymbol{T}_r\}_{r\in\mathcal{R}}\) such as rotations or shifts with \[\mathcal{L}_{\textrm{EI}}(\boldsymbol{\theta}) = \mathbb{E}_{r\sim \mathcal{R}}\| \boldsymbol{T}_r\hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}} - \operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{A}\boldsymbol{T}_r\hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}}, \boldsymbol{A}) \|_2^2,\] where \(\hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}}=\operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{y}_i,\boldsymbol{A}_i)\) is the reconstructed image. To learn in the nullspace of \(\boldsymbol{A}\), the group of transformations should be chosen such that \(\boldsymbol{A}\) is not equivariant. If the finetuning dataset is associated with multiple operators \(\{\boldsymbol{A}_r\}_{r\in\mathcal{R}}\), we choose \(\mathcal{L}_{\textrm{NULL}}\) as the Multi Operator Imaging (MOI) loss [53], i.e., \[\mathcal{L}_{\textrm{MOI}}(\boldsymbol{\theta}) = \mathbb{E}_{r\sim \mathcal{R}}\| \hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}} - \operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{A}_r\hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}},\boldsymbol{A}_r) \|_2^2,\] where, as before, \(\hat{\boldsymbol{x}}_{i,\boldsymbol{\theta}}=\operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{y}_i,\boldsymbol{A}_i)\); this enforces consistency across operators, \(\operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{A}_r \boldsymbol{x}, \boldsymbol{A}_r) \approx \operatorname{R}_{\boldsymbol{\theta}}(\boldsymbol{A}_s \boldsymbol{x}, \boldsymbol{A}_s)\) for all pairs of operators.
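A corresponding sketch of the EI loss is shown below, with the transform passed as a callable (e.g., a random shift); the MOI variant would instead re-measure the estimate with a different operator. Interfaces are illustrative.

```python
import torch

def ei_loss(model, y, physics, transform):
    """Equivariant Imaging loss: the reconstruction of the re-measured, transformed estimate
    should match the transformed estimate. The MOI variant re-measures the estimate with a
    different operator A_r and compares the result with the original estimate."""
    x_hat = model(y, physics)            # R_theta(y, A)
    x_t = transform(x_hat)               # T_r x_hat (random shift / rotation)
    y_t = physics.A(x_t)                 # A T_r x_hat
    return ((x_t - model(y_t, physics)) ** 2).mean()
```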
In all finetuning experiments, we use the Adam optimizer with the same parameters used at train time (see 4.1). In all experiments and baselines, we choose the network checkpoint that obtains the best performance, using ground-truth for the Sentinel experiments and visual inspection for the SPAD and Cryo-EM data. The trade-off parameter in 5 is set as \(\omega=0.1\) in all experiments where a \(\mathcal{L}_{\textrm{NULL}}\) loss is used. Shifts up to \(10\%\) of the image width and height are considered in the EI loss. 4 shows reconstructions with finetuning on a single measurement on a compressed sensing problem, and [fig:cryo-em-appendix,fig:spad-appendix] show more reconstructions for the LinoSPAD and Cryo-EM datasets, respectively.
| Dataset | Compressed Sensing | Demosaicing |
|---|---|---|
| \(1\) image | 43 sec | 60 sec |
| \(10\) images | 80 sec | 107 sec |
| \(100\) images | 228 sec | 267 sec |
We use a dataset of 200 images of \(128\times 128\) pixels of the coast of the United Kingdom taken by the Sentinel-2 L1C satellite with 10-meter resolution and minimal cloud coverage, keeping only the RGB bands. The dataset is split into 100 images for training and 100 for testing.
Compressed sensing: we set \(\boldsymbol{A}= \boldsymbol{S}\text{diag}(\boldsymbol{m})\) where \(\boldsymbol{m}\) is a random mask with values in {-1,+1}, and \(\boldsymbol{S}\in\mathbb{R}^{m\times n}\) is the discrete sine transform with output randomly subsampled by a factor of 4.
Demosaicing: measurements are generated by applying a Bayer pattern, keeping a single band per pixel. In both cases, we consider Gaussian noise with \(\sigma=0.05\), and finetune with the \(\ell_2\) loss for the supervised case, and with \(\mathcal{L}_{\textrm{SURE}}\) and \(\mathcal{L}_{\textrm{EI}}\) (using shifts) for the self-supervised setting. As shown in 3 and 4, finetuning RAM requires only a single image to obtain good results, significantly outperforming the DRUNet baselines. As shown in 7, self-supervised finetuning can be performed in a couple of minutes on a single mid-sized GPU.
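For illustration, the compressed sensing operator \(\boldsymbol{A}=\boldsymbol{S}\,\text{diag}(\boldsymbol{m})\) and a Bayer-type demosaicing mask can be sketched as follows; the transform is passed as a callable, and the RGGB layout of the Bayer pattern is an assumption on our part.

```python
import torch

def cs_forward(x, sign_mask, keep_idx, transform, sigma=0.05):
    """Compressed sensing A = S diag(m): apply the +/-1 mask, an orthogonal transform
    (passed as a callable), keep a random subset of coefficients, and add Gaussian noise."""
    z = transform(sign_mask * x).flatten(1)      # S diag(m) x, flattened per sample
    y = z[:, keep_idx]                           # random subsampling (factor 4)
    return y + sigma * torch.randn_like(y)

def bayer_mask(h, w):
    """(3, h, w) binary mask keeping a single color band per pixel (RGGB layout assumed)."""
    m = torch.zeros(3, h, w)
    m[0, 0::2, 0::2] = 1.0   # R
    m[1, 0::2, 1::2] = 1.0   # G
    m[1, 1::2, 0::2] = 1.0   # G
    m[2, 1::2, 1::2] = 1.0   # B
    return m
```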
We evaluate the model on 5 real Cryo-EM images of \(7676 \times 7420\) pixels provided by the Topaz-EM open-source library [60], whose noise distribution is unknown and which have very low SNR. We finetune our model with fixed \(\boldsymbol{A}=\boldsymbol{I}\), \(\sigma=0.98\), \(\gamma=0\) at the input, using \(256 \times 256\) crops normalized to have unitary standard deviation. Since the noise distribution is unknown and there is a trivial nullspace, we use \(\mathcal{L}_{\text{SPLIT}}\) for measurement consistency and \(\mathcal{L}_{\text{NULL}}=0\). Reconstructions are shown in 4.
We evaluate the RAM model on the LinoSPAD data provided by [61], which is corrupted by photon noise. While noise in SPADs is generally assumed to be Poisson in the very low-flux case, images acquired with higher flux can follow more complex discrete noise distributions. The LinoSPAD scans the image using epipolar scanning, with some lines of pixels removed due to faulty acquisition. Thus, the image recovery problem in this case is inpainting. The missing line pattern and noise model were not used during model training. We finetune the model using \(\mathcal{L}_{\textrm{SPLIT}}\) and \(\mathcal{L}_{\textrm{MOI}}\) (removing random subsets of lines). Reconstructions are shown in 4.
| PDNet | uDPIR-untied | uDPIR-tied | RAM | |
|---|---|---|---|---|
| GFLOPS | 60 | 2234 | 2234 | 360 |
| Test memory (BS=1) | 52 | 1213 | 298 | 354 |
| Train memory (BS=8) | 5815 | 37288 | 36374 | 9670 |
We provide in 6 visual results for MRI reconstructions on acceleration factors \(\times\)4 and \(\times\)8 respectively.
We provide in 8 results on the multi-coil MRI inverse problem where we simulate \(L=15\) coil maps. The UNet reconstruction appears less smooth, which penalizes its PSNR.
We provide in 7 visual results for computed tomography for the in-distribution setup on the top row (i.e., with a setup similar to that of training), and the out-of-distribution setup on the bottom row. In the latter case, measurements are degraded with additional Poisson noise, unlike the training setup.
Motion deblurring We provide additional visual results for the motion-blur (“hard”) task on DIV2K dataset samples in 9. We observe that the proposed method performs on par with uDPIR with tied weights and DDRM, although some mild visual differences may be noticed. See, for instance, the windows in the blue zoombox of the church image, which appear to show a repetitive pattern in DDRM that is not faithful to the ground-truth image.
Super-resolution We provide visual results for bicubic super-resolution with factor 4 and with additive Gaussian noise of standard deviation \(\sigma = 0.01\) on BSD68 dataset samples in Fig. 10. For this task, we also compare the proposed result with the state-of-the-art transformer-based SwinIR model [26] trained specifically for this degradation. We observe that the proposed method outperforms the DPIR algorithm, but underperforms compared to SwinIR.
Grayscale Poisson-Gaussian denoising We provide visual results for Poisson-Gaussian denoising on a grayscale BSD68 dataset sample in Fig. 11. We follow the formulation in 4. On both Poisson denoising and Gaussian denoising, we observe that the model is relatively robust to out-of-distribution parameters. In both cases, while the reconstruction quality degrades as the degradation level increases, low-frequency features are preserved in the reconstruction, and no substantial artefacts are introduced.
Image inpainting We provide visual results for inpainting with mild Poisson-Gaussian noise on a color BSD68 dataset sample in Fig. 12. We show results on inpainting masks applied either per-channel (top rows) or pixel-wise (bottom rows), for various masking levels. In both cases, Poisson-Gaussian noise is applied with \(\sigma = 0.01\) and \(\gamma = 0.01\). We observe that the model is more robust to out-of-distribution mask sparsity for channel-wise masks than for pixel-wise masks, where artifacts appear when fewer than \(20\%\) of pixels are observed.
We can obtain uncertainty estimates of the reconstructions obtained by RAM with the equivariant bootstrap algorithm [62] described in 15. This approach is similar to the standard parametric bootstrap [69], with the addition of a group of geometric transformations (set as random shifts, rotations, and flips in our experiments). The transforms are used to probe the confidence of the network on the reconstruction in the nullspace of the operator \(\boldsymbol{A}\).
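Our understanding of this procedure can be summarized by the following sketch: transformed versions of the reconstruction are re-measured, reconstructed, and mapped back, and the spread of the resulting samples serves as a pixelwise error estimate. The transforms interface and the RMS-style error summary are illustrative assumptions, not the exact algorithm of [62].

```python
import torch

def equivariant_bootstrap(model, y, physics, sample_transform, sigma, n_samples=100):
    """Bootstrap error estimate: re-measure transformed copies of the reconstruction,
    reconstruct them, undo the transform, and summarize the spread of the samples.
    `sample_transform()` returns a pair (T, T_inv) of a random geometric transform
    and its inverse (shift / rotation / flip)."""
    x_hat = model(y, physics)
    samples = []
    for _ in range(n_samples):
        T, T_inv = sample_transform()                   # random geometric transform
        y_boot = physics.A(T(x_hat))                    # simulate a new acquisition
        y_boot = y_boot + sigma * torch.randn_like(y_boot)
        samples.append(T_inv(model(y_boot, physics)))   # reconstruct and map back
    samples = torch.stack(samples)
    err = (samples - x_hat).pow(2).mean(dim=0).sqrt()   # pixelwise RMS deviation
    return x_hat, err
```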
We evaluate the uncertainty quantification algorithm on an inpainting task with \(50\%\) missing pixels and Gaussian noise with standard deviation \(\sigma=0.02\), using the DIV2K validation dataset. As a baseline, we use the DDRM diffusion method, running the diffusion multiple times to obtain a set of approximate posterior samples. We obtain 100 samples for each test image, which requires 101 network evaluations for equivariant bootstrap and 10,000 evaluations for DDRM (since each diffusion requires 100 denoiser evaluations). Estimated pixelwise errors, averaged over all color channels, are computed using \(\boldsymbol{x}_{\textrm{err}}\) of 15 and shown in [fig:UQ, fig:UQ_extra]. We can see in both figures that the estimated errors follow closely the true errors.
We also evaluate the accuracy of these confidence regions by calculating the empirical coverage probabilities, as measured by the proportion of ground-truth test images that lie within the confidence regions for a range of specified confidence levels between 0% and 100%. This method provides a quantitative metric of the estimated uncertainty intervals [70]. Results are shown in 16. The proposed algorithm obtains good coverage without any additional calibration step, while the diffusion method provides highly overconfident intervals (note this behavior of diffusion models has been observed in various previous works [62], [70]).