ACL: Activating Capability of Linear Attention for Image Restoration

CVPR2025Mamba

Abstract

Image restoration (IR), a cornerstone of computer vision, has embarked on a new epoch with the advent of deep learning technologies. Recently, numerous CNN- and Transformer-based methods have been developed, yet they frequently encounter limitations in global receptive fields and computational efficiency. To mitigate these challenges, recent studies have employed the Selective State Space Model (Mamba), which embodies both attributes. However, due to Mamba's inherent one-dimensional scanning limitations, some approaches have introduced multi-directional scanning to bolster inter-sequence correlations. Despite these enhancements, these methods still struggle to manage local pixel correlations across various directions. Moreover, the recursive computation in Mamba's SSM reduces efficiency. To resolve these issues, we exploit the mathematical congruence between linear attention and the SSM within Mamba to propose ACL, a novel model built on a new design structure. This model integrates linear attention blocks in place of the SSM within Mamba as the core component of its encoders/decoders, aiming to preserve a global perspective while boosting computational efficiency. Furthermore, we design a simple yet robust local enhancement module with multi-scale dilated convolutions that extracts coarse and fine features to improve local detail recovery. Experimental results confirm that our ACL model excels in classical IR tasks such as deblurring and deraining, while maintaining relatively low parameter counts and FLOPs.
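Not the authors' code — a minimal NumPy sketch of the generic linear-attention identity the abstract builds on: reordering the products as φ(Q)(φ(K)ᵀV) avoids materializing the N×N attention matrix, giving cost linear in sequence length N. The feature map and sizes here are illustrative.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the N x N score matrix costs O(N^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: phi(Q) (phi(K)^T V) reorders the matmuls so the
    # cost is O(N d^2) -- linear in sequence length N.
    phi = lambda x: np.maximum(x, 0.0) + 1.0   # a simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                  # d x d summary of the whole sequence
    z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
N, d = 64, 8
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (64, 8): same interface as softmax attention, linear cost
```

Both functions map (N, d) queries/keys/values to an (N, d) output; only the factorization order, and hence the complexity, differs.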


Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Zamfir E, Wu Z, Mehta N, et al. Complexity Experts are Task-Discriminative Learners for Any Image Restoration[J]. arXiv preprint arXiv:2411.18466, 2024.

https://github.com/eduardzamfir/MoCE-IR

CVPR2025MOE

Abstract

Recent advancements in all-in-one image restoration models have revolutionized the ability to address diverse degradations through a unified framework. However, parameters tied to specific tasks often remain inactive for other tasks, making mixture-of-experts (MoE) architectures a natural extension. Despite this, MoEs often show inconsistent behavior, with some experts unexpectedly generalizing across tasks while others struggle within their intended scope. This hinders leveraging MoEs' computational benefits by bypassing irrelevant experts during inference. We attribute this undesired behavior to the uniform and rigid architecture of traditional MoEs. To address this, we introduce "complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields. A key challenge is assigning tasks to each expert, as degradation complexity is unknown in advance. Thus, we execute tasks with a simple bias toward lower complexity. To our surprise, this preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. Extensive experiments validate our approach, demonstrating the ability to bypass irrelevant experts during inference while maintaining superior performance. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
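The "simple bias toward lower complexity" can be illustrated with a toy router: gating logits are penalized in proportion to each expert's cost, so the cheapest adequate expert wins near ties while a strong task preference survives the bias. This is a hypothetical sketch; the bias form and names are not the paper's exact formulation.

```python
import numpy as np

def route(gate_logits, expert_cost, bias=0.5):
    # Penalize each expert's logit by its computational cost, then pick argmax.
    biased = np.asarray(gate_logits) - bias * np.asarray(expert_cost)
    return int(np.argmax(biased))

expert_cost = [1.0, 2.0, 4.0]            # e.g., growing depth / receptive field
easy_input = np.array([2.0, 2.1, 2.2])   # near-tied logits -> cheap expert wins
hard_input = np.array([0.0, 1.0, 5.0])   # clear preference survives the bias
print(route(easy_input, expert_cost))    # 0
print(route(hard_input, expert_cost))    # 2
```

At inference, experts whose biased logits lose the argmax are simply never executed, which is where the computational saving comes from.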


GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration

Rajagopalan S, Nair N G, Paranjape J N, et al. GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration[J]. arXiv preprint arXiv:2411.17687, 2024.

https://github.com/sudraj2002/GenDeg

CVPR2025AIOR

Abstract

Deep learning-based models for All-In-One Image Restoration (AIOR) have achieved significant advancements in recent years. However, their practical applicability is limited by poor generalization to samples outside the training distribution. This limitation arises primarily from insufficient diversity in degradation variations and scenes within existing datasets, resulting in inadequate representations of real-world scenarios. Additionally, capturing large-scale real-world paired data for degradations such as haze, low-light, and raindrops is often cumbersome and sometimes infeasible. In this paper, we leverage the generative capabilities of latent diffusion models to synthesize high-quality degraded images from their clean counterparts. Specifically, we introduce GenDeg, a degradation and intensity-aware conditional diffusion model capable of producing diverse degradation patterns on clean images. Using GenDeg, we synthesize over 550k samples across six degradation types: haze, rain, snow, motion blur, low-light, and raindrops. These generated samples are integrated with existing datasets to form the GenDS dataset, comprising over 750k samples. Our experiments reveal that image restoration models trained on the GenDS dataset exhibit significant improvements in out-of-distribution performance compared to those trained solely on existing datasets. Furthermore, we provide comprehensive analyses on the implications of diffusion model-based synthetic degradations for AIOR. The code will be made publicly available.


MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Li B, Zhao H, Wang W, et al. MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration[J]. arXiv preprint arXiv:2412.20066, 2024.

https://github.com/XLearning-SCU/2025-CVPR-MaIR

CVPR2025Mamba

Abstract

Recent advancements in Mamba have shown promising results in image restoration. These methods typically flatten 2D images into multiple distinct 1D sequences along rows and columns, process each sequence independently using selective scan operation, and recombine them to form the outputs. However, such a paradigm overlooks two vital aspects: i) the local relationships and spatial continuity inherent in natural images, and ii) the discrepancies among sequences unfolded through totally different ways. To overcome the drawbacks, we explore two problems in Mamba-based restoration methods: i) how to design a scanning strategy preserving both locality and continuity while facilitating restoration, and ii) how to aggregate the distinct sequences unfolded in totally different ways. To address these problems, we propose a novel Mamba-based Image Restoration model (MaIR), which consists of Nested S-shaped Scanning strategy (NSS) and Sequence Shuffle Attention block (SSA). Specifically, NSS preserves locality and continuity of the input images through the stripe-based scanning region and the S-shaped scanning path, respectively. SSA aggregates sequences through calculating attention weights within the corresponding channels of different sequences. Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets, achieving state-of-the-art performance on the tasks of image super-resolution, denoising, deblurring and dehazing. Our codes will be available after acceptance.
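The locality/continuity idea behind stripe-based S-shaped scanning can be sketched as a boustrophedon traversal: within each horizontal stripe, alternate the row direction so consecutive sequence elements stay spatially adjacent. The stripe width and traversal order here are illustrative, not the exact NSS.

```python
def s_shaped_scan(height, width, stripe=2):
    # Flatten an H x W grid stripe by stripe, snaking left-right then
    # right-left so every consecutive pair of pixels is a grid neighbor.
    order = []
    for top in range(0, height, stripe):
        rows = range(top, min(top + stripe, height))
        for i, r in enumerate(rows):
            cols = range(width) if i % 2 == 0 else range(width - 1, -1, -1)
            order.extend((r, c) for c in cols)
    return order

path = s_shaped_scan(4, 3, stripe=2)
print(path[:6])  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

Unlike plain row-major flattening, every step in this path moves to a 4-neighbor of the previous pixel, which is the continuity property the abstract argues standard multi-directional scans lose.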


MambaIRv2: Attentive State Space Restoration

Guo H, Guo Y, Zha Y, et al. MambaIRv2: Attentive State Space Restoration[J]. arXiv preprint arXiv:2411.15269, 2024.

https://github.com/csguoh/MambaIR

CVPR2025Mamba

Abstract

The Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration. In this work, we propose MambaIRv2, which equips Mamba with non-causal modeling ability similar to ViTs, yielding an attentive state space restoration model. Specifically, the proposed attentive state-space equation allows the model to attend beyond the scanned sequence and facilitates image unfolding with just a single scan. Moreover, we further introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show our MambaIRv2 outperforms SRFormer by 0.35 dB PSNR on lightweight SR with 9.3% fewer parameters, and surpasses HAT on classic SR by up to 0.29 dB.


OSDFace: One-Step Diffusion Model for Face Restoration

Wang J, Gong J, Zhang L, et al. OSDFace: One-Step Diffusion Model for Face Restoration[J]. arXiv preprint arXiv:2411.17163, 2024.

https://github.com/jkwang28/OSDFace

CVPR2025Diffusion

Abstract

Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency.
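The vector-quantized dictionary lookup at the heart of the VRE can be sketched generically: each feature vector is replaced by its nearest codebook entry, and the resulting indices act as discrete visual tokens/prompts. The codebook, sizes, and hard nearest-neighbor assignment here are illustrative, not OSDFace's exact implementation.

```python
import numpy as np

def vector_quantize(features, codebook):
    # Squared distances from every feature (row) to every code (row),
    # then snap each feature to its nearest code.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                       # 8 code vectors, dim 4
features = codebook[[3, 1, 3]] + 0.01 * rng.normal(size=(3, 4))  # noisy copies
tokens, quantized = vector_quantize(features, codebook)
print(tokens)  # [3 1 3]: noisy features snap back to their nearest codes
```

In a trained system the token sequence, not the raw low-quality pixels, is what conditions the one-step generator.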


Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual

Wang C, Guo L, Fu Z, et al. Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual[J]. arXiv preprint arXiv:2503.01288, 2025.

https://github.com/ChongWang1024/RDMD

CVPR2025Diffusion

Abstract

Plug-and-play (PnP) methods offer an iterative strategy for solving image restoration (IR) problems in a zero-shot manner, using a learned discriminative denoiser as the implicit prior. More recently, a sampling-based variant of this approach, which utilizes a pre-trained generative diffusion model, has gained great popularity for solving IR problems through stochastic sampling. IR results using PnP with a pre-trained diffusion model demonstrate distinct advantages over those using discriminative denoisers, i.e., improved perceptual quality at the cost of data fidelity. These unsatisfactory results stem from the lack of integration between the two strategies in IR tasks. In this work, we propose a novel zero-shot IR scheme, dubbed Reconciling Diffusion Model in Dual (RDMD), which leverages only a single pre-trained diffusion model to construct two complementary regularizers. Specifically, the diffusion model in RDMD iteratively performs deterministic denoising and stochastic sampling, aiming to achieve high-fidelity image restoration with appealing perceptual quality. RDMD also allows users to customize the distortion-perception tradeoff with a single hyperparameter, enhancing the adaptability of the restoration process to different practical scenarios. Extensive experiments on several IR tasks demonstrate that our proposed method achieves superior results compared to existing approaches on both the FFHQ and ImageNet datasets.


Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Garber T, Tirer T. Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)[J]. arXiv preprint arXiv:2412.20596, 2024.

CVPR2025

Abstract

In recent years, it has become popular to tackle image restoration tasks with a single pretrained diffusion model (DM) and data-fidelity guidance, instead of training a dedicated deep neural network per task. However, such "zero-shot" restoration schemes currently require many Neural Function Evaluations (NFEs) to perform well, which may be attributed to the many NFEs needed in the original generative functionality of the DMs. Recently, faster variants of DMs have been explored for image generation. These include Consistency Models (CMs), which can generate samples via a couple of NFEs. However, existing works that use guided CMs for restoration still require tens of NFEs or per-task fine-tuning of the model, which leads to a performance drop if the assumptions made during fine-tuning are inaccurate. In this paper, we propose a zero-shot restoration scheme that uses CMs and operates well with as few as 4 NFEs. It is based on a wise combination of several ingredients: better initialization, back-projection guidance, and above all a novel noise injection mechanism. We demonstrate the advantages of our approach for image super-resolution, deblurring and inpainting. Interestingly, we show that the usefulness of our noise injection technique goes beyond CMs: it can also mitigate the performance degradation of existing guided DM methods when reducing their NFE count.
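Back-projection guidance, one of the ingredients named above, has a standard closed form for linear inverse problems y = Ax: pull the current estimate onto the set of images consistent with the measurements via the pseudo-inverse. A toy sketch with an illustrative operator:

```python
import numpy as np

def back_project(x, y, A, A_pinv):
    # Enforce data consistency: subtract the pseudo-inverse correction so the
    # updated estimate reproduces the measurements exactly.
    return x - A_pinv @ (A @ x - y)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8))       # stand-in for a blur/downsampling operator
A_pinv = np.linalg.pinv(A)
x_true = rng.normal(size=8)
y = A @ x_true                    # noiseless measurements
x = rng.normal(size=8)            # current (e.g., CM-denoised) estimate
x_bp = back_project(x, y, A, A_pinv)
print(np.allclose(A @ x_bp, y))   # True: the estimate now matches y
```

In a guided sampler this projection alternates with the denoising steps; the paper's contribution lies in how few such steps are needed and in the noise injected between them.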


Multi-axis Prompt and Multi-dimension Fusion Network for All-in-one Weather-degraded Image Restoration

https://github.com/chdwyb/MPMF-Net

AAAI2025AIOR

Abstract

Existing approaches aiming to remove adverse weather degradations compromise image quality and incur long processing times. To address this, we introduce a multi-axis prompt and multi-dimension fusion network (MPMF-Net). Specifically, we develop a multi-axis prompts learning block (MPLB), which learns prompts along three separate axis planes, requiring fewer parameters while achieving superior performance. Moreover, we present a multi-dimension feature interaction block (MFIB), which optimizes intra-scale feature fusion by segregating features along the height, width and channel dimensions. This strategy enables more accurate mutual attention and adaptive weight determination. Additionally, we propose coarse-scale degradation-free implicit neural representations (CDINR) to normalize the degradation levels of different weather conditions. Extensive experiments demonstrate the significant improvements of our model over recent well-performing approaches in both reconstruction fidelity and inference time.


Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks

https://github.com/xianggkl/VLU-Net

CVPR2025AIOR

Abstract

Dynamic image degradations, including noise, blur and lighting inconsistencies, pose significant challenges in image restoration, often due to sensor limitations or adverse environmental conditions. Existing Deep Unfolding Networks (DUNs) offer stable restoration performance but require manual selection of degradation matrices for each degradation type, limiting their adaptability across diverse scenarios. To address this issue, we propose the Vision-Language-guided Unfolding Network (VLU-Net), a unified DUN framework for handling multiple degradation types simultaneously. VLU-Net leverages a Vision-Language Model (VLM) refined on degraded image-text pairs to align image features with degradation descriptions, selecting the appropriate transform for target degradation. By integrating an automatic VLM-based gradient estimation strategy into the Proximal Gradient Descent (PGD) algorithm, VLU-Net effectively tackles complex multi-degradation restoration tasks while maintaining interpretability. Furthermore, we design a hierarchical feature unfolding structure to enhance VLU-Net framework, efficiently synthesizing degradation patterns across various levels. VLU-Net is the first all-in-one DUN framework and outperforms current leading one-by-one and all-in-one end-to-end methods by 3.74 dB on the SOTS dehazing dataset and 1.70 dB on the Rain100L deraining dataset.
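The PGD iteration that deep unfolding networks like VLU-Net unroll has a classical form: a gradient step on the data-fidelity term followed by a proximal step on the prior. Below, a soft-thresholding prox (a sparsity prior) stands in for the learned, VLM-selected operator; the problem and parameters are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Prox of t*||.||_1; a DUN replaces this with a learned network module.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pgd(y, A, steps=300, lam=0.01):
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = x - lr * A.T @ (A @ x - y)     # gradient step on ||Ax - y||^2 / 2
        x = soft_threshold(x, lr * lam)    # proximal step on the prior
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 10))              # stand-in degradation matrix
x_true = np.zeros(10)
x_true[2], x_true[7] = 1.0, -2.0           # sparse ground-truth signal
x_hat = pgd(A @ x_true, A)
print(np.abs(x_hat - x_true).max() < 0.05)  # True: iterates recover x_true
```

VLU-Net's departure from this template is that the degradation matrix and the proximal operator are not hand-picked: a VLM matches the degraded input to a degradation description and selects them automatically.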


Debiased All-in-one Image Restoration with Task Uncertainty Regularization

https://github.com/Aitical/TUR

AAAI2025AIOR

Abstract

All-in-one image restoration is a fundamental low-level vision task with significant real-world applications. The primary challenge lies in addressing diverse degradations within a single model. While current methods primarily exploit task prior information to guide the restoration models, they typically employ uniform multi-task learning, overlooking the heterogeneity in model optimization across different degradation tasks. To eliminate this bias, we propose a task-aware optimization strategy, which introduces adaptive task-specific regularization for multi-task image restoration learning. Specifically, our method dynamically weights and balances the losses of different restoration tasks during training, steering training along the most reasonable optimization route. In this way, we achieve more robust and effective model training. Notably, our approach can serve as a plug-and-play strategy to enhance existing models without requiring modifications during inference. Extensive experiments in diverse all-in-one restoration settings demonstrate the superiority and generalization of our approach. For example, AirNet retrained with TUR achieves average improvements of 1.16 dB on three distinct tasks and 1.81 dB on five distinct all-in-one tasks. These results underscore TUR's effectiveness in advancing the state of the art in all-in-one image restoration, paving the way for more robust and versatile image restoration.
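Adaptive task weighting of this kind is commonly implemented with homoscedastic-uncertainty weights in the style of Kendall et al., where a learnable log-variance per task scales its loss down and a log-term prevents the weights from collapsing. A generic sketch, not TUR's exact formulation:

```python
import math

def weighted_total(losses, log_vars):
    # Each task i contributes exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)
    # a learnable per-task parameter: harder/noisier tasks get down-weighted,
    # while the +s_i term penalizes inflating the variance without limit.
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))

task_losses = [0.9, 0.3, 1.5]   # e.g., derain / denoise / deblur losses
log_vars = [0.0, -0.5, 0.4]     # learned alongside the network weights
total = weighted_total(task_losses, log_vars)
print(round(total, 3))  # 2.3
```

During training the `log_vars` receive gradients like any other parameter, so the balance between tasks adapts continuously rather than being fixed by hand.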


All-in-One Image Compression and Restoration

https://github.com/ZeldaM1/All-in-one

WACV2025AIOR

Abstract

Visual images corrupted by various types and levels of degradations are commonly encountered in practical image compression. However, most existing image compression methods are tailored for clean images and therefore struggle to achieve satisfactory results on such images. Joint compression and restoration methods typically focus on a single type of degradation and fail to address the variety of degradations met in practice. To this end, we propose a unified framework for all-in-one image compression and restoration, which incorporates image restoration capability against various degradations into the process of image compression. The key challenges involve distinguishing authentic image content from degradations, and flexibly eliminating various degradations without prior knowledge. Specifically, the proposed framework approaches these challenges from two perspectives: content information aggregation and degradation representation aggregation. Extensive experiments demonstrate the following merits of our model: 1) superior rate-distortion (RD) performance on various degraded inputs while preserving the performance on clean data; 2) strong generalization ability to real-world and unseen scenarios; 3) higher computing efficiency than compared methods.


Universal Image Restoration Pre-training via Degradation Classification

https://github.com/MILab-PKU/dcpt

ICLR2025

Abstract

This paper proposes Degradation Classification Pre-Training (DCPT), which enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. Unlike existing self-supervised pre-training methods, DCPT utilizes the degradation type of the input image as an extremely weak supervision signal, which can be effortlessly obtained and is even intrinsic to all image restoration datasets. DCPT comprises two primary stages. Initially, image features are extracted from the encoder. Subsequently, a lightweight decoder, such as a ResNet18, classifies the degradation type of the input image solely from the features extracted in the first stage, without accessing the input image itself. Pre-trained with this straightforward yet potent objective, the encoder can then be used for universal image restoration and achieves outstanding performance. Following DCPT, both convolutional neural networks (CNNs) and transformers demonstrate performance improvements, with gains of up to 2.55 dB in the 10D all-in-one restoration task and 6.53 dB in mixed degradation scenarios. Moreover, previous self-supervised pre-training methods, such as masked image modeling, discard the decoder after pre-training, while our DCPT utilizes the pre-trained parameters more effectively. This superiority arises from the degradation classifier acquired during DCPT, which facilitates transfer learning between models of identical architecture trained on diverse degradation types.


Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration

https://github.com/xl-tang3/DA-RCOT

TPAMI'25AIOR

Abstract

All-in-one image restoration has emerged as a practical and promising low-level vision task for real-world applications. In this context, the key issue lies in how to deal with different types of degraded images simultaneously. In this work, we present a Degradation-Aware Residual-Conditioned Optimal Transport (DA-RCOT) approach that models (all-in-one) image restoration as an optimal transport (OT) problem for unpaired and paired settings, introducing the transport residual as a degradation-specific cue for both the transport cost and the transport map. Specifically, we formalize image restoration with a residual-guided OT objective by exploiting the degradation-specific patterns of the Fourier residual in the transport cost. More crucially, we design the transport map for restoration as a two-pass DA-RCOT map, in which the transport residual is computed in the first pass and then encoded as multi-scale residual embeddings to condition the second-pass restoration. This conditioning process injects intrinsic degradation knowledge (e.g., degradation type and level) and structural information from the multi-scale residual embeddings into the OT map, which thereby can dynamically adjust its behaviors for all-in-one restoration. Extensive experiments across five degradations demonstrate the favorable performance of DA-RCOT as compared to state-of-the-art methods, in terms of distortion measures, perceptual quality, and image structure preservation. Notably, DA-RCOT delivers superior adaptability to real-world scenarios even with multiple degradations and shows distinctive robustness to both degradation levels and the number of degradations.
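The use of the Fourier residual as a degradation-specific cue can be illustrated with a toy statistic: different degradations leave different spectral signatures in residual = degraded − restored, e.g. haze-like shifts concentrate residual energy at low frequencies while noise spreads it across the band. The images and the "cue" below are illustrative, not the paper's transport cost.

```python
import numpy as np

def fourier_residual_cue(degraded, restored):
    # Spectrum of the residual carries a degradation signature; here we
    # summarize it as the fraction of residual energy near DC.
    spectrum = np.abs(np.fft.fft2(degraded - restored))
    low = spectrum[:2, :2].sum()
    return low / (spectrum.sum() + 1e-8)

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0.0, 1.0, 16), np.ones(16))   # smooth toy image
hazy = 0.6 * clean + 0.8                                   # smooth, global shift
noisy = clean + 0.3 * rng.normal(size=(16, 16))            # broadband residual
print(fourier_residual_cue(hazy, clean) > fourier_residual_cue(noisy, clean))  # True
```

DA-RCOT's first pass produces exactly such a residual, whose multi-scale embeddings then condition the second restoration pass; the toy cue above only shows why the residual is informative about the degradation type.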
