

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

December 18, 2023
Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi
cs.AI

Abstract

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop TIP, a Text-driven Image Processing framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of text information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of TIP compared to the state of the art, alongside offering the flexibility of text-based control over the restoration effects.
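The abstract describes instructions that combine a content-related prompt with a language-based quantitative specification of restoration strength. The paper does not give its prompt grammar here, so the helper below is a hypothetical sketch of what such a combined instruction could look like; the function name, parameters, and phrasing are assumptions, not the authors' format.

```python
def build_restoration_prompt(content: str, noise_level: float, blur_sigma: float) -> str:
    """Combine a content-related prompt (semantic alignment) with
    quantitative restoration instructions (fine-level control)."""
    semantic = f"a photo of {content}"
    restoration = (
        f"remove noise with level {noise_level:.2f}; "
        f"deblur with sigma {blur_sigma:.2f}"
    )
    return f"{semantic}; {restoration}"

# Example: semantic content plus explicit restoration strengths in one prompt.
prompt = build_restoration_prompt("a red fox in snow", 0.20, 1.50)
print(prompt)
```

The point of the sketch is only that both the identity-disambiguating content and the numeric restoration strengths ride in a single natural-language string, so no task-specific conditioning head is needed.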
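The fusion mechanism is described as augmenting ControlNet by learning to rescale the generative prior before combining it with the control branch. A minimal sketch of that rescale-then-add pattern, assuming a learned per-channel scale: real TIP operates on diffusion U-Net feature maps, whereas this uses plain NumPy arrays purely to illustrate the shape of the operation.

```python
import numpy as np

def fuse(prior_feat: np.ndarray, control_feat: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Rescale the generative-prior features per channel (learned `scale`),
    then add the ControlNet branch features."""
    # scale has shape (C, 1, 1), broadcasting over the spatial dims of (C, H, W)
    return scale * prior_feat + control_feat

# Toy feature maps: 4 channels, 8x8 spatial resolution.
C, H, W = 4, 8, 8
prior = np.ones((C, H, W))            # stand-in for generative-prior features
control = np.full((C, H, W), 0.5)     # stand-in for ControlNet features
scale = np.full((C, 1, 1), 2.0)       # stand-in for the learned rescaling

out = fuse(prior, control, scale)
print(out.shape, out.mean())  # (4, 8, 8) 2.5
```

In the standard ControlNet formulation the control features are simply added to the backbone activations; the abstract's contribution is the learned rescaling of the prior in that sum, which is what `scale` stands in for here.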