
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

December 18, 2023
作者: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi
cs.AI

Abstract

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop TIP, a Text-driven Image Processing framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of text information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of TIP compared to the state of the arts, alongside offering the flexibility of text-based control over the restoration effects.
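The abstract's fusion mechanism — augmenting ControlNet by "learning to rescale the generative prior" — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual code: the function name `rescaled_fusion` and the per-channel `scale` parameter are hypothetical, standing in for the idea that the frozen diffusion ("generative prior") feature is multiplied by a learned scale before the ControlNet branch feature is added, rather than fused by plain addition.

```python
import numpy as np

def rescaled_fusion(prior_feat, control_feat, scale):
    """Hypothetical sketch of learned-rescaling fusion.

    prior_feat, control_feat: (batch, channels, H, W) feature maps from the
    frozen diffusion backbone and the ControlNet branch, respectively.
    scale: (channels,) learned per-channel rescaling of the generative prior.
    """
    # Broadcast the per-channel scale over batch and spatial dims,
    # then add the control signal (vanilla ControlNet is scale == 1).
    return scale[None, :, None, None] * prior_feat + control_feat

rng = np.random.default_rng(0)
prior = rng.standard_normal((2, 4, 8, 8))
control = rng.standard_normal((2, 4, 8, 8))

# Identity initialization: the fusion starts as plain ControlNet addition
# and training can then adjust how strongly the prior contributes.
scale = np.ones(4)
fused = rescaled_fusion(prior, control, scale)
```

In training, `scale` would be a learnable parameter (e.g. a `torch.nn.Parameter`) optimized alongside the ControlNet branch, so the model can trade off generative hallucination against restoration fidelity per feature channel.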