PixelHacker: Image Inpainting with Structural and Semantic Consistency
April 29, 2025
Authors: Ziyang Xu, Kangsheng Duan, Xiaolei Shen, Zhifeng Ding, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
cs.AI
Abstract
Image inpainting is a fundamental research area between image editing and
image generation. Recent state-of-the-art (SOTA) methods have explored novel
attention mechanisms, lightweight architectures, and context-aware modeling,
demonstrating impressive performance. However, they often struggle with complex
structure (e.g., texture, shape, spatial relations) and semantics (e.g., color
consistency, object restoration, and logical correctness), leading to artifacts
and inappropriate generation. To address this challenge, we design a simple yet
effective inpainting paradigm called latent categories guidance, and further
propose a diffusion-based model named PixelHacker. Specifically, we first
construct a large dataset containing 14 million image-mask pairs by annotating
foreground and background (116 and 21 potential categories, respectively).
Then, we encode potential foreground and background representations separately
through two fixed-size embeddings, and intermittently inject these features
into the denoising process via linear attention. Finally, by pre-training on
our dataset and fine-tuning on open-source benchmarks, we obtain PixelHacker.
Extensive experiments show that PixelHacker comprehensively outperforms the
SOTA on a wide range of datasets (Places2, CelebA-HQ, and FFHQ) and exhibits
remarkable consistency in both structure and semantics. Project page at
https://hustvl.github.io/PixelHacker.
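
The injection step described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the ELU+1 feature map, the embedding layout (one row per latent category), the toy width `DIM`, the residual add, and the every-other-block schedule `INJECT_EVERY` are all assumptions made for illustration. Only the category counts (116 foreground, 21 background) come from the abstract.

```python
import math
import random

# Category counts are from the abstract; DIM is an arbitrary toy width (assumption).
NUM_FG, NUM_BG, DIM = 116, 21, 8
INJECT_EVERY = 2  # hypothetical schedule: inject guidance at every second block

rng = random.Random(0)


def random_matrix(rows, cols):
    """Stand-in for learned weights or latent tokens: rows x cols Gaussian values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def feature_map(x):
    """elu(x) + 1, a positive feature map used in common linear-attention variants."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_cross_attention(queries, keys, values):
    """Linear attention: out_n = phi(q_n) (phi(K)^T V) / (phi(q_n) . sum_m phi(k_m))."""
    d, dv, m_count = len(keys[0]), len(values[0]), len(keys)
    phi_k = [[feature_map(x) for x in row] for row in keys]
    # d x dv key-value summary; its size does not depend on the number of queries.
    kv = [[sum(phi_k[m][i] * values[m][j] for m in range(m_count))
           for j in range(dv)] for i in range(d)]
    k_sum = [sum(phi_k[m][i] for m in range(m_count)) for i in range(d)]
    out = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        norm = sum(phi_q[i] * k_sum[i] for i in range(d))  # positive by construction
        out.append([sum(phi_q[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out


def inject(latent_tokens, guidance):
    """Residual injection of guidance features into the latent tokens (assumption)."""
    attended = linear_cross_attention(latent_tokens, guidance, guidance)
    return [[x + a for x, a in zip(row, att)]
            for row, att in zip(latent_tokens, attended)]


# Two fixed-size embeddings, concatenated into a single guidance sequence.
fg_embed = random_matrix(NUM_FG, DIM)
bg_embed = random_matrix(NUM_BG, DIM)
guidance = fg_embed + bg_embed  # list concatenation: (116 + 21) x DIM

latent = random_matrix(16, DIM)  # 16 toy latent tokens
for block_index in range(4):
    # A real denoising block would transform `latent` here; omitted in this sketch.
    if block_index % INJECT_EVERY == 0:
        latent = inject(latent, guidance)
```

Because the key-value summary `kv` is built once per call and has a fixed `DIM x DIM` shape, the per-token cost of the attention does not grow with the number of guidance rows, which is the usual motivation for linear attention over softmax attention.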