PixelHacker: Image Inpainting with Structural and Semantic Consistency
April 29, 2025
Authors: Ziyang Xu, Kangsheng Duan, Xiaolei Shen, Zhifeng Ding, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
cs.AI
Abstract
Image inpainting is a fundamental research area between image editing and
image generation. Recent state-of-the-art (SOTA) methods have explored novel
attention mechanisms, lightweight architectures, and context-aware modeling,
demonstrating impressive performance. However, they often struggle with complex
structure (e.g., texture, shape, spatial relations) and semantics (e.g., color
consistency, object restoration, and logical correctness), leading to artifacts
and inappropriate generation. To address this challenge, we design a simple yet
effective inpainting paradigm called latent categories guidance, and further
propose a diffusion-based model named PixelHacker. Specifically, we first
construct a large dataset containing 14 million image-mask pairs by annotating
foreground and background (116 and 21 potential categories, respectively).
Then, we encode potential foreground and background representations separately
through two fixed-size embeddings, and intermittently inject these features
into the denoising process via linear attention. Finally, by pre-training on
our dataset and fine-tuning on open-source benchmarks, we obtain PixelHacker.
Extensive experiments show that PixelHacker comprehensively outperforms the
SOTA on a wide range of datasets (Places2, CelebA-HQ, and FFHQ) and exhibits
remarkable consistency in both structure and semantics. Project page at
https://hustvl.github.io/PixelHacker.
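
To make the guidance step in the abstract more concrete, here is a minimal pure-Python sketch of injecting two fixed-size embeddings into a stack of denoising blocks via linear attention. It is an illustration under stated assumptions, not the authors' implementation: the embedding sizes (116 and 21 rows, taken from the category counts), the feature dimension, the `elu(x) + 1` feature map, the use of the embeddings as both keys and values without learned projections, the residual addition, and the names `inject_guidance` and `INJECT_EVERY` are all hypothetical choices made for the example.

```python
import math
import random


def feature_map(x):
    """elu(x) + 1: a positive feature map often used for linear attention (assumed here)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Compute phi(Q) (phi(K)^T V), normalised by phi(Q) . sum(phi(K)).

    queries is n x d, keys is m x d, values is m x d_v (lists of lists).
    The m keys are summarised once, so cost grows with n + m, not n * m.
    """
    d, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d)]  # accumulates phi(K)^T V
    k_sum = [0.0] * d                     # accumulates the sum of phi(k)
    for k, v in zip(keys, values):
        phi_k = [feature_map(x) for x in k]
        for i in range(d):
            k_sum[i] += phi_k[i]
            for j in range(d_v):
                kv[i][j] += phi_k[i] * v[j]
    out = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        norm = sum(a * b for a, b in zip(phi_q, k_sum))
        out.append([sum(phi_q[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(d_v)])
    return out


def make_embedding(num_rows, dim, rng):
    """Random stand-in for a learned fixed-size embedding table."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def inject_guidance(latent_tokens, fg_embed, bg_embed):
    """Attend from latent tokens to both embeddings and add the result residually."""
    guidance = fg_embed + bg_embed  # list concatenation: (116 + 21) x dim
    attended = linear_attention(latent_tokens, guidance, guidance)
    return [[x + a for x, a in zip(tok, att)]
            for tok, att in zip(latent_tokens, attended)]


rng = random.Random(0)
DIM = 8                                  # hypothetical feature dimension
fg_embed = make_embedding(116, DIM, rng)  # foreground embedding (size assumed)
bg_embed = make_embedding(21, DIM, rng)   # background embedding (size assumed)
latent_tokens = make_embedding(16, DIM, rng)

INJECT_EVERY = 2   # hypothetical: inject at every second block ("intermittently")
NUM_BLOCKS = 6     # hypothetical number of denoising blocks
injected_blocks = []
for block in range(NUM_BLOCKS):
    # A real denoising block (convolution, self-attention, ...) would run here.
    if block % INJECT_EVERY == 0:
        latent_tokens = inject_guidance(latent_tokens, fg_embed, bg_embed)
        injected_blocks.append(block)
```

Because the feature map is strictly positive, each attended row is a weighted average of the embedding rows, so the guidance stays bounded no matter how many tokens the latent contains.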