3D-Fixup: Advancing Photo Editing with 3D Priors
May 15, 2025
Authors: Yen-Chi Cheng, Krishna Kumar Singh, Jae Shin Yoon, Alex Schwing, Liangyan Gui, Matheus Gadelha, Paul Guerrero, Nanxuan Zhao
cs.AI
Abstract
Despite significant advances in modeling image priors via diffusion models,
3D-aware image editing remains challenging, in part because the object is only
specified via a single image. To tackle this challenge, we propose 3D-Fixup, a
new framework for editing 2D images guided by learned 3D priors. The framework
supports difficult editing situations such as object translation and 3D
rotation. To achieve this, we leverage a training-based approach that harnesses
the generative power of diffusion models. As video data naturally encodes
real-world physical dynamics, we turn to video data for generating training
data pairs, i.e., a source and a target frame. Rather than relying solely on a
single trained model to infer transformations between source and target frames,
we incorporate 3D guidance from an Image-to-3D model, which bridges this
challenging task by explicitly projecting 2D information into 3D space. We
design a data generation pipeline to ensure high-quality 3D guidance throughout
training. Results show that by integrating these 3D priors, 3D-Fixup
effectively supports complex, identity-coherent 3D-aware edits, achieving
high-quality results and advancing the application of diffusion models in
realistic image manipulation. The code is provided at
https://3dfixup.github.io/
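The abstract describes building training pairs from video: a source and a target frame from the same clip implicitly encode a real-world 3D transformation of the depicted object. A minimal sketch of that pairing idea is below; all names and parameters (`make_training_pairs`, the gap bounds) are illustrative assumptions, not taken from the paper's code, and the 3D-guidance stage is omitted entirely.

```python
import random

def make_training_pairs(video_frames, min_gap=5, max_gap=30, n_pairs=4):
    """Sample (source, target) frame pairs separated by a temporal gap.

    The gap ensures the object has undergone some motion (e.g. translation
    or rotation) between the two frames, so the pair can supervise a
    3D-aware edit. Gap bounds here are arbitrary placeholders.
    """
    pairs = []
    for _ in range(n_pairs):
        gap = random.randint(min_gap, max_gap)          # inclusive bounds
        src = random.randint(0, len(video_frames) - 1 - gap)
        pairs.append((video_frames[src], video_frames[src + gap]))
    return pairs

frames = list(range(100))  # stand-in for decoded video frames
pairs = make_training_pairs(frames)
```

In the actual pipeline, each pair would additionally be filtered for quality and paired with 3D guidance obtained by lifting the source frame through an Image-to-3D model, as the abstract outlines.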