3D-Fixup: Advancing Photo Editing with 3D Priors
May 15, 2025
Authors: Yen-Chi Cheng, Krishna Kumar Singh, Jae Shin Yoon, Alex Schwing, Liangyan Gui, Matheus Gadelha, Paul Guerrero, Nanxuan Zhao
cs.AI
Abstract
Despite significant advances in modeling image priors via diffusion models,
3D-aware image editing remains challenging, in part because the object is only
specified via a single image. To tackle this challenge, we propose 3D-Fixup, a
new framework for editing 2D images guided by learned 3D priors. The framework
supports difficult editing situations such as object translation and 3D
rotation. To achieve this, we leverage a training-based approach that harnesses
the generative power of diffusion models. As video data naturally encodes
real-world physical dynamics, we turn to video data for generating training
data pairs, i.e., a source and a target frame. Rather than relying solely on a
single trained model to infer transformations between source and target frames,
we incorporate 3D guidance from an Image-to-3D model, which bridges this
challenging task by explicitly projecting 2D information into 3D space. We
design a data generation pipeline to ensure high-quality 3D guidance throughout
training. Results show that by integrating these 3D priors, 3D-Fixup
effectively supports complex, identity-coherent 3D-aware edits, achieving
high-quality results and advancing the application of diffusion models in
realistic image manipulation. The code is provided at
https://3dfixup.github.io/.
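The abstract describes building training pairs from video, since consecutive frames naturally capture how objects move in the real world. The sketch below is a minimal, hypothetical illustration of that idea only: the function name, the `gap` parameter, and the random-noise "video" are all assumptions for demonstration, not details from the paper, which additionally pairs each sample with 3D guidance from an Image-to-3D model.

```python
import numpy as np

def make_training_pairs(frames, gap=8):
    """Pair each source frame with a later target frame.

    Video frames `gap` steps apart act as (source, target) editing
    examples: the target shows the same object after a real-world
    motion such as a translation or rotation.
    """
    pairs = []
    for i in range(0, len(frames) - gap, gap):
        pairs.append((frames[i], frames[i + gap]))
    return pairs

# Toy "video": 32 frames of 64x64 RGB noise standing in for real footage.
video = [np.random.rand(64, 64, 3) for _ in range(32)]
pairs = make_training_pairs(video, gap=8)
print(len(pairs))  # → 3 (source, target) pairs
```

In the actual framework, each such pair would also be accompanied by the 3D guidance signal obtained by lifting the source frame into 3D, so the diffusion model learns the transformation rather than inferring it from the 2D pair alone.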