3D-Fixup: Advancing Photo Editing with 3D Priors
May 15, 2025
Authors: Yen-Chi Cheng, Krishna Kumar Singh, Jae Shin Yoon, Alex Schwing, Liangyan Gui, Matheus Gadelha, Paul Guerrero, Nanxuan Zhao
cs.AI
Abstract
Despite significant advances in modeling image priors via diffusion models,
3D-aware image editing remains challenging, in part because the object is only
specified via a single image. To tackle this challenge, we propose 3D-Fixup, a
new framework for editing 2D images guided by learned 3D priors. The framework
supports difficult editing situations such as object translation and 3D
rotation. To achieve this, we leverage a training-based approach that harnesses
the generative power of diffusion models. As video data naturally encodes
real-world physical dynamics, we turn to video data for generating training
data pairs, i.e., a source and a target frame. Rather than relying solely on a
single trained model to infer transformations between source and target frames,
we incorporate 3D guidance from an Image-to-3D model, which bridges this
challenging task by explicitly projecting 2D information into 3D space. We
design a data generation pipeline to ensure high-quality 3D guidance throughout
training. Results show that by integrating these 3D priors, 3D-Fixup
effectively supports complex, identity-coherent 3D-aware edits, achieving
high-quality results and advancing the application of diffusion models in
realistic image manipulation. The code is provided at
https://3dfixup.github.io/.
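The abstract describes building training pairs from video, since consecutive frames naturally capture how objects move in the real world. The sketch below is a minimal, hypothetical illustration of that idea only: the function name, the `gap` parameter, and the random-noise "video" are all assumptions for demonstration, not details from the paper, which additionally pairs each sample with 3D guidance from an Image-to-3D model.

```python
import numpy as np

def make_training_pairs(frames, gap=8):
    """Pair each source frame with a later target frame.

    Video frames `gap` steps apart act as (source, target) editing
    examples: the target shows the same object after a real-world
    motion such as a translation or rotation.
    """
    pairs = []
    for i in range(0, len(frames) - gap, gap):
        pairs.append((frames[i], frames[i + gap]))
    return pairs

# Toy "video": 32 frames of 64x64 RGB noise standing in for real footage.
video = [np.random.rand(64, 64, 3) for _ in range(32)]
pairs = make_training_pairs(video, gap=8)
print(len(pairs))  # → 3 (source, target) pairs
```

In the actual framework, each such pair would also be accompanied by the 3D guidance signal obtained by lifting the source frame into 3D, so the diffusion model learns the transformation rather than inferring it from the 2D pair alone.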