BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
June 20, 2025
Authors: Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo
cs.AI
Abstract
We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
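
The abstract names the two fine-tuning strategies (source masking and simulated object jittering) but not their implementation. The NumPy sketch below shows one plausible form of such a data-preparation step, assuming a single-object segmentation mask and simple pixel-space jitter; the function name prepare_training_pair and all parameters (mask_prob, jitter_prob, max_shift) are hypothetical, not from the paper.

import numpy as np

def prepare_training_pair(source_frame, target_frame, object_mask,
                          mask_prob=0.5, jitter_prob=0.5, max_shift=8,
                          rng=None):
    """Build one (source, target) training pair from two video frames.

    source_frame, target_frame: (H, W, 3) float arrays in [0, 1].
    object_mask: (H, W) bool array marking one foreground object.
    All names and default values here are illustrative guesses.
    """
    rng = rng if rng is not None else np.random.default_rng()
    src = source_frame.copy()

    # (i) Source masking: randomly blank the source background so the
    # compositor cannot copy it verbatim and learns to accept edits
    # such as background replacement.
    if rng.random() < mask_prob:
        src[~object_mask] = 0.0

    # (ii) Simulated object jittering: move the foreground object by a
    # small random offset while the background (camera) stays fixed,
    # decoupling object motion from camera motion during training.
    if rng.random() < jitter_prob:
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        background = np.where(object_mask[..., None], 0.0, src)
        foreground = np.where(object_mask[..., None], src, 0.0)
        shifted_fg = np.roll(foreground, shift=(dy, dx), axis=(0, 1))
        shifted_mask = np.roll(object_mask, shift=(dy, dx), axis=(0, 1))
        src = np.where(shifted_mask[..., None], shifted_fg, background)

    return src, target_frame

In this reading, the jittered source and unedited target form a supervised pair: the diffusion-based compositor sees an object displaced relative to a fixed background and must reproduce the coherent target, which is one way the paper's stated goal of disentangled object/camera control could be encouraged.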