
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

June 20, 2025
作者: Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo
cs.AI

Abstract

We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
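The two training strategies, source masking and simulated object jittering, can be illustrated in a few lines. The sketch below is a minimal toy version assuming a simple formulation: masking drops the source-scene conditioning with some probability, and jittering adds Gaussian noise to an object's pose. The function names, the `(tx, ty, tz, yaw)` pose parameterization, and the noise scales are illustrative assumptions, not the paper's implementation.

```python
import random

random.seed(0)

def mask_source(source, mask_prob=0.5):
    # Source masking (sketch): with probability mask_prob, drop the source-scene
    # conditioning (here, zero it out) so the generative compositor must learn
    # flexible modifications such as background replacement.
    if random.random() < mask_prob:
        return [[0.0 for _ in row] for row in source]
    return source

def jitter_object_pose(pose, translation_std=0.05, rotation_std=2.0):
    # Simulated object jittering (sketch): randomly perturb an object's pose,
    # assumed here to be (tx, ty, tz, yaw in degrees), so the model sees object
    # motion that is independent of camera motion during training.
    tx, ty, tz, yaw = pose
    return (tx + random.gauss(0.0, translation_std),
            ty + random.gauss(0.0, translation_std),
            tz + random.gauss(0.0, translation_std),
            yaw + random.gauss(0.0, rotation_std))

# Toy inputs: a 4x4 single-channel "source frame" and one object pose.
source_frame = [[1.0] * 4 for _ in range(4)]
pose = (0.0, 0.0, 1.0, 90.0)

masked = mask_source(source_frame, mask_prob=1.0)  # masking forced for the demo
jittered = jitter_object_pose(pose)
```

In training, the masked source and the jittered target poses would condition the two parallel streams of the diffusion-based compositor; here they are just plain lists and tuples to keep the sketch self-contained.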