

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

June 20, 2025
作者: Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo
cs.AI

Abstract

We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
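The two training strategies described above can be illustrated with a toy sketch. This is not the paper's implementation — the function names, the 2D toy frame, and the translation-only jitter are illustrative assumptions; it only shows the intuition of blanking source background pixels (source masking) and perturbing object poses while the camera stays fixed (object jittering).

```python
import numpy as np

rng = np.random.default_rng(0)

def source_mask(source, obj_mask, p_bg=0.5):
    """Hypothetical sketch of 'source masking': randomly blank the
    source background so the compositor learns to take it from the
    edited (target) stream, enabling edits like background replacement."""
    out = source.copy()
    if rng.random() < p_bg:
        out[~obj_mask] = 0.0  # drop background pixels
    return out

def jitter_objects(obj_params, sigma=0.02):
    """Hypothetical sketch of 'simulated object jittering': perturb
    per-object poses (here, just translations) while the camera is
    untouched, encouraging disentangled object/camera control."""
    noise = rng.normal(0.0, sigma, size=obj_params.shape)
    return obj_params + noise

# Toy data: a 4x4 "frame" with a 2x2 object region marked by a mask.
frame = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

masked = source_mask(frame, mask, p_bg=1.0)  # force masking for the demo
print(masked[0, 0], masked[1, 1])  # background zeroed, object preserved

poses = np.zeros((3, 3))           # three objects, xyz translations
jittered = jitter_objects(poses)   # small random pose perturbations
print(jittered.shape)
```

In the actual system these augmentations are applied to video frames when fine-tuning the diffusion-based compositor; the sketch only mirrors the data-side idea.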