

Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy

June 27, 2025
Authors: Yuhao Liu, Tengfei Wang, Fang Liu, Zhenwei Wang, Rynson W. H. Lau
cs.AI

Abstract

Recent advances in deep generative modeling have unlocked unprecedented opportunities for video synthesis. In real-world applications, however, users often seek tools to faithfully realize their creative editing intentions with precise and consistent control. Despite the progress achieved by existing methods, ensuring fine-grained alignment with user intentions remains an open and challenging problem. In this work, we present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing. Shape-for-Motion achieves this by converting the target object in the input video to a time-consistent mesh, i.e., a 3D proxy, allowing edits to be performed directly on the proxy and then inferred back to the video frames. To simplify the editing process, we design a novel Dual-Propagation Strategy that allows users to perform edits on the 3D mesh of a single frame, and the edits are then automatically propagated to the 3D meshes of the other frames. The 3D meshes for different frames are further projected onto the 2D space to produce the edited geometry and texture renderings, which serve as inputs to a decoupled video diffusion model for generating edited results. Our framework supports various precise and physically-consistent manipulations across the video frames, including pose editing, rotation, scaling, translation, texture modification, and object composition. Our approach marks a key step toward high-quality, controllable video editing workflows. Extensive experiments demonstrate the superiority and effectiveness of our approach. Project page: https://shapeformotion.github.io/
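
The abstract outlines the editing pipeline at a high level: reconstruct a time-consistent 3D proxy of the target object, edit the proxy on a single frame, propagate the edit to the other frames, render geometry and texture, and feed the renderings to a decoupled video diffusion model. The sketch below restates that flow as code purely for clarity. It is a minimal, hypothetical outline based only on the abstract; every class and function name (ProxyMesh, reconstruct_proxy, propagate_edit, render_2d, decoupled_video_diffusion, edit_video) is an illustrative placeholder, not the authors' released API, and the heavy stages are stubbed out.

```python
# Hypothetical sketch of the Shape-for-Motion editing flow described in the abstract.
# All names are illustrative placeholders; reconstruction, propagation, rendering,
# and diffusion are stubbed with NotImplementedError.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProxyMesh:
    """Per-frame slice of the time-consistent 3D proxy."""
    vertices: list  # 3D vertex positions for this frame
    faces: list     # triangle indices, shared across frames


def reconstruct_proxy(frames: list) -> List[ProxyMesh]:
    """Convert the target object in the input video into per-frame proxy meshes (stub)."""
    raise NotImplementedError


def propagate_edit(meshes: List[ProxyMesh], edited: ProxyMesh, key: int) -> List[ProxyMesh]:
    """Dual-propagation step: carry the single-frame edit onto every other frame's mesh (stub)."""
    raise NotImplementedError


def render_2d(mesh: ProxyMesh) -> dict:
    """Project an edited mesh into 2D geometry and texture renderings (stub)."""
    raise NotImplementedError


def decoupled_video_diffusion(frames: list, renders: List[dict]) -> list:
    """Generate the edited frames conditioned on the geometry/texture renderings (stub)."""
    raise NotImplementedError


def edit_video(frames: list,
               user_edit: Callable[[ProxyMesh], ProxyMesh],
               key_frame: int = 0) -> list:
    meshes = reconstruct_proxy(frames)                      # 1. video object -> time-consistent 3D proxy
    edited_key = user_edit(meshes[key_frame])               # 2. user edits the mesh of one frame
    meshes = propagate_edit(meshes, edited_key, key_frame)  # 3. edit propagates to all other frames
    renders = [render_2d(m) for m in meshes]                # 4. meshes -> 2D geometry and texture renders
    return decoupled_video_diffusion(frames, renders)       # 5. diffusion model synthesizes edited frames
```

Keeping reconstruction, propagation, rendering, and generation as separate stages mirrors the decoupling the abstract describes: the user edits the proxy of a single key frame once, and only later are those edits translated back into pixels across the whole video.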