Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
June 27, 2025
Authors: Yuhao Liu, Tengfei Wang, Fang Liu, Zhenwei Wang, Rynson W. H. Lau
cs.AI
Abstract
Recent advances in deep generative modeling have unlocked unprecedented
opportunities for video synthesis. In real-world applications, however, users
often seek tools to faithfully realize their creative editing intentions with
precise and consistent control. Despite the progress achieved by existing
methods, ensuring fine-grained alignment with user intentions remains an open
and challenging problem. In this work, we present Shape-for-Motion, a novel
framework that incorporates a 3D proxy for precise and consistent video
editing. Shape-for-Motion achieves this by converting the target object in the
input video to a time-consistent mesh, i.e., a 3D proxy, allowing edits to be
performed directly on the proxy and then inferred back to the video frames. To
simplify the editing process, we design a novel Dual-Propagation Strategy that
allows users to perform edits on the 3D mesh of a single frame, and the edits
are then automatically propagated to the 3D meshes of the other frames. The 3D
meshes for different frames are further projected onto the 2D space to produce
the edited geometry and texture renderings, which serve as inputs to a
decoupled video diffusion model for generating edited results. Our framework
supports various precise and physically consistent manipulations across the
video frames, including pose editing, rotation, scaling, translation, texture
modification, and object composition. Our approach marks a key step toward
high-quality, controllable video editing workflows. Extensive experiments
demonstrate the superiority and effectiveness of our approach. Project page:
https://shapeformotion.github.io/
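
The edit-then-project idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' Dual-Propagation Strategy or their renderer: it assumes each frame's mesh is just a list of vertices with shared correspondence, that an edit is a simple scale, y-axis rotation, and translation applied about each frame's centroid, and that projection uses an ideal pinhole camera. All names (`edit_mesh`, `propagate_edit`, `project`) and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]
Mesh = List[Vec3]  # vertices only; faces are assumed shared across frames


def centroid(mesh: Mesh) -> Vec3:
    """Mean of all vertex positions."""
    n = len(mesh)
    return (sum(v[0] for v in mesh) / n,
            sum(v[1] for v in mesh) / n,
            sum(v[2] for v in mesh) / n)


def edit_mesh(mesh: Mesh, scale: float, yaw_deg: float, shift: Vec3) -> Mesh:
    """Scale and rotate about the y-axis around the centroid, then translate."""
    cx, cy, cz = centroid(mesh)
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    edited = []
    for x, y, z in mesh:
        dx, dy, dz = (x - cx) * scale, (y - cy) * scale, (z - cz) * scale
        rx = c * dx + s * dz   # rotation about the y-axis
        rz = -s * dx + c * dz
        edited.append((cx + rx + shift[0], cy + dy + shift[1], cz + rz + shift[2]))
    return edited


def propagate_edit(frames: List[Mesh], scale: float, yaw_deg: float,
                   shift: Vec3) -> List[Mesh]:
    """Assumed stand-in for propagation: reuse one frame's edit on every frame."""
    return [edit_mesh(mesh, scale, yaw_deg, shift) for mesh in frames]


def project(mesh: Mesh, focal: float, cx: float, cy: float) -> List[Vec2]:
    """Ideal pinhole projection of camera-space vertices (z > 0) to pixels."""
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in mesh]


if __name__ == "__main__":
    # Two frames of a toy two-vertex "mesh" moving to the right.
    frames = [[(-1.0, 0.0, 4.0), (1.0, 0.0, 4.0)],
              [(0.0, 0.0, 4.0), (2.0, 0.0, 4.0)]]
    edited = propagate_edit(frames, scale=2.0, yaw_deg=0.0, shift=(0.0, 0.0, 0.0))
    for mesh in edited:
        print(project(mesh, focal=100.0, cx=50.0, cy=50.0))
```

In this toy setting the user specifies the edit once and every frame receives it relative to its own centroid, so the object's motion is preserved while its size or orientation changes. In the framework described above, the projected geometry and texture renderings would then condition a video diffusion model, a step this sketch does not cover.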