Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing
August 19, 2025
Authors: Feng-Lin Liu, Shi-Yang Li, Yan-Pei Cao, Hongbo Fu, Lin Gao
cs.AI
Abstract
Recent video editing methods achieve compelling results in style transfer or
appearance modification. However, editing the structural content of 3D scenes
in videos remains challenging, particularly when dealing with significant
viewpoint changes, such as large camera rotations or zooms. Key challenges
include generating novel view content that remains consistent with the original
video, preserving unedited regions, and translating sparse 2D inputs into
realistic 3D video outputs. To address these issues, we propose Sketch3DVE, a
sketch-based 3D-aware video editing method to enable detailed local
manipulation of videos with significant viewpoint changes. To solve the
challenge posed by sparse inputs, we employ image editing methods to generate
edited results for the first frame, which are then propagated to the remaining
frames of the video. We utilize sketching as an interaction tool for precise
geometry control, while other mask-based image editing methods are also
supported. To handle viewpoint changes, we perform a detailed analysis and
manipulation of the 3D information in the video. Specifically, we utilize a
dense stereo method to estimate a point cloud and the camera parameters of the
input video. We then propose a point cloud editing approach that uses depth
maps to represent the 3D geometry of newly edited components, aligning them
effectively with the original 3D scene. To seamlessly merge the newly edited
content with the original video while preserving the features of unedited
regions, we introduce a 3D-aware mask propagation strategy and employ a video
diffusion model to produce realistic edited videos. Extensive experiments
demonstrate the superiority of Sketch3DVE in video editing. Homepage and code:
http://geometrylearning.com/Sketch3DVE/
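
The geometric steps described in the abstract (representing an edited region by a depth map, aligning it with the original scene, and carrying the edit mask to other views) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with known intrinsics `K`, and it assumes a least-squares scale-and-shift fit on unedited pixels as the alignment step, which is a common choice for relative depth but is not confirmed by the abstract. The function names `align_depth`, `backproject`, and `propagate_mask` are hypothetical.

```python
import numpy as np

def align_depth(new_depth, ref_depth, keep_mask):
    """Fit scale s and shift t so that s * new_depth + t matches ref_depth
    on unedited pixels (assumed alignment model, not taken from the paper)."""
    x = new_depth[keep_mask]
    y = ref_depth[keep_mask]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * new_depth + t

def backproject(depth, K):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)

def propagate_mask(points, K, R, t, shape):
    """Project (N, 3) points of the edited region into a view with pose
    X_view = R @ X + t and rasterize them into a boolean mask."""
    cam = points @ R.T + t
    cam = cam[cam[:, 2] > 1e-6]            # keep points in front of the camera
    uv = cam @ K.T                          # homogeneous pixel coordinates
    px = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = shape
    ok = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    mask = np.zeros(shape, dtype=bool)
    mask[px[ok, 1], px[ok, 0]] = True
    return mask

# Toy data: an 8x8 frame whose reference depth is a horizontal ramp.
K = np.array([[100.0, 0.0, 4.0], [0.0, 100.0, 4.0], [0.0, 0.0, 1.0]])
u, _ = np.meshgrid(np.arange(8), np.arange(8))
ref_depth = 2.0 + 0.1 * u

edit_mask = np.zeros((8, 8), dtype=bool)
edit_mask[2:5, 2:5] = True                  # region changed in the first frame
keep_mask = ~edit_mask

# Depth predicted for the edited first frame, in a different scale and shift,
# with the edited region pushed slightly toward the camera.
new_depth = (ref_depth - 0.5) / 2.0
new_depth[edit_mask] -= 0.1

aligned = align_depth(new_depth, ref_depth, keep_mask)
pts = backproject(aligned, K)[edit_mask]    # 3D points of the edited component
mask_same_view = propagate_mask(pts, K, np.eye(3), np.zeros(3), (8, 8))
```

With an identity pose the propagated mask reproduces the first-frame edit mask. Passing the estimated rotation and translation of another frame would give that frame's mask, which could then be used to tell a generative model which pixels to synthesize and which to preserve.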