Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing

Abstract

Recent video editing methods achieve compelling results in style transfer and appearance modification. However, editing the structural content of 3D scenes in videos remains challenging, particularly under significant viewpoint changes such as large camera rotations or zooms. The key challenges are generating novel-view content that stays consistent with the original video, preserving unedited regions, and translating sparse 2D inputs into realistic 3D video outputs. To address these issues, we propose Sketch3DVE, a sketch-based 3D-aware video editing method that enables detailed local manipulation of videos with significant viewpoint changes. To handle sparse inputs, we employ image editing methods to generate the edited result for the first frame, which is then propagated to the remaining frames of the video. We use sketching as the interaction tool for precise geometry control, while other mask-based image editing methods are also supported. To handle viewpoint changes, we analyze and manipulate the 3D information in the video in detail. Specifically, we use a dense stereo method to estimate a point cloud and the camera parameters of the input video. We then propose a point cloud editing approach that uses depth maps to represent the 3D geometry of the newly edited components and aligns them with the original 3D scene. To merge the newly edited content seamlessly with the original video while preserving the features of unedited regions, we introduce a 3D-aware mask propagation strategy and employ a video diffusion model to produce realistic edited videos. Extensive experiments demonstrate the superiority of Sketch3DVE in video editing. Homepage and code: http://geometrylearning.com/Sketch3DVE/
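
The abstract does not spell out how the 3D-aware mask propagation works. As a rough geometric illustration only, and not the authors' implementation, the sketch below lifts the masked pixels of the first frame into 3D using a depth map, moves them into another camera's frame, and reprojects them to obtain that frame's mask. The pinhole camera model, the function names (`unproject`, `transform`, `project`, `propagate_mask`), and the convention that `R`, `t` map first-frame camera coordinates to target camera coordinates are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole


def unproject(u: int, v: int, depth: float, k: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = k
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(p: Vec3, R: Sequence[Sequence[float]], t: Sequence[float]) -> Vec3:
    """Apply the rigid transform q = R p + t (source camera -> target camera)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Vec3, k: Intrinsics) -> Optional[Tuple[int, int]]:
    """Project a 3D point to the nearest pixel; None if it lies behind the camera."""
    fx, fy, cx, cy = k
    x, y, z = p
    if z <= 0:
        return None
    return (round(fx * x / z + cx), round(fy * y / z + cy))


def propagate_mask(mask: List[List[int]], depth: List[List[float]],
                   k: Intrinsics, R: Sequence[Sequence[float]],
                   t: Sequence[float]) -> List[List[int]]:
    """Forward-splat the first-frame edit mask into a target view."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            if not mask[v][u]:
                continue
            uv = project(transform(unproject(u, v, depth[v][u], k), R, t), k)
            if uv is not None and 0 <= uv[0] < w and 0 <= uv[1] < h:
                out[uv[1]][uv[0]] = 1
    return out


# Toy usage: a single masked pixel on a fronto-parallel plane at depth 2.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (2.0, 2.0, 2.0, 2.0)
first_mask = [[0] * 4 for _ in range(4)]
first_mask[1][2] = 1
flat_depth = [[2.0] * 4 for _ in range(4)]
shifted = propagate_mask(first_mask, flat_depth, K, IDENTITY, (-1.0, 0.0, 0.0))
```

Point-wise forward splatting like this leaves holes when the camera zooms in or rotates strongly, so a practical system would likely render a surface or dilate the result; the example is meant only to show why depth and camera parameters are enough to carry a first-frame mask to other views.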