Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing

Abstract

Recent video editing methods achieve attractive results in style transfer and appearance modification. However, editing the structural content of 3D scenes in videos remains challenging, particularly under significant viewpoint changes such as large camera rotations or zooms. Key challenges include generating novel-view content that remains consistent with the original video, preserving unedited regions, and translating sparse 2D inputs into realistic 3D video outputs. To address these issues, we propose Sketch3DVE, a sketch-based 3D-aware video editing method that enables detailed local manipulation of videos with significant viewpoint changes. To handle sparse inputs, we employ image editing methods to generate an edited result for the first frame, which is then propagated to the remaining frames of the video. We use sketching as the interaction tool for precise geometry control, and other mask-based image editing methods are also supported. To handle viewpoint changes, we analyze and manipulate the 3D information in the video. Specifically, we use a dense stereo method to estimate a point cloud and the camera parameters of the input video. We then propose a point cloud editing approach that uses depth maps to represent the 3D geometry of newly edited components and aligns them with the original 3D scene. To merge the newly edited content seamlessly with the original video while preserving the features of unedited regions, we introduce a 3D-aware mask propagation strategy and employ a video diffusion model to produce realistic edited videos. Extensive experiments demonstrate the superiority of Sketch3DVE in video editing. Homepage and code: http://geometrylearning.com/Sketch3DVE/
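The abstract does not say how the 3D-aware mask propagation is implemented. As a rough illustration of the general idea only, the sketch below reprojects a first-frame edit mask into another frame using a depth map and camera parameters. It assumes a pinhole camera with intrinsics shared across frames, and the function name `propagate_mask` and all of its parameters are hypothetical; this is not the authors' code.

```python
import numpy as np


def propagate_mask(mask, depth, K, cam_to_world_src, world_to_cam_dst):
    """Reproject a binary mask from a source frame into a destination frame.

    Hypothetical helper for illustration; assumes a pinhole camera model.

    mask:  (H, W) bool array, edited region in the source frame.
    depth: (H, W) float array, per-pixel z-depth of the source frame.
    K:     (3, 3) intrinsics, assumed identical for both frames.
    cam_to_world_src: (4, 4) source camera-to-world pose.
    world_to_cam_dst: (4, 4) destination world-to-camera pose.
    """
    h, w = mask.shape
    v, u = np.nonzero(mask)  # pixel rows (v) and columns (u) inside the mask
    z = depth[v, u]

    # Unproject masked pixels to 3D points in the source camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)  # (3, N)
    pts_cam = (np.linalg.inv(K) @ pix) * z                               # (3, N)

    # Move the points into the destination camera frame via world coordinates.
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])        # (4, N)
    pts_dst = (world_to_cam_dst @ cam_to_world_src @ pts_h)[:3]

    # Keep points in front of the camera, then project with the intrinsics.
    pts_dst = pts_dst[:, pts_dst[2] > 1e-6]
    proj = K @ pts_dst
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)

    # Splat projected points into the destination mask, dropping out-of-frame hits.
    inside = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    out = np.zeros((h, w), dtype=bool)
    out[v2[inside], u2[inside]] = True
    return out
```

Point splatting of this kind leaves holes and ignores occlusion once the viewpoint change becomes large, so a practical pipeline would need extra handling (for example, dilation or a depth test) that this sketch omits.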
