MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
September 2, 2023
Authors: Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng
cs.AI
Abstract
This paper addresses the issue of modifying the visual appearance of videos
while preserving their motion. A novel framework, named MagicProp, is proposed,
which disentangles the video editing process into two stages: appearance
editing and motion-aware appearance propagation. In the first stage, MagicProp
selects a single frame from the input video and applies image-editing
techniques to modify the content and/or style of the frame. The flexibility of
these techniques enables the editing of arbitrary regions within the frame. In
the second stage, MagicProp employs the edited frame as an appearance reference
and generates the remaining frames using an autoregressive rendering approach.
To achieve this, a diffusion-based conditional generation model, called
PropDPM, is developed, which synthesizes each target frame by conditioning on
the reference appearance, the target motion, and the appearance of the
previous frame. The
autoregressive editing approach ensures temporal consistency in the resulting
videos. Overall, MagicProp combines the flexibility of image-editing techniques
with the superior temporal consistency of autoregressive modeling, enabling
flexible editing of object types and aesthetic styles in arbitrary regions of
input videos while maintaining good temporal consistency across frames.
Extensive experiments in various video editing scenarios demonstrate the
effectiveness of MagicProp.
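The two-stage pipeline described above can be sketched as a short program. This is only an illustrative stub, not the authors' implementation: `edit_frame`, `propdpm_step`, and `magicprop` are hypothetical names, and the diffusion model is replaced with a placeholder so the control flow of the autoregressive propagation is visible.

```python
def edit_frame(frame):
    """Stage 1 (hypothetical): apply any image-editing technique to one
    selected key frame to change its content and/or style."""
    return {"appearance": "edited", "pixels": frame}

def propdpm_step(reference, motion, previous):
    """Stage 2 (stub standing in for PropDPM): synthesize one target frame
    conditioned on the reference appearance, the target motion, and the
    appearance of the previously rendered frame."""
    return {"appearance": reference["appearance"], "motion": motion}

def magicprop(frames, motions, key_index=0):
    """Autoregressively render the remaining frames from one edited frame,
    which is what keeps the output temporally consistent."""
    reference = edit_frame(frames[key_index])
    rendered = [reference]
    for motion in motions:          # one motion signal per target frame
        rendered.append(propdpm_step(reference, motion, rendered[-1]))
    return rendered
```

Note how every generated frame conditions on the same edited reference (appearance) while the motion varies per frame; the previous frame is also passed in, which is the autoregressive link the abstract credits for temporal consistency.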