Video Editing via Factorized Diffusion Distillation

March 14, 2024
Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman
cs.AI

Abstract

We introduce Emu Video Edit (EVE), a model that establishes a new state-of-the-art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure, Factorized Diffusion Distillation. This procedure distills knowledge from one or more teachers simultaneously, without any supervised data. We utilize this procedure to teach EVE to edit videos by jointly distilling knowledge to (i) precisely edit each individual frame from the image editing adapter, and (ii) ensure temporal consistency among the edited frames using the video generation adapter. Finally, to demonstrate the potential of our approach in unlocking other capabilities, we align additional combinations of adapters.
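
For intuition, the sketch below illustrates the joint two-teacher objective the abstract describes: a student video-editing model is supervised per frame by an image-editing teacher and on the whole clip by a video-generation teacher. This is a deliberately simplified, hypothetical rendering, not the paper's actual Factorized Diffusion Distillation losses (which operate on diffusion-model predictions); all module names, signatures, and the plain MSE / scalar-score losses are illustrative assumptions.

```python
import torch

def factorized_distillation_loss(student, image_teacher, video_teacher,
                                 video, instruction,
                                 w_edit=1.0, w_temp=1.0):
    """Hypothetical joint distillation loss combining two frozen teachers.

    video: tensor of shape (batch, frames, C, H, W)
    instruction: text edit instruction passed to student and image teacher
    """
    # Student edits the full clip at once.
    edited = student(video, instruction)  # (batch, frames, C, H, W)

    # (i) Per-frame editing signal: each edited frame should match what the
    # image-editing teacher produces for that frame in isolation.
    with torch.no_grad():
        frame_targets = torch.stack(
            [image_teacher(video[:, t], instruction)
             for t in range(video.shape[1])],
            dim=1,
        )
    edit_loss = torch.nn.functional.mse_loss(edited, frame_targets)

    # (ii) Temporal signal: an assumed scalar score from the video-generation
    # teacher, lower when consecutive edited frames are coherent.
    temp_loss = video_teacher.score(edited)

    # Weighted sum lets the two teachers be balanced against each other.
    return w_edit * edit_loss + w_temp * temp_loss
```

Because both signals come from pretrained teachers rather than labeled pairs, no supervised video-editing data enters the objective, which is the key property of the procedure.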
