Video Editing via Factorized Diffusion Distillation
March 14, 2024
Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman
cs.AI
Abstract
We introduce Emu Video Edit (EVE), a model that establishes a new
state of the art in video editing without relying on any supervised video
editing data. To develop EVE, we separately train an image editing adapter and a
video generation adapter, and attach both to the same text-to-image model.
Then, to align the adapters towards video editing, we introduce a new
unsupervised distillation procedure, Factorized Diffusion Distillation. This
procedure distills knowledge from one or more teachers simultaneously, without
any supervised data. We utilize this procedure to teach EVE to edit videos by
jointly distilling knowledge to (i) precisely edit each individual frame from
the image editing adapter, and (ii) ensure temporal consistency among the
edited frames using the video generation adapter. Finally, to demonstrate the
potential of our approach in unlocking other capabilities, we align additional
combinations of adapters.
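To make the joint distillation concrete, here is a minimal sketch of what one unsupervised Factorized Diffusion Distillation step could look like, based only on the abstract's description: a student proposes an edited video, an image editing teacher scores each frame's edit fidelity, and a video generation teacher scores temporal consistency of the same sample. All names (`student`, `image_edit_teacher`, `video_gen_teacher`), method signatures, and loss weights are hypothetical placeholders, not the authors' actual implementation.

```python
# A minimal sketch of one Factorized Diffusion Distillation (FDD) step.
# Everything here is an assumption: `student`, `image_edit_teacher`, and
# `video_gen_teacher` stand in for diffusion-model wrappers, and the method
# names and loss weights are illustrative, not the paper's API.
import torch
import torch.nn.functional as F

def fdd_step(student, image_edit_teacher, video_gen_teacher,
             video, instruction, lambda_edit=1.0, lambda_temporal=1.0):
    # The student proposes an edited video from an unpaired input video and
    # a text instruction -- no ground-truth edited video is required.
    edited = student.generate(video, instruction)      # (B, T, C, H, W)
    B, T = edited.shape[:2]

    # Re-noise the student's own sample at a random diffusion timestep,
    # in the style of score-distillation objectives.
    t = torch.randint(0, student.num_timesteps, (B,))
    noise = torch.randn_like(edited)
    noisy = student.add_noise(edited, noise, t)        # (B, T, C, H, W)

    # Teacher 1: the image editing adapter scores each frame independently,
    # supervising how faithfully every individual frame realizes the edit.
    frames = noisy.flatten(0, 1)                       # (B*T, C, H, W)
    t_frames = t.repeat_interleave(T)                  # one timestep per frame
    eps_teacher_frame = image_edit_teacher.predict_noise(frames, instruction, t_frames)
    eps_student_frame = student.predict_frame_noise(frames, instruction, t_frames)
    loss_edit = F.mse_loss(eps_student_frame, eps_teacher_frame.detach())

    # Teacher 2: the video generation adapter scores the clip as a whole,
    # supervising temporal consistency among the edited frames.
    eps_teacher_video = video_gen_teacher.predict_noise(noisy, instruction, t)
    eps_student_video = student.predict_noise(noisy, instruction, t)
    loss_temporal = F.mse_loss(eps_student_video, eps_teacher_video.detach())

    # Joint distillation: both teachers supervise the same student sample.
    return lambda_edit * loss_edit + lambda_temporal * loss_temporal
```

The property this sketch mirrors from the abstract is that neither loss term needs supervised video editing data: both teachers evaluate the student's own generation, factorizing the objective into per-frame edit fidelity and cross-frame temporal coherence.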