VACE: All-in-One Video Creation and Editing
March 10, 2025
Authors: Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, Yu Liu
cs.AI
Abstract
Diffusion Transformers have demonstrated powerful capability and scalability in
generating high-quality images and videos. Further pursuing the unification of
generation and editing tasks has yielded significant progress in the domain of
image content creation. However, due to the intrinsic demands for consistency
across both temporal and spatial dynamics, achieving a unified approach for
video synthesis remains challenging. We introduce VACE, which enables users to
perform Video tasks within an All-in-one framework for Creation and Editing.
These tasks include reference-to-video generation, video-to-video editing, and
masked video-to-video editing. Specifically, we effectively integrate the
requirements of various tasks by organizing video task inputs, such as editing,
reference, and masking, into a unified interface referred to as the Video
Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we
inject different task concepts into the model using formalized representations
of temporal and spatial dimensions, allowing it to handle arbitrary video
synthesis tasks flexibly. Extensive experiments demonstrate that the unified
model of VACE achieves performance on par with task-specific models across
various subtasks. Simultaneously, it enables diverse applications through
versatile task combinations. Project page:
https://ali-vilab.github.io/VACE-Page/.
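As a rough illustration of the unified interface the abstract describes, the sketch below shows one way a Video Condition Unit could bundle a text prompt, context frames, and masks so that the three listed tasks share a single input shape. Everything here is an assumption made for illustration and is not taken from the VACE codebase: the class and function names, the use of nested lists in place of real video tensors, and the mask convention (1 = region to generate, 0 = region to keep).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for real video tensors (assumption, not VACE's types).
Frame = List[List[float]]  # H x W grayscale frame
Mask = List[List[int]]     # H x W binary mask: 1 = generate, 0 = keep (assumed convention)


def _filled(h: int, w: int, value):
    """Build an h x w grid filled with a constant value."""
    return [[value] * w for _ in range(h)]


@dataclass
class VideoConditionUnit:
    """Illustrative container: one prompt plus aligned frame and mask sequences."""
    prompt: str
    frames: List[Frame]
    masks: List[Mask]

    def __post_init__(self):
        if len(self.frames) != len(self.masks):
            raise ValueError("frames and masks must have the same length")


def reference_to_video(prompt: str, references: List[Frame],
                       num_frames: int, h: int, w: int) -> VideoConditionUnit:
    """Reference frames are kept (mask 0); the blank frames after them are generated (mask 1)."""
    frames = list(references) + [_filled(h, w, 0.0) for _ in range(num_frames)]
    masks = ([_filled(h, w, 0) for _ in references]
             + [_filled(h, w, 1) for _ in range(num_frames)])
    return VideoConditionUnit(prompt, frames, masks)


def video_to_video(prompt: str, video: List[Frame]) -> VideoConditionUnit:
    """The whole input video acts as a condition; every pixel may be regenerated."""
    h, w = len(video[0]), len(video[0][0])
    return VideoConditionUnit(prompt, list(video), [_filled(h, w, 1) for _ in video])


def masked_video_to_video(prompt: str, video: List[Frame],
                          masks: List[Mask]) -> VideoConditionUnit:
    """Only the regions marked 1 in the caller-supplied masks are regenerated."""
    return VideoConditionUnit(prompt, list(video), list(masks))


if __name__ == "__main__":
    ref = [[0.5, 0.5], [0.5, 0.5]]
    vcu = reference_to_video("a cat walking", [ref], num_frames=3, h=2, w=2)
    print(len(vcu.frames), len(vcu.masks))
```

Under these assumptions, each task differs only in how the frame and mask sequences are filled, so a single downstream model could consume the same structure regardless of task. Combining tasks would then amount to mixing these fill patterns within one unit.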