VACE: All-in-One Video Creation and Editing

Abstract

Diffusion Transformers have demonstrated strong capability and scalability in generating high-quality images and videos. Further efforts to unify generation and editing tasks have yielded significant progress in image content creation. However, because video demands consistency across both temporal and spatial dynamics, achieving a unified approach to video synthesis remains challenging. We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. These tasks include reference-to-video generation, video-to-video editing, and masked video-to-video editing. Specifically, we integrate the requirements of these tasks by organizing video task inputs, such as editing, reference, and masking, into a unified interface referred to as the Video Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we inject different task concepts into the model using formalized representations of temporal and spatial dimensions, allowing it to handle arbitrary video synthesis tasks flexibly. Extensive experiments demonstrate that the unified model of VACE achieves performance on par with task-specific models across various subtasks. It also enables diverse applications through versatile task combinations. Project page: https://ali-vilab.github.io/VACE-Page/.
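
To illustrate how a single interface could cover the three task types named above, the following Python sketch models a VCU as a text prompt plus aligned lists of context frames and masks. The field names, the per-frame 0/1 mask convention (real masks would be spatial), the use of None for an empty frame, and the three helper functions are all assumptions made for this example. The abstract does not specify the VCU's layout, so this should be read as one plausible arrangement rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical stand-in for image data: a frame label, or None when no
# context is supplied for that position.
Frame = Optional[str]


@dataclass
class VideoConditionUnit:
    """Illustrative container; the layout is an assumption, not the paper's spec."""
    prompt: str
    frames: List[Frame]
    masks: List[int]  # assumed convention: 1 = synthesize, 0 = keep as given

    def __post_init__(self) -> None:
        if len(self.frames) != len(self.masks):
            raise ValueError("frames and masks must have the same length")
        if any(m not in (0, 1) for m in self.masks):
            raise ValueError("masks must contain only 0 or 1")


def reference_to_video(prompt: str, references: Sequence[str],
                       num_frames: int) -> VideoConditionUnit:
    # Reference frames are kept (mask 0); the frames after them are empty
    # and fully generated (mask 1).
    frames: List[Frame] = list(references) + [None] * num_frames
    masks = [0] * len(references) + [1] * num_frames
    return VideoConditionUnit(prompt, frames, masks)


def video_to_video(prompt: str,
                   control_frames: Sequence[str]) -> VideoConditionUnit:
    # Every frame is regenerated, guided by the supplied control frames.
    return VideoConditionUnit(prompt, list(control_frames),
                              [1] * len(control_frames))


def masked_video_to_video(prompt: str, source_frames: Sequence[str],
                          masks: Sequence[int]) -> VideoConditionUnit:
    # Only positions whose mask is 1 are regenerated; the rest are kept.
    return VideoConditionUnit(prompt, list(source_frames), list(masks))


if __name__ == "__main__":
    vcu = reference_to_video("a cat on a beach", ["ref_0"], num_frames=3)
    print(vcu.frames, vcu.masks)
```

Because all three helpers return the same type, a downstream model in this sketch would need only one input path, which is the kind of unification the abstract describes.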