Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Abstract

Visual effects (VFX) are visual enhancements essential to modern cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods are constrained by per-effect LoRA training, which limits generation to a single effect at a time. This limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. Integrating diverse effects into a unified framework, however, faces two major challenges: interference caused by effect variations, and spatial uncontrollability during multi-VFX joint training. To tackle these challenges, we propose Omni-Effects, the first unified framework capable of generating both prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) a LoRA-based Mixture of Experts (LoRA-MoE), which employs a group of expert LoRAs to integrate diverse effects within a unified model while effectively mitigating cross-task interference; and (2) a Spatial-Aware Prompt (SAP), which incorporates spatial mask information into the text token to enable precise spatial control. Furthermore, we introduce an Independent-Information Flow (IIF) module, integrated within the SAP, that isolates the control signals corresponding to individual effects to prevent unwanted blending. To facilitate this research, we construct a comprehensive VFX dataset, Omni-VFX, via a novel data collection pipeline that combines image editing with First-Last Frame-to-Video (FLF2V) synthesis, and we introduce a dedicated VFX evaluation framework for validating model performance. Extensive experiments demonstrate that Omni-Effects achieves precise spatial control and diverse effect generation, enabling users to specify both the category and the location of desired effects.
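
To make the LoRA-MoE idea concrete, the sketch below shows one generic way a frozen linear layer can be combined with a softly gated group of low-rank (LoRA) experts. It is an illustration only, not the authors' implementation: the abstract does not specify the routing scheme, the rank, the number of experts, or where such layers sit in the video model, so the class name LoRAMoELinear, the softmax router, and all sizes here are assumptions.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, standing in for learned or pretrained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELinear:
    """Hypothetical layer: frozen base weight plus a gated mixture of LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(d_out, d_in, rng)          # frozen pretrained weight W
        self.router = rand_matrix(num_experts, d_in, rng)  # assumed soft router
        # Each expert is a pair (A, B): A projects down to `rank`, B projects back up.
        # B starts at zero, as in standard LoRA, so the layer initially matches the base.
        self.experts = [
            (rand_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
            for _ in range(num_experts)
        ]

    def gates(self, x):
        """Per-expert mixing weights for input x (non-negative, summing to 1)."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        """Compute W x + sum_i g_i(x) * B_i (A_i x)."""
        out = matvec(self.base, x)
        for gate, (a, b) in zip(self.gates(x), self.experts):
            delta = matvec(b, matvec(a, x))
            out = [o + gate * d for o, d in zip(out, delta)]
        return out


layer = LoRAMoELinear(d_in=8, d_out=4)
x = [0.5] * 8
print("gates:", layer.gates(x))
print("output:", layer.forward(x))
```

In this sketch each expert can specialize in a subset of effects while the base weight stays shared, which is one plausible reading of how a group of expert LoRAs could reduce cross-task interference. A sparse top-k router would be an equally reasonable assumption.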
