Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
August 11, 2025
作者: Fangyuan Mao, Aiming Hao, Jintao Chen, Dongxia Liu, Xiaokun Feng, Jiashu Zhu, Meiqi Wu, Chubin Chen, Jiahong Wu, Xiangxiang Chu
cs.AI
Abstract
Visual effects (VFX) are essential visual enhancements fundamental to modern
cinematic production. Although video generation models offer cost-efficient
solutions for VFX production, current methods are constrained by per-effect
LoRA training, which limits generation to single effects. This fundamental
limitation impedes applications that require spatially controllable composite
effects, i.e., the concurrent generation of multiple effects at designated
locations. However, integrating diverse effects into a unified framework faces
major challenges: interference from effect variations and spatial
uncontrollability during multi-VFX joint training. To tackle these challenges,
we propose Omni-Effects, the first unified framework capable of generating
prompt-guided effects and spatially controllable composite effects. The core of
our framework comprises two key innovations: (1) LoRA-based Mixture of Experts
(LoRA-MoE), which employs a group of expert LoRAs, integrating diverse effects
within a unified model while effectively mitigating cross-task interference.
(2) Spatial-Aware Prompt (SAP), which incorporates spatial mask information into
the text tokens, enabling precise spatial control. Furthermore, we introduce an
Independent-Information Flow (IIF) module integrated within the SAP, isolating
the control signals corresponding to individual effects to prevent any unwanted
blending. To facilitate this research, we construct a comprehensive VFX dataset
Omni-VFX via a novel data collection pipeline combining image editing and
First-Last Frame-to-Video (FLF2V) synthesis, and introduce a dedicated VFX
evaluation framework for validating model performance. Extensive experiments
demonstrate that Omni-Effects achieves precise spatial control and diverse
effect generation, enabling users to specify both the category and location of
desired effects.
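The LoRA-MoE idea described above — a frozen base weight augmented by a gated group of low-rank expert adapters, so that different effects can specialize without interfering — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the gating scheme, ranks, and dimensions here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_moe_linear(x, W, experts, gate_logits, scale=1.0):
    """Minimal LoRA-MoE linear layer: frozen base weight W plus a
    softmax-gated sum of low-rank expert updates B_i @ A_i (hypothetical
    gating; the actual routing in Omni-Effects may differ)."""
    g = np.exp(gate_logits - gate_logits.max())
    g = g / g.sum()                               # softmax over experts
    out = x @ W.T                                 # frozen base path
    for gi, (A, B) in zip(g, experts):
        out += gi * scale * (x @ A.T) @ B.T       # low-rank expert path
    return out

d_in, d_out, rank, n_experts = 8, 8, 2, 3
W = rng.normal(size=(d_out, d_in))                # frozen pretrained weight
experts = [(rng.normal(size=(rank, d_in)),        # A_i: d_in -> rank
            rng.normal(size=(d_out, rank)))       # B_i: rank -> d_out
           for _ in range(n_experts)]
x = rng.normal(size=(4, d_in))                    # a batch of 4 tokens
gate_logits = rng.normal(size=n_experts)

y = lora_moe_linear(x, W, experts, gate_logits)
print(y.shape)  # (4, 8)
```

Because each expert's update is rank-2 here, the adapters add few parameters relative to `W`, and zeroing an expert's `B` matrix recovers the frozen base behavior — the property that lets per-effect experts coexist in one model.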