Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
August 11, 2025
作者: Fangyuan Mao, Aiming Hao, Jintao Chen, Dongxia Liu, Xiaokun Feng, Jiashu Zhu, Meiqi Wu, Chubin Chen, Jiahong Wu, Xiangxiang Chu
cs.AI
Abstract
Visual effects (VFX) are essential visual enhancements fundamental to modern
cinematic production. Although video generation models offer cost-efficient
solutions for VFX production, current methods are constrained by per-effect
LoRA training, which limits generation to single effects. This fundamental
limitation impedes applications that require spatially controllable composite
effects, i.e., the concurrent generation of multiple effects at designated
locations. However, integrating diverse effects into a unified framework faces
major challenges: interference from effect variations and spatial
uncontrollability during multi-VFX joint training. To tackle these challenges,
we propose Omni-Effects, the first unified framework capable of generating
prompt-guided effects and spatially controllable composite effects. The core of
our framework comprises two key innovations: (1) LoRA-based Mixture of Experts
(LoRA-MoE), which employs a group of expert LoRAs, integrating diverse effects
within a unified model while effectively mitigating cross-task interference.
(2) Spatial-Aware Prompt (SAP), which incorporates spatial mask information into
the text tokens, enabling precise spatial control. Furthermore, we introduce an
Independent-Information Flow (IIF) module integrated within the SAP, isolating
the control signals corresponding to individual effects to prevent any unwanted
blending. To facilitate this research, we construct a comprehensive VFX dataset
Omni-VFX via a novel data collection pipeline combining image editing and
First-Last Frame-to-Video (FLF2V) synthesis, and introduce a dedicated VFX
evaluation framework for validating model performance. Extensive experiments
demonstrate that Omni-Effects achieves precise spatial control and diverse
effect generation, enabling users to specify both the category and location of
desired effects.
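The LoRA-MoE idea described above can be sketched as a frozen base linear layer augmented by a group of low-rank expert adapters, with a router mixing their updates per token. The sketch below is a minimal illustration under our own assumptions (class name, expert count, rank, and the softmax router are all hypothetical), not the paper's actual implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LoRAMoELinear:
    """Hypothetical sketch of a LoRA-based Mixture-of-Experts layer:
    a frozen base weight W plus a routed mixture of per-effect
    low-rank (A, B) expert updates."""

    def __init__(self, dim_in, dim_out, num_experts=4, rank=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.02, (dim_in, dim_out))        # frozen base weight
        self.A = rng.normal(0, 0.01, (num_experts, dim_in, rank))
        self.B = np.zeros((num_experts, rank, dim_out))        # zero-init: updates start as a no-op
        self.Wr = rng.normal(0, 0.02, (dim_in, num_experts))   # token-wise router

    def __call__(self, x):                                     # x: (tokens, dim_in)
        gates = softmax(x @ self.Wr)                           # (tokens, num_experts)
        low = np.einsum("td,edr->ter", x, self.A)              # project into each expert's rank-r space
        upd = np.einsum("ter,erd->ted", low, self.B)           # per-expert low-rank updates
        delta = (gates[..., None] * upd).sum(axis=1)           # router-weighted mixture
        return x @ self.W + delta
```

Keeping one expert LoRA per effect while sharing the frozen backbone is what lets a single model cover diverse effects: the router decides, per token, which expert update dominates, which is the abstract's stated mechanism for mitigating cross-task interference.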