Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
August 11, 2025
作者: Fangyuan Mao, Aiming Hao, Jintao Chen, Dongxia Liu, Xiaokun Feng, Jiashu Zhu, Meiqi Wu, Chubin Chen, Jiahong Wu, Xiangxiang Chu
cs.AI
Abstract
Visual effects (VFX) are essential visual enhancements fundamental to modern
cinematic production. Although video generation models offer cost-efficient
solutions for VFX production, current methods are constrained by per-effect
LoRA training, which limits generation to single effects. This fundamental
limitation impedes applications that require spatially controllable composite
effects, i.e., the concurrent generation of multiple effects at designated
locations. However, integrating diverse effects into a unified framework faces
major challenges: interference from effect variations and spatial
uncontrollability during multi-VFX joint training. To tackle these challenges,
we propose Omni-Effects, the first unified framework capable of generating
prompt-guided effects and spatially controllable composite effects. The core of
our framework comprises two key innovations: (1) LoRA-based Mixture of Experts
(LoRA-MoE), which employs a group of expert LoRAs, integrating diverse effects
within a unified model while effectively mitigating cross-task interference.
(2) Spatial-Aware Prompt (SAP), which incorporates spatial mask information into
the text tokens, enabling precise spatial control. Furthermore, we introduce an
Independent-Information Flow (IIF) module integrated within the SAP, isolating
the control signals corresponding to individual effects to prevent any unwanted
blending. To facilitate this research, we construct a comprehensive VFX dataset
Omni-VFX via a novel data collection pipeline combining image editing and
First-Last Frame-to-Video (FLF2V) synthesis, and introduce a dedicated VFX
evaluation framework for validating model performance. Extensive experiments
demonstrate that Omni-Effects achieves precise spatial control and diverse
effect generation, enabling users to specify both the category and location of
desired effects.
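The LoRA-MoE idea described above can be sketched as a frozen base linear layer augmented by a group of low-rank expert adapters, with a router mixing their updates per token. The sketch below is a minimal illustration under our own assumptions (class name, expert count, rank, and the softmax router are all hypothetical), not the paper's actual implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LoRAMoELinear:
    """Hypothetical sketch of a LoRA-based Mixture-of-Experts layer:
    a frozen base weight W plus a routed mixture of per-effect
    low-rank (A, B) expert updates."""

    def __init__(self, dim_in, dim_out, num_experts=4, rank=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.02, (dim_in, dim_out))        # frozen base weight
        self.A = rng.normal(0, 0.01, (num_experts, dim_in, rank))
        self.B = np.zeros((num_experts, rank, dim_out))        # zero-init: updates start as a no-op
        self.Wr = rng.normal(0, 0.02, (dim_in, num_experts))   # token-wise router

    def __call__(self, x):                                     # x: (tokens, dim_in)
        gates = softmax(x @ self.Wr)                           # (tokens, num_experts)
        low = np.einsum("td,edr->ter", x, self.A)              # project into each expert's rank-r space
        upd = np.einsum("ter,erd->ted", low, self.B)           # per-expert low-rank updates
        delta = (gates[..., None] * upd).sum(axis=1)           # router-weighted mixture
        return x @ self.W + delta
```

Keeping one expert LoRA per effect while sharing the frozen backbone is what lets a single model cover diverse effects: the router decides, per token, which expert update dominates, which is the abstract's stated mechanism for mitigating cross-task interference.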