Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Abstract

Visual effects (VFX) are visual enhancements essential to modern cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods rely on per-effect LoRA training, which limits generation to a single effect at a time. This limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. Integrating diverse effects into a unified framework, however, faces two major challenges: interference between different effects, and a lack of spatial controllability during multi-VFX joint training. To tackle these challenges, we propose Omni-Effects, the first unified framework capable of generating both prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) LoRA-based Mixture of Experts (LoRA-MoE), which employs a group of expert LoRAs to integrate diverse effects within a unified model while effectively mitigating cross-task interference; and (2) Spatial-Aware Prompt (SAP), which incorporates spatial mask information into the text tokens, enabling precise spatial control. Furthermore, we introduce an Independent-Information Flow (IIF) module, integrated within the SAP, that isolates the control signals of individual effects to prevent unwanted blending. To facilitate this research, we construct a comprehensive VFX dataset, Omni-VFX, via a novel data collection pipeline that combines image editing with First-Last Frame-to-Video (FLF2V) synthesis, and we introduce a dedicated VFX evaluation framework for validating model performance. Extensive experiments demonstrate that Omni-Effects achieves precise spatial control and diverse effect generation, enabling users to specify both the category and the location of desired effects.
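To make the LoRA-MoE idea concrete, the following is a minimal sketch of a linear layer whose frozen base weight is augmented by several low-rank expert adapters mixed by a router. It is an illustration only, not the Omni-Effects implementation: the abstract does not describe the routing scheme, so the soft (softmax) routing over all experts, the zero initialization of the up-projection, and every name here (`LoRAMoELinear`, `matvec`, `rand_matrix`) are assumptions.

```python
import math
import random


def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n x m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialized matrix as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELinear:
    """Hypothetical linear layer: frozen base weight plus routed LoRA experts."""

    def __init__(self, d_in, d_out, num_experts, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_in, d_out, rng)           # frozen base weight
        self.gate = rand_matrix(d_in, num_experts, rng)  # router (assumed soft gating)
        # Each expert i is a low-rank pair (A_i, B_i) of shapes d_in x rank and rank x d_out.
        self.A = [rand_matrix(d_in, rank, rng) for _ in range(num_experts)]
        # B starts at zero so every adapter is initially a no-op (common LoRA convention).
        self.B = [[[0.0] * d_out for _ in range(rank)] for _ in range(num_experts)]

    def forward(self, x):
        """Return (output, routing weights) for a single input vector x."""
        out = matvec(x, self.W)
        weights = softmax(matvec(x, self.gate))
        for g, A, B in zip(weights, self.A, self.B):
            delta = matvec(matvec(x, A), B)  # low-rank update x A_i B_i
            out = [o + g * d for o, d in zip(out, delta)]
        return out, weights


layer = LoRAMoELinear(d_in=4, d_out=3, num_experts=2, rank=2)
x = [1.0, 0.5, -0.5, 2.0]
y, routing = layer.forward(x)
print(len(y), len(routing))
```

In this sketch each effect category could specialize in a different expert while the base weight stays shared, which is one plausible way a group of expert LoRAs might reduce interference between effects. A sparse (top-k) router would be an equally reasonable assumption.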
