EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

Abstract

Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, yet producing high-quality effects typically requires expert knowledge and costly production pipelines. Existing AIGC systems face significant challenges in VFX generation due to the scarcity of effect-specific data and the inherent difficulty of modeling supernatural or stylized effects. Moreover, these approaches often require per-effect fine-tuning, which severely limits their scalability and generalization to novel VFX. In this work, we present EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model to interpret high-level effect semantics and reason about how they should adapt to a target subject, while a diffusion transformer leverages in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, we construct EffectData, the largest high-quality synthetic dataset containing 130k videos across 3k VFX categories, to improve generalization and scalability. Experiments show that EffectMaker achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io
