

EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

March 6, 2026
Authors: Shiyuan Yang, Ruihuang Li, Jiale Tao, Shuai Shao, Qinglin Lu, Jing Liao
cs.AI

Abstract

Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, yet producing high-quality effects typically requires expert knowledge and costly production pipelines. Existing AIGC systems face significant challenges in VFX generation due to the scarcity of effect-specific data and the inherent difficulty of modeling supernatural or stylized effects. Moreover, these approaches often require per-effect fine-tuning, which severely limits their scalability and generalization to novel VFX. In this work, we present EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model to interpret high-level effect semantics and reason about how they should adapt to a target subject, while a diffusion transformer leverages in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, we construct EffectData, the largest high-quality synthetic dataset containing 130k videos across 3k VFX categories, to improve generalization and scalability. Experiments show that EffectMaker achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io