

VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

October 29, 2025
Authors: Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
cs.AI

Abstract

Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods typically follow a one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, limiting both scalability and creative scope. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling the model to reproduce diverse dynamic effects from a reference video onto target content, and it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example, together with an in-context attention mask that precisely decouples and injects the essential effect attributes, allowing a single unified model to master effect imitation without information leakage. We further propose an efficient one-shot effect adaptation mechanism that rapidly improves generalization to challenging unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates diverse categories of effect information and exhibits outstanding generalization to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
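The abstract does not specify how the in-context attention mask is constructed. As a purely illustrative, hypothetical sketch (the token partition, function name, and sizes below are assumptions, not the authors' implementation), one way to realize such a mask is to concatenate reference and target tokens into a single sequence and let target tokens attend only to the reference tokens that carry effect information, blocking the reference content tokens so that the reference clip's appearance cannot leak into the generated video:

```python
import torch

def build_in_context_mask(n_ref_content: int, n_ref_effect: int, n_target: int) -> torch.Tensor:
    """Boolean mask of shape (N, N); True = attention allowed.

    Assumed token layout: [reference content | reference effect | target].
    """
    n_ref = n_ref_content + n_ref_effect
    n = n_ref + n_target
    mask = torch.zeros(n, n, dtype=torch.bool)

    # Reference tokens attend freely within the reference clip.
    mask[:n_ref, :n_ref] = True

    # Target tokens attend to themselves ...
    mask[n_ref:, n_ref:] = True
    # ... and to the reference *effect* tokens, but not to the reference
    # *content* tokens, preventing appearance leakage into the target video.
    mask[n_ref:, n_ref_content:n_ref] = True
    return mask

# Example: 64 reference-content tokens, 32 reference-effect tokens, 128 target tokens.
mask = build_in_context_mask(64, 32, 128)
print(mask.shape)  # torch.Size([224, 224])
```

A boolean mask in this form can be passed as `attn_mask` to torch.nn.functional.scaled_dot_product_attention, where True marks positions that participate in attention; how VFXMaster actually partitions and masks tokens is described in the paper itself.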