VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
October 29, 2025
Authors: Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
cs.AI
Abstract
Visual effects (VFX) are crucial to the expressive power of digital media,
yet their creation remains a major challenge for generative AI. Prevailing
methods often rely on the one-LoRA-per-effect paradigm, which is
resource-intensive and fundamentally incapable of generalizing to unseen
effects, thus limiting scalability and creative freedom. To address this challenge, we
introduce VFXMaster, the first unified, reference-based framework for VFX video
generation. It recasts effect generation as an in-context learning task,
enabling the model to transfer diverse dynamic effects from a reference video onto
target content. In addition, it demonstrates remarkable generalization to
unseen effect categories. Specifically, we design an in-context conditioning
strategy that prompts the model with a reference example. An in-context
attention mask then precisely decouples and injects the essential
effect attributes, allowing a single unified model to master effect imitation
without information leakage. Furthermore, we propose an efficient one-shot
effect adaptation mechanism that rapidly boosts generalization to challenging
unseen effects from a single user-provided video. Extensive
experiments demonstrate that our method effectively imitates various categories
of effect information and exhibits outstanding generalization to out-of-domain
effects. To foster future research, we will release our code, models, and a
comprehensive dataset to the community.
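
Note: the abstract describes the in-context attention mask only at a high level. The sketch below is a rough, hedged illustration of one way such a mask could be built over concatenated reference and target tokens; it is not the paper's actual implementation, and the function name, token counts, and the effect_idx partition of reference tokens are assumptions introduced here for illustration only.

import torch

def build_in_context_attention_mask(n_ref: int, n_tgt: int,
                                     effect_idx: torch.Tensor) -> torch.Tensor:
    """Boolean mask of shape (n_ref + n_tgt, n_ref + n_tgt); True = attention allowed.

    Assumed layout: the first n_ref tokens come from the reference video and
    the remaining n_tgt tokens from the target content. Reference tokens attend
    only among themselves, so no target information flows back into the
    reference branch; target tokens attend to all target tokens plus the subset
    of reference tokens assumed to carry effect attributes (effect_idx), which
    is one plausible way to inject the effect while keeping the reference's
    own content decoupled from the target.
    """
    n = n_ref + n_tgt
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_ref, :n_ref] = True          # reference self-attention block
    mask[n_ref:, n_ref:] = True          # target self-attention block
    mask[n_ref:, effect_idx] = True      # target reads effect-related reference tokens
    return mask

# Example: 16 reference tokens (indices 4..11 assumed effect-related), 32 target tokens.
mask = build_in_context_attention_mask(16, 32, torch.arange(4, 12))
print(mask.shape)  # torch.Size([48, 48])

A mask of this form could then be passed to a standard attention implementation, for example as the attn_mask argument of torch.nn.functional.scaled_dot_product_attention, during both training and inference.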