IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

December 17, 2025
作者: Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao
cs.AI

Abstract

We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles, and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging because injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. However, existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as a clean contextual condition, exploiting the in-context learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal sparse tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.
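
The abstract combines three ideas that compose naturally: the source video serves as a clean in-context condition, that context is sparsified spatiotemporally to cut compute, and a DiT attends jointly over context and target tokens. Below is a minimal PyTorch sketch of how these pieces could fit together; all class names, stride values, and tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of in-context conditioning for a DiT denoiser: clean
# source-video tokens are subsampled (sparse tokenization) and concatenated
# with noisy target tokens so one attention pass sees both. Hypothetical
# names throughout (SparseTokenizer, InContextDiTBlock, stride values).
import torch
import torch.nn as nn


class SparseTokenizer(nn.Module):
    """Assumed spatiotemporal sparse tokenization: keep every t_stride-th
    frame and every s_stride-th spatial token of the context."""

    def __init__(self, t_stride: int = 2, s_stride: int = 2):
        super().__init__()
        self.t_stride = t_stride
        self.s_stride = s_stride

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, frames, spatial_tokens, dim)
        return tokens[:, ::self.t_stride, ::self.s_stride, :]


class InContextDiTBlock(nn.Module):
    """One pre-norm transformer block attending jointly over the
    concatenated [context; target] sequence."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


B, F, S, D = 1, 8, 64, 512                  # toy sizes, not the paper's
source = torch.randn(B, F, S, D)            # clean source-video tokens
noisy_target = torch.randn(B, F, S, D)      # noisy latents being denoised

sparse_src = SparseTokenizer()(source)      # (B, 4, 32, D): 4x fewer tokens
ctx = sparse_src.flatten(1, 2)              # context sequence, length 128
tgt = noisy_target.flatten(1, 2)            # target sequence, length 512
seq = torch.cat([ctx, tgt], dim=1)          # in-context input, length 640

out = InContextDiTBlock()(seq)
denoised = out[:, ctx.shape[1]:]            # keep only the target part
print(denoised.shape)                       # torch.Size([1, 512, 512])
```

Joint self-attention over the concatenated sequence is what would let the denoiser copy background content from the clean context while synthesizing the effect in the target tokens; sparsifying only the context is where the claimed compute savings would come from. The paper's two-stage training (general editing adaptation, then Effect-LoRA) would sit on top of such a backbone and is not shown here.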