IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
December 17, 2025
Authors: Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao
cs.AI
Abstract
We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles, and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging: injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. Existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as a clean contextual condition, exploiting the in-context learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal sparse tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.
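To make the efficiency claim concrete, the sketch below illustrates one plausible reading of spatiotemporal sparse tokenization combined with in-context conditioning: the source-video latent is subsampled along time and space before being flattened into context tokens, which are then concatenated with the (dense) target tokens into a single sequence for joint attention. The function name `sparse_tokenize`, the stride values, and the latent shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sparse_tokenize(latent, t_stride=2, s_stride=2):
    """Hypothetical spatiotemporal sparse tokenization:
    keep every t_stride-th frame and every s_stride-th
    spatial location of the source-video latent, so the
    context sequence fed to the DiT is much shorter."""
    # latent has shape (T, H, W, C)
    sparse = latent[::t_stride, ::s_stride, ::s_stride, :]
    T, H, W, C = sparse.shape
    return sparse.reshape(T * H * W, C)  # flatten to a token sequence

# Toy latent: 8 frames of 16x16 patches with 4 channels.
latent = np.random.randn(8, 16, 16, 4)
dense_tokens = latent.reshape(-1, 4)          # 8*16*16 = 2048 tokens
ctx_tokens = sparse_tokenize(latent)          # 4*8*8   = 256 tokens

# In-context conditioning: the sparse source tokens are prepended to
# the noisy target tokens, and the DiT attends over the joint sequence.
target_tokens = np.random.randn(2048, 4)
joint_seq = np.concatenate([ctx_tokens, target_tokens], axis=0)
```

With strides of 2 in time and space, the context sequence shrinks by 8x, which is where the reduced attention cost would come from under this assumed design.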