
FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

January 5, 2026
Authors: Xijie Huang, Chengming Xu, Donghao Luo, Xiaobin Hu, Peng Tang, Xu Peng, Jiangning Zhang, Chengjie Wang, Yanwei Fu
cs.AI

Abstract

First-Frame Propagation (FFP) offers a promising paradigm for controllable video editing, but existing methods are hampered by a reliance on cumbersome run-time guidance. We identify the root cause of this limitation as the inadequacy of current training datasets, which are often too short, too low-resolution, and lacking the task diversity required to teach robust temporal priors. To address this foundational data gap, we first introduce FFP-300K, a new large-scale dataset comprising 300K high-fidelity video pairs at 720p resolution and 81 frames in length, constructed via a principled two-track pipeline covering diverse local and global edits. Building on this dataset, we propose a novel framework for truly guidance-free FFP that resolves the critical tension between maintaining first-frame appearance and preserving source-video motion. Architecturally, we introduce Adaptive Spatio-Temporal RoPE (AST-RoPE), which dynamically remaps positional encodings to disentangle appearance and motion references. At the objective level, we employ a self-distillation strategy in which an identity-propagation task acts as a powerful regularizer, ensuring long-term temporal stability and preventing semantic drift. Comprehensive experiments on the EditVerseBench benchmark demonstrate that our method significantly outperforms existing academic and commercial models, improving PickScore by about 0.2 and VLM score by about 0.3 over these competitors.
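The abstract does not specify how AST-RoPE performs its remapping; as a minimal illustrative sketch of what "dynamically remapping positional encodings to disentangle appearance and motion references" could look like, the PyTorch snippet below assigns the edited first frame a reserved out-of-sequence temporal index so attention can read it as a pure appearance reference, while the source frames keep ordinary indices that preserve motion order. The function names, the reserved index -1, and the treatment of only the temporal RoPE axis are assumptions for illustration, not the authors' AST-RoPE implementation.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard 1-D RoPE: one rotation angle per (position, frequency) pair."""
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return torch.outer(positions.float(), freqs)  # shape: (num_positions, dim // 2)

def remapped_temporal_ids(num_frames: int) -> torch.Tensor:
    """Hypothetical AST-RoPE-style remapping of temporal indices.

    The edited first frame is placed at a reserved out-of-sequence slot
    (-1 here) so it acts as a global appearance reference, while the
    source frames keep indices 0..num_frames-1 and continue to encode
    the motion order of the source video.
    """
    reference_slot = torch.tensor([-1])    # edited first frame (appearance)
    source_ids = torch.arange(num_frames)  # source video frames (motion)
    return torch.cat([reference_slot, source_ids])

# Example: 81 source frames plus the edited first-frame reference token row.
angles = rope_angles(remapped_temporal_ids(81), dim=64)
print(angles.shape)  # torch.Size([82, 32])
```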