

In-Context Sync-LoRA for Portrait Video Editing

December 2, 2025
作者: Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or
cs.AI

Abstract

Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, demanding that every edited frame remains precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.
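The abstract's synchronization-based filtering step — scoring generated video pairs by temporal alignment and keeping only the best-aligned ones for training — could be sketched as below. The per-frame descriptor (e.g., facial landmarks or pose keypoints) and the cosine-similarity scoring are illustrative assumptions, not details specified in the paper.

```python
import numpy as np

def sync_score(frames_a, frames_b):
    """Mean per-frame cosine similarity between two videos of equal length.

    frames_a, frames_b: arrays of shape (T, D) holding per-frame motion
    descriptors (e.g., facial-landmark coordinates -- the descriptor
    choice is a hypothetical stand-in, not from the paper).
    Returns a score in [-1, 1]; 1.0 means identical per-frame motion.
    """
    a = frames_a / (np.linalg.norm(frames_a, axis=1, keepdims=True) + 1e-8)
    b = frames_b / (np.linalg.norm(frames_b, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(np.sum(a * b, axis=1)))

def filter_pairs(pairs, keep_ratio=0.2):
    """Keep only the most temporally aligned (source, edited) pairs.

    pairs: list of (frames_a, frames_b) tuples; keep_ratio is a
    hypothetical curation threshold.
    """
    scored = sorted(pairs, key=lambda p: sync_score(p[0], p[1]), reverse=True)
    k = max(1, int(len(scored) * keep_ratio))
    return scored[:k]
```

A pair whose two videos trace the same motion trajectory scores near 1.0 and survives curation, while a drifting pair is discarded — mirroring the paper's idea of training only on the most temporally aligned examples.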
PDF · December 4, 2025