
**In-Context Sync-LoRA for Portrait Video Editing**

December 2, 2025
Authors: Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or
cs.AI

Abstract

Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, demanding that every edited frame remains precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.
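The synchronization-based filtering described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' actual pipeline: it assumes each frame is summarized by 2D facial landmarks (from any off-the-shelf landmark detector), scores a paired clip by its worst per-frame landmark misalignment, and keeps only pairs below an assumed threshold.

```python
# Hypothetical sketch of synchronization-based filtering for paired videos
# that share motion but differ in appearance. Frames are represented by
# lists of (x, y) facial landmarks; the landmark representation and the
# threshold value are assumptions for illustration only.

def frame_misalignment(lm_a, lm_b):
    """Mean Euclidean distance between corresponding landmarks of two frames."""
    return sum(
        ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
        for (xa, ya), (xb, yb) in zip(lm_a, lm_b)
    ) / len(lm_a)

def sync_score(video_a, video_b):
    """Worst-case per-frame misalignment across a paired clip (lower = better)."""
    return max(frame_misalignment(a, b) for a, b in zip(video_a, video_b))

def filter_pairs(pairs, threshold=2.0):
    """Keep only pairs whose motion trajectories stay tightly aligned."""
    return [p for p in pairs if sync_score(*p) <= threshold]

# Two toy paired clips: one with matching motion, one that drifts apart.
aligned = ([[(0, 0), (10, 0)], [(1, 0), (11, 0)]],
           [[(0, 1), (10, 1)], [(1, 1), (11, 1)]])
drifting = ([[(0, 0), (10, 0)], [(1, 0), (11, 0)]],
            [[(0, 0), (10, 0)], [(6, 0), (16, 0)]])

kept = filter_pairs([aligned, drifting])
print(len(kept))  # prints 1: the aligned pair survives, the drifting pair is dropped
```

A max over per-frame distances (rather than a mean) reflects the paper's emphasis on frame-accurate synchronization: a single badly misaligned frame is enough to disqualify a training pair.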
PDF · December 4, 2025