PickStyle: Video-to-Video Style Transfer with Context-Style Adapters

October 8, 2025
Authors: Soroush Mehraban, Vida Adeli, Jacob Rommann, Babak Taati, Kyryl Truskovskyi
cs.AI

Abstract

We address the task of video style transfer with diffusion models, where the goal is to preserve the context of an input video while rendering it in a target style specified by a text prompt. A major challenge is the lack of paired video data for supervision. We propose PickStyle, a video-to-video style transfer framework that augments pretrained video diffusion backbones with style adapters and benefits from paired still image data with source-style correspondences for training. PickStyle inserts low-rank adapters into the self-attention layers of conditioning modules, enabling efficient specialization for motion-style transfer while maintaining strong alignment between video content and style. To bridge the gap between static image supervision and dynamic video, we construct synthetic training clips from paired images by applying shared augmentations that simulate camera motion, ensuring temporal priors are preserved. In addition, we introduce Context-Style Classifier-Free Guidance (CS-CFG), a novel factorization of classifier-free guidance into independent text (style) and video (context) directions. CS-CFG ensures that context is preserved in generated video while the style is effectively transferred. Experiments across benchmarks show that our approach achieves temporally coherent, style-faithful, and content-preserving video translations, outperforming existing baselines both qualitatively and quantitatively.
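The abstract names two concrete mechanisms that are easy to picture in code. First, the low-rank adapters inserted into the self-attention layers of the conditioning modules: the sketch below wraps a frozen attention projection with a trainable low-rank update in the usual LoRA style. The module layout, rank, and scaling are illustrative assumptions, not PickStyle's released implementation.

```python
# Minimal LoRA-style adapter sketch (assumed layout, not PickStyle's actual code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # keep the pretrained backbone frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)              # adapter starts as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Example: wrap the query projection of a toy self-attention layer.
attn_q = LoRALinear(nn.Linear(320, 320), rank=8)
print(attn_q(torch.randn(2, 77, 320)).shape)        # torch.Size([2, 77, 320])
```

Second, Context-Style Classifier-Free Guidance (CS-CFG) factorizes guidance into a text (style) direction and a video (context) direction. The sketch below composes three denoiser passes in the spirit of that description; the denoiser signature, null conditions, and guidance weights are assumptions, and the paper's exact combination may differ.

```python
# Hedged sketch of a two-direction classifier-free guidance combination.
import torch

def cs_cfg_noise(denoiser, x_t, t, text_cond, video_cond,
                 null_text, null_video, w_style=7.5, w_context=1.5):
    """Combine three denoiser passes into one guided prediction.

    The style direction is (eps_full - eps_video); the context direction
    is (eps_video - eps_null). Each direction gets its own guidance weight.
    """
    eps_full = denoiser(x_t, t, text_cond, video_cond)    # style text + context video
    eps_video = denoiser(x_t, t, null_text, video_cond)   # context video only
    eps_null = denoiser(x_t, t, null_text, null_video)    # fully unconditional
    return (eps_null
            + w_context * (eps_video - eps_null)
            + w_style * (eps_full - eps_video))

# Toy usage with a stand-in "denoiser" so the sketch runs end to end.
def toy_denoiser(x, t, text, video):
    return x * 0.1 + text.mean() + video.mean()

x_t = torch.randn(1, 4, 8, 32, 32)                        # latent video (B, C, T, H, W)
text_cond, null_text = torch.randn(77, 768), torch.zeros(77, 768)
video_cond, null_video = torch.randn(1, 4, 8, 32, 32), torch.zeros(1, 4, 8, 32, 32)
guided = cs_cfg_noise(toy_denoiser, x_t, t=500,
                      text_cond=text_cond, video_cond=video_cond,
                      null_text=null_text, null_video=null_video)
print(guided.shape)
```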