DreamStyle: A Unified Framework for Video Stylization
January 6, 2026
Authors: Mengtian Li, Jinshu Chen, Songtao Zhao, Wanquan Feng, Pengqi Tu, Qian He
cs.AI
Abstract
Video stylization, an important downstream task of video generation models, has not yet been thoroughly explored. Its input style conditions typically include text, a style image, and a stylized first frame, each with a characteristic advantage: text is more flexible, a style image provides a more accurate visual anchor, and a stylized first frame makes long-video stylization feasible. However, existing methods are largely confined to a single type of style condition, which limits their scope of application. Additionally, the lack of high-quality datasets leads to style inconsistency and temporal flicker. To address these limitations, we introduce DreamStyle, a unified framework for video stylization that supports (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization, accompanied by a well-designed data curation pipeline for acquiring high-quality paired video data. DreamStyle is built on a vanilla Image-to-Video (I2V) model and trained using Low-Rank Adaptation (LoRA) with token-specific up matrices, which reduces confusion among different condition tokens. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks and outperforms competing methods in style consistency and video quality.
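The "LoRA with token-specific up matrices" idea can be illustrated with a minimal sketch. This is an assumption about how such an adapter might be wired (a shared down-projection plus one up-projection per condition-token type), not the authors' released implementation; the class name, token-type encoding, and shapes are all hypothetical.

```python
import torch
import torch.nn as nn

class TokenSpecificLoRA(nn.Module):
    """Sketch of a LoRA adapter whose up-projection depends on token type.

    A single down matrix compresses features to rank r; each condition type
    (e.g. 0 = text, 1 = style image, 2 = stylized first frame) gets its own
    up matrix, so updates from different condition tokens do not share the
    same up-projection and are less likely to interfere.
    """
    def __init__(self, dim, rank=16, num_token_types=3, scale=1.0):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # shared A matrix
        self.ups = nn.ModuleList(
            [nn.Linear(rank, dim, bias=False) for _ in range(num_token_types)]
        )                                              # per-type B_k matrices
        for up in self.ups:
            nn.init.zeros_(up.weight)                  # standard LoRA init: B = 0,
        self.scale = scale                             # so the adapter starts as a no-op

    def forward(self, x, token_type_ids):
        # x: (batch, seq, dim); token_type_ids: (batch, seq) ints in [0, K)
        h = self.down(x)
        out = torch.zeros_like(x)
        for k, up in enumerate(self.ups):
            mask = (token_type_ids == k).unsqueeze(-1).to(x.dtype)
            out = out + mask * up(h)                   # route each token to its B_k
        return self.scale * out                        # added to the frozen branch's output
```

The adapter output would be added to the frozen base layer's output, as in standard LoRA; only the down and up matrices are trained.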