Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

January 22, 2026
Authors: Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen, Jong Chul Ye, Duygu Ceylan, Hyeonho Jeong
cs.AI

Abstract

Recent foundational video-to-video diffusion models have achieved impressive results in editing user-provided videos by modifying appearance, motion, or camera movement. However, real-world video editing is often an iterative process, where users refine results across multiple rounds of interaction. In this multi-turn setting, current video editors struggle to maintain cross-consistency across sequential edits. In this work, we tackle, for the first time, the problem of cross-consistency in multi-turn video editing and introduce Memory-V2V, a simple yet effective framework that augments existing video-to-video models with explicit memory. Given an external cache of previously edited videos, Memory-V2V employs accurate retrieval and dynamic tokenization strategies to condition the current editing step on prior results. To further mitigate redundancy and computational overhead, we propose a learnable token compressor within the DiT backbone that compresses redundant conditioning tokens while preserving essential visual cues, achieving an overall speedup of 30%. We validate Memory-V2V on challenging tasks including video novel view synthesis and text-conditioned long video editing. Extensive experiments show that Memory-V2V produces videos that are significantly more cross-consistent with minimal computational overhead, while maintaining or even improving task-specific performance over state-of-the-art baselines. Project page: https://dohunlee1.github.io/MemoryV2V
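
To make the multi-turn, memory-conditioned editing loop described in the abstract more concrete, the following is a minimal conceptual sketch in Python. All names here (MemoryCache, retrieve, compress_tokens, the `editor` callable and its `memory_tokens` argument) are hypothetical placeholders for illustration only; they are not the paper's actual API, and the token compressor shown is a naive stand-in for the learnable compressor the paper proposes.

```python
# Conceptual sketch (not the authors' implementation) of multi-turn video editing
# with an external memory cache, retrieval, and token compression for conditioning.

from dataclasses import dataclass, field
from typing import Callable, List

import torch


@dataclass
class MemoryCache:
    """External cache holding token representations of previously edited videos."""
    entries: List[torch.Tensor] = field(default_factory=list)

    def add(self, video_tokens: torch.Tensor) -> None:
        self.entries.append(video_tokens)

    def retrieve(self, query_tokens: torch.Tensor, top_k: int = 1) -> List[torch.Tensor]:
        """Return the cached edits most similar to the current video.

        Here similarity is cosine similarity of mean-pooled tokens; the paper's
        retrieval strategy may differ.
        """
        if not self.entries:
            return []
        query = query_tokens.mean(dim=0)
        scores = torch.stack(
            [torch.cosine_similarity(query, e.mean(dim=0), dim=0) for e in self.entries]
        )
        order = scores.argsort(descending=True)
        return [self.entries[i] for i in order[:top_k]]


def compress_tokens(tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Stand-in for the learnable token compressor: simply subsample tokens.

    The actual method learns which conditioning tokens to keep inside the DiT backbone.
    """
    keep = max(1, int(tokens.shape[0] * keep_ratio))
    return tokens[:keep]


def multi_turn_edit(
    video_tokens: torch.Tensor,
    prompts: List[str],
    editor: Callable[..., torch.Tensor],
    cache: MemoryCache,
) -> torch.Tensor:
    """Apply sequential edits, conditioning each turn on compressed retrieved memory."""
    result = video_tokens
    for prompt in prompts:
        retrieved = cache.retrieve(result, top_k=1)
        condition = [compress_tokens(r) for r in retrieved]
        # `editor` stands in for a video-to-video diffusion model (e.g. a DiT-based
        # editor) that accepts extra conditioning tokens; its interface is assumed.
        result = editor(result, prompt=prompt, memory_tokens=condition)
        cache.add(result)
    return result
```

The sketch only illustrates the data flow implied by the abstract: each editing turn retrieves the most relevant prior result from the cache, compresses its tokens to limit overhead, and conditions the current edit on them so that sequential edits remain cross-consistent.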