Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
January 22, 2026
Authors: Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen, Jong Chul Ye, Duygu Ceylan, Hyeonho Jeong
cs.AI
Abstract
Recent foundational video-to-video diffusion models have achieved impressive results in editing user-provided videos by modifying appearance, motion, or camera movement. However, real-world video editing is often an iterative process in which users refine results across multiple rounds of interaction. In this multi-turn setting, current video editors struggle to maintain cross-consistency across sequential edits. In this work, we tackle, for the first time, the problem of cross-consistency in multi-turn video editing and introduce Memory-V2V, a simple yet effective framework that augments existing video-to-video models with explicit memory. Given an external cache of previously edited videos, Memory-V2V employs accurate retrieval and dynamic tokenization strategies to condition the current editing step on prior results. To further mitigate redundancy and computational overhead, we propose a learnable token compressor within the DiT backbone that compresses redundant conditioning tokens while preserving essential visual cues, achieving an overall speedup of 30%. We validate Memory-V2V on challenging tasks including video novel view synthesis and text-conditioned long video editing. Extensive experiments show that Memory-V2V produces videos that are significantly more cross-consistent with minimal computational overhead, while maintaining or even improving task-specific performance over state-of-the-art baselines. Project page: https://dohunlee1.github.io/MemoryV2V
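To make the memory mechanism concrete, here is a minimal sketch of the idea: an external cache of previously edited videos is queried by similarity, and the retrieved conditioning tokens are compressed before being fed to the generator. All class and function names (`VideoMemoryCache`, `compress_tokens`) are hypothetical illustrations, not the paper's actual implementation; the learnable compressor is stood in for by simple average pooling.

```python
import numpy as np

class VideoMemoryCache:
    """Hypothetical external cache: stores token embeddings of past edits
    and retrieves the most similar entry to condition the next edit."""

    def __init__(self):
        self.keys = []    # per-video summary embeddings, shape (d,)
        self.values = []  # per-video conditioning tokens, shape (n, d)

    def add(self, tokens: np.ndarray) -> None:
        # Summarize a video's tokens into a single key via mean pooling.
        self.keys.append(tokens.mean(axis=0))
        self.values.append(tokens)

    def retrieve(self, query: np.ndarray) -> np.ndarray:
        # Cosine similarity between the query and cached keys;
        # return the tokens of the best-matching cached video.
        keys = np.stack(self.keys)
        sims = (keys @ query) / (
            np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        return self.values[int(np.argmax(sims))]

def compress_tokens(tokens: np.ndarray, keep: int) -> np.ndarray:
    # Stand-in for the learnable token compressor: average-pool groups
    # of tokens so that only `keep` condensed tokens remain.
    groups = np.array_split(np.arange(tokens.shape[0]), keep)
    return np.stack([tokens[idx].mean(axis=0) for idx in groups])
```

In an actual DiT pipeline, the retrieved (and compressed) tokens would be concatenated with the current edit's conditioning stream; compressing them before attention is what yields the reported speedup, since attention cost grows with sequence length.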