
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

December 16, 2025
作者: Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, Hengshuang Zhao
cs.AI

Abstract

The core challenge in streaming video generation is maintaining content consistency over long contexts, which places high demands on the memory design. Most existing solutions maintain memory by compressing historical frames with predefined strategies. However, different upcoming video chunks should refer to different historical cues, which fixed strategies struggle to satisfy. In this work, we propose MemFlow to address this problem. Specifically, before generating the coming chunk, we dynamically update the memory bank by retrieving the historical frames most relevant to that chunk's text prompt. This design preserves narrative coherence even when a new event occurs or the scene switches in future frames. In addition, during generation we activate only the memory-bank tokens most relevant to each query in the attention layers, which effectively safeguards generation efficiency. In this way, MemFlow achieves outstanding long-context consistency with negligible computational overhead (a 7.9% speed reduction compared with the memory-free baseline) and remains compatible with any streaming video generation model that uses a KV cache.
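The two mechanisms described above can be sketched in a minimal form: prompt-conditioned retrieval to refresh the memory bank, and per-query top-k token activation inside attention. This is an illustrative sketch only, not the paper's implementation; the function names, the use of cosine similarity for retrieval, and the NumPy-based attention are all assumptions for exposition.

```python
import numpy as np

def update_memory_bank(frame_feats, prompt_feat, k):
    """Hypothetical retrieval step: keep the k historical frames whose
    embeddings are most similar (cosine) to the chunk's prompt embedding."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    p = prompt_feat / np.linalg.norm(prompt_feat)
    scores = f @ p                                   # (num_frames,)
    top = np.argsort(scores)[::-1][:k]               # indices of best matches
    return frame_feats[top]

def sparse_memory_attention(queries, mem_keys, mem_values, top_k):
    """Hypothetical sparse attention: each query attends only to its
    top_k most relevant memory tokens; the rest are masked out."""
    d = queries.shape[-1]
    logits = queries @ mem_keys.T / np.sqrt(d)       # (Q, M) relevance scores
    # Build an additive mask that is 0 for each query's top_k tokens, -inf elsewhere
    idx = np.argpartition(logits, -top_k, axis=1)[:, -top_k:]
    mask = np.full_like(logits, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=1)
    logits = logits + mask
    # Softmax over the surviving (unmasked) memory tokens
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ mem_values
```

When `top_k` equals the full memory size, the sparse variant reduces to dense softmax attention over the memory bank, so the efficiency gain comes purely from shrinking the per-query active set.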