

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

December 16, 2025
作者: Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, Hengshuang Zhao
cs.AI

Abstract

The core challenge for streaming video generation is maintaining content consistency over long contexts, which places high demands on memory design. Most existing solutions maintain memory by compressing historical frames with predefined strategies. However, different video chunks to be generated should refer to different historical cues, which fixed strategies struggle to satisfy. In this work, we propose MemFlow to address this problem. Specifically, before generating the coming chunk, we dynamically update the memory bank by retrieving the historical frames most relevant to the text prompt of that chunk. This design preserves narrative coherence even when a new event occurs or the scene switches in future frames. In addition, during generation we activate only the most relevant tokens in the memory bank for each query in the attention layers, which effectively preserves generation efficiency. In this way, MemFlow achieves outstanding long-context consistency with negligible computational overhead (a 7.9% speed reduction compared with the memory-free baseline) and remains compatible with any streaming video generation model that uses a KV cache.
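The two mechanisms the abstract describes, prompt-conditioned retrieval into the memory bank and per-query activation of only the most relevant memory tokens, can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the function names, the cosine-similarity retrieval, and the top-k token selection are all assumptions standing in for whatever the authors actually use.

```python
import numpy as np

def update_memory_bank(frame_feats, prompt_feat, k):
    """Pick the k historical frames most similar to the chunk's text prompt.

    Hypothetical retrieval step: cosine similarity between the prompt
    embedding and each historical frame embedding.
    """
    frame_norm = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    prompt_norm = prompt_feat / np.linalg.norm(prompt_feat)
    sims = frame_norm @ prompt_norm                 # (num_frames,)
    top = np.argsort(-sims)[:k]                     # indices of best matches
    return frame_feats[top], top

def sparse_memory_attention(queries, mem_keys, mem_vals, k_active):
    """Attend to the memory bank, but activate only the top-k_active
    memory tokens per query (illustrative sparse attention)."""
    d = queries.shape[1]
    scores = queries @ mem_keys.T / np.sqrt(d)      # (num_queries, num_mem)
    out = np.zeros_like(queries)
    for i, s in enumerate(scores):
        idx = np.argsort(-s)[:k_active]             # activated memory tokens
        w = np.exp(s[idx] - s[idx].max())           # softmax over active set
        w /= w.sum()
        out[i] = w @ mem_vals[idx]
    return out
```

A chunk's generation loop would call `update_memory_bank` once with the chunk's prompt embedding, then route the updated bank's keys/values through `sparse_memory_attention` in each attention layer, so the cost scales with `k_active` rather than the full history length.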
PDF · December 18, 2025