FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

June 9, 2026
Authors: Yu Lu, Junjie Yang, Piotr Koniusz, YuXin Song, Yi Yang
cs.AI

Abstract

Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts of the history. We propose FadeMem, a distance-aware KV memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy under a fixed cache budget. This design is motivated by frequency-dependent temporal decay: fine details decorrelate quickly, while coarse scene structure and identity remain useful over longer horizons. During generation, new history is inserted as fine-grained entries, while older adjacent entries are progressively merged under a power-law temporal allocation schedule, yielding a dense-near, sparse-far memory within one cache. Without architectural changes, FadeMem preserves recent context for short-term dynamics and compact long-range anchors for identity and scene coherence. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache strategies.
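The consolidation idea can be illustrated with a toy sketch. The abstract does not specify the merge operator, the exact form of the power-law schedule, or any hyperparameters, so everything below is an assumption made for illustration: the `Entry` class, the span-weighted averaging in `merge`, the scoring rule in `insert`, and the `budget` and `alpha` parameters are all hypothetical, and a flat list of floats stands in for a real KV block. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    start: int          # first chunk index covered by this entry
    end: int            # last chunk index covered (inclusive)
    value: List[float]  # stand-in for a KV block (hypothetical flat vector)

    @property
    def span(self) -> int:
        return self.end - self.start + 1


def merge(a: Entry, b: Entry) -> Entry:
    """Span-weighted average of two adjacent entries (assumed merge rule)."""
    total = a.span + b.span
    value = [(a.span * x + b.span * y) / total for x, y in zip(a.value, b.value)]
    return Entry(a.start, b.end, value)


def insert(cache: List[Entry], step: int, value: List[float],
           budget: int, alpha: float = 1.0) -> None:
    """Append a fine-grained entry, then merge older neighbors until within budget."""
    assert budget >= 2, "need room for the newest entry plus at least one older one"
    cache.append(Entry(step, step, value))
    while len(cache) > budget:
        # Score each adjacent pair by its merged span divided by a power of its
        # distance from the newest chunk (assumed power-law rule). A low score
        # means the pair is far away yet still stored at fine granularity.
        def score(i: int) -> float:
            merged_span = cache[i].span + cache[i + 1].span
            distance = step - cache[i + 1].end + 1
            return merged_span / (distance ** alpha)

        # The newest entry is never merged, so the most recent chunk stays fine-grained.
        i = min(range(len(cache) - 2), key=score)
        cache[i:i + 2] = [merge(cache[i], cache[i + 1])]


# Usage: stream 11 chunks through a cache limited to 4 entries.
cache: List[Entry] = []
for t in range(11):
    insert(cache, t, [float(t)], budget=4)
print([(e.start, e.end) for e in cache])
# → [(0, 6), (7, 8), (9, 9), (10, 10)]
```

In this toy run the cache never exceeds four entries, the newest chunks keep single-chunk resolution, and the oldest entry absorbs a progressively longer stretch of history, which mirrors the "dense-near, sparse-far" layout the abstract describes. A larger `alpha` in this sketch would push the merging further toward distant history.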