
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

December 4, 2025
作者: Jung Yi, Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, Seungryong Kim
cs.AI

Abstract

Recent advances in autoregressive video diffusion have enabled real-time frame streaming, yet existing solutions still suffer from temporal repetition, drift, and motion deceleration. We find that naively applying StreamingLLM-style attention sinks to video diffusion leads to fidelity degradation and motion stagnation. To overcome this, we introduce Deep Forcing, two training-free mechanisms that address these issues without any fine-tuning: 1) Deep Sink dedicates half of the sliding window to persistent sink tokens and re-aligns their temporal RoPE phase to the current timeline, stabilizing global context during long rollouts; 2) Participative Compression performs importance-aware KV-cache pruning that preserves only tokens actively participating in recent attention while safely discarding redundant and degraded history, minimizing error accumulation at out-of-distribution generation lengths. Together, these components enable over 12x length extrapolation (e.g., from 5s training length to 60s+ generation) with better imaging quality than LongLive, better aesthetic quality than RollingForcing, nearly maintained overall consistency, and substantial gains in dynamic degree, all while preserving real-time generation. Our results demonstrate that training-free KV-cache management can match or exceed training-based approaches for autoregressive streaming long-video generation.
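To make the two mechanisms concrete, here is a minimal sketch of the ideas behind Deep Sink's RoPE phase re-alignment and Participative Compression's importance-aware KV pruning. All function names, shapes, and the `keep_ratio` threshold are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def realign_rope_phase(sink_kv, num_sinks, current_start):
    """Deep Sink (sketch): keep persistent sink tokens, but reassign their
    temporal RoPE positions so they sit just before the current window,
    preventing phase drift as the timeline grows. Positions are assumed
    to be integer frame indices."""
    new_positions = current_start - num_sinks + np.arange(num_sinks)
    return sink_kv, new_positions

def participative_compression(kv_cache, attn_weights, keep_ratio=0.5):
    """Participative Compression (sketch): rank cached tokens by how much
    recent attention they received, keep the top fraction, and drop the
    rest (redundant or degraded history)."""
    importance = attn_weights.sum(axis=0)        # attention mass per cached token
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])  # most-attended tokens, in order
    return kv_cache[keep], keep

# Toy usage: 8 cached tokens, recent queries attend mostly to tokens 0 and 3.
kv = np.arange(8, dtype=float).reshape(8, 1)
attn = np.zeros((2, 8))
attn[:, 3] = 1.0
attn[:, 0] = 0.5
pruned_kv, kept = participative_compression(kv, attn, keep_ratio=0.25)
# kept -> array([0, 3]); pruned_kv holds only those two tokens' KV entries.

sink_kv, sink_pos = realign_rope_phase(kv[:2], num_sinks=2, current_start=10)
# sink_pos -> array([8, 9]): sinks re-phased to just before frame 10.
```

The key design point the abstract emphasizes is that both operations act purely on the KV cache at inference time, so no weights are updated and real-time streaming is preserved.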