
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

December 24, 2025
Authors: Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua
cs.AI

Abstract

High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeasible. To address this, we introduce HiStream, an efficient autoregressive framework that systematically reduces redundancy across three axes: i) Spatial Compression: denoising at low resolution before refining at high resolution with cached features; ii) Temporal Compression: a chunk-by-chunk strategy with a fixed-size anchor cache, ensuring stable inference speed; and iii) Timestep Compression: applying fewer denoising steps to subsequent, cache-conditioned chunks. On 1080p benchmarks, our primary HiStream model (i+ii) achieves state-of-the-art visual quality while denoising up to 76.2× faster than the Wan2.1 baseline with negligible quality loss. Our faster variant, HiStream+, applies all three optimizations (i+ii+iii) and achieves a 107.5× acceleration over the baseline, offering a compelling trade-off between speed and quality and thereby making high-resolution video generation both practical and scalable.
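To make axes ii) and iii) concrete, here is a minimal sketch, assuming a toy model in which each chunk attends to itself plus whatever sits in the cache. It is not the authors' implementation: `AnchorCache`, `plan_generation`, the window size, the step counts, and the cost formula (steps × query tokens × key tokens) are all hypothetical illustrations. The cache keeps the first chunk as a permanent anchor plus a sliding window of the most recent chunks, so the conditioning context stops growing after a few chunks; later chunks also use fewer denoising steps than the first.

```python
from collections import deque


class AnchorCache:
    """Hypothetical fixed-size cache: the first chunk is kept as an anchor,
    plus a sliding window of the most recent chunks."""

    def __init__(self, window):
        self.anchor = []                    # holds at most one entry: the first chunk
        self.recent = deque(maxlen=window)  # deque evicts the oldest entry automatically

    def context(self):
        # Conditioning set for the next chunk: anchor first, then recent chunks.
        return self.anchor + list(self.recent)

    def push(self, features):
        if not self.anchor:
            self.anchor.append(features)    # first chunk becomes the permanent anchor
        else:
            self.recent.append(features)    # later chunks enter the sliding window


def plan_generation(num_chunks, tokens_per_chunk, window, first_steps, later_steps):
    """Toy schedule returning (steps, context_tokens, cost) per chunk.
    cost = steps * query_tokens * key_tokens is a rough attention proxy only."""
    cache = AnchorCache(window)
    plan = []
    for i in range(num_chunks):
        ctx_tokens = len(cache.context()) * tokens_per_chunk
        # Timestep compression (assumed values): fewer steps once a cache exists.
        steps = first_steps if i == 0 else later_steps
        cost = steps * tokens_per_chunk * (tokens_per_chunk + ctx_tokens)
        plan.append((steps, ctx_tokens, cost))
        cache.push(f"chunk{i}")  # placeholder standing in for cached features
    return plan


if __name__ == "__main__":
    for steps, ctx, cost in plan_generation(6, 100, 2, 4, 2):
        print(steps, ctx, cost)
    # context tokens per chunk: 0, 100, 200, 300, 300, 300
```

Because the window is bounded, the per-chunk context (and thus the cost proxy) levels off once the cache is full, which is the intuition behind the "stable inference speed" claim for the fixed-size anchor cache. With a cache that kept every past chunk instead, the context would grow linearly with video length.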
December 26, 2025