HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

December 24, 2025
Authors: Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua
cs.AI

Abstract

High-resolution video generation is crucial for digital media and film, but the quadratic complexity of diffusion models creates a computational bottleneck that makes practical inference infeasible. To address this, we introduce HiStream, an efficient autoregressive framework that systematically reduces redundancy across three axes: i) Spatial Compression: denoising at low resolution before refining at high resolution with cached features; ii) Temporal Compression: a chunk-by-chunk strategy with a fixed-size anchor cache, ensuring stable inference speed; and iii) Timestep Compression: applying fewer denoising steps to subsequent, cache-conditioned chunks. On 1080p benchmarks, our primary HiStream model (i+ii) achieves state-of-the-art visual quality while denoising up to 76.2x faster than the Wan2.1 baseline, with negligible quality loss. Our faster variant, HiStream+, applies all three optimizations (i+ii+iii) and achieves a 107.5x acceleration over the baseline. It offers a compelling trade-off between speed and quality, making high-resolution video generation both practical and scalable.
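The temporal and timestep axes can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function name `plan_chunks`, the split of the cache into a few permanent anchor chunks plus a sliding window of recent chunks, and the step counts are all assumptions chosen for illustration. It only shows why a fixed-size cache keeps the conditioning context (and so per-chunk cost) bounded, and how later, cache-conditioned chunks could be given fewer denoising steps.

```python
from collections import deque


def plan_chunks(num_chunks, anchor_size=1, window_size=2,
                first_steps=4, later_steps=2):
    """Hypothetical schedule: for each chunk, list the cached chunks it
    conditions on and how many denoising steps it receives.

    All parameters are illustrative assumptions, not values from the paper.
    """
    anchors = []                         # earliest chunks, kept for the whole run
    recent = deque(maxlen=window_size)   # sliding window; old entries drop out
    plan = []
    for idx in range(num_chunks):
        # Context = permanent anchors + recent chunks (without duplicates).
        context = anchors + [c for c in recent if c not in anchors]
        # A chunk with no cache to condition on gets the full step budget;
        # cache-conditioned chunks get fewer steps.
        steps = later_steps if context else first_steps
        plan.append({"chunk": idx, "context": context, "steps": steps})
        if len(anchors) < anchor_size:
            anchors.append(idx)
        recent.append(idx)
    return plan


if __name__ == "__main__":
    for entry in plan_chunks(6):
        print(entry)
```

Under these assumptions the context never exceeds `anchor_size + window_size` chunks, no matter how long the video is, which is the property that keeps inference speed stable from chunk to chunk.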