ChatPaper.ai

Helios: Real Real-Time Long Video Generation Model

March 4, 2026
Authors: Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, Li Yuan
cs.AI

Abstract

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generation without standard acceleration techniques such as KV-cache, sparse/linear attention, or quantization; and (3) training without parallelism or sharding frameworks, enabling image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) tasks. To mitigate drifting in long-video generation, we characterize typical failure modes and propose simple yet effective training strategies that explicitly simulate drifting during training, while eliminating repetitive motion at its source. For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to, or lower than, those of 1.3B video generative models. Moreover, we introduce infrastructure-level optimizations that accelerate both inference and training while reducing memory consumption. Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation. We plan to release the code, base model, and distilled model to support further development by the community.
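The abstract's headline figures can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch, not code from the paper, and the helper names are hypothetical. It assumes the common rule of thumb that a dense transformer forward pass costs about 2 FLOPs per parameter per token and ignores the attention term, so the cost ratio is only a first-order estimate. The sketch computes two things: the per-frame time budget implied by 19.5 FPS, and how much less token work a 14B model must do to match a 1.3B model's compute.

```python
# Back-of-the-envelope checks on the headline numbers in the Helios abstract.
# Assumption (not from the paper): a dense transformer forward pass costs
# about 2 FLOPs per parameter per token; the attention term is ignored.

FPS = 19.5             # reported generation speed on a single H100
BIG_PARAMS = 14e9      # Helios parameter count
SMALL_PARAMS = 1.3e9   # parameter count of the 1.3B reference models


def per_frame_budget_ms(fps: float) -> float:
    """Wall-clock milliseconds available per frame to sustain `fps`."""
    return 1000.0 / fps


def frames_per_minute(fps: float) -> float:
    """Frames produced in one minute of wall-clock time at `fps`."""
    return fps * 60.0


def forward_flops(params: float, tokens: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2.0 * params * tokens


def required_work_reduction(big: float, small: float) -> float:
    """Factor by which the big model's total token workload (tokens summed
    over all sampling steps) must shrink to match the small model's FLOPs."""
    return big / small


print(f"{per_frame_budget_ms(FPS):.1f} ms per frame")  # 51.3 ms per frame
print(frames_per_minute(FPS))                            # 1170.0
ratio = required_work_reduction(BIG_PARAMS, SMALL_PARAMS)
print(f"{ratio:.1f}x fewer token-steps needed")          # 10.8x fewer token-steps needed
```

Under this approximation, the two models are compared on the same generated video. For the 14B model to cost the same as a 1.3B model, it must process roughly 11x fewer tokens summed over all sampling steps. That is the scale of saving the abstract attributes to compressing the historical and noisy context and reducing the number of sampling steps.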
March 6, 2026