
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

December 29, 2025
Authors: Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu
cs.AI

Abstract

Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to their reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX 4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA method TMP, it boosts perceptual quality (improving LPIPS by 0.095) while reducing latency by over 130x. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, cutting initial delay from over 4600 seconds to 0.328 seconds, making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/
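
To make the causal data flow concrete, below is a minimal, hypothetical PyTorch sketch of the streaming loop the abstract describes: one low-resolution frame in, one high-resolution frame out, a fixed four-step denoising loop, and guidance computed only from the previous output latent. All names and modules here (`StreamDiffVSRSketch`, the single-conv stand-ins for the denoiser, ARTG, and the TPM decoder) and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StreamDiffVSRSketch(nn.Module):
    """Toy stand-in for a causal streaming diffusion VSR pipeline.

    Each HR frame is produced from the current LR latent plus a guidance
    cue derived from the PREVIOUS output latent only (no future frames),
    so latency is bounded by per-frame compute.
    """

    def __init__(self, ch: int = 4, num_steps: int = 4):
        super().__init__()
        self.num_steps = num_steps  # few-step (here: four-step) distilled sampling
        # Single convs as placeholders for the real networks:
        self.denoiser = nn.Conv2d(3 * ch, ch, 3, padding=1)  # (noisy, LR cond, cue) -> latent
        self.artg = nn.Conv2d(ch, ch, 3, padding=1)          # stand-in for motion-aligned guidance
        self.decoder = nn.Conv2d(ch, 3, 3, padding=1)        # stand-in for temporal-aware decoder

    def forward_stream(self, lr_latent_stream):
        """Yield one HR frame per incoming LR latent (B, C, H, W), causally."""
        prev_latent = None
        for z_lr in lr_latent_stream:  # frames arrive one at a time
            with torch.no_grad():      # inference only
                # Guidance cue from the previous output latent; zeros for the first frame.
                cue = self.artg(prev_latent) if prev_latent is not None else torch.zeros_like(z_lr)
                z = torch.randn_like(z_lr)          # start each frame from noise
                for _ in range(self.num_steps):     # fixed few-step denoising loop
                    z = self.denoiser(torch.cat([z, z_lr, cue], dim=1))
                prev_latent = z                     # causal state: only past frames are reused
                hr = self.decoder(z)                # HR frame emitted immediately
            yield hr


# Usage: latents arrive as a stream; each output needs no lookahead.
model = StreamDiffVSRSketch()
stream = (torch.randn(1, 4, 90, 160) for _ in range(8))
hr_frames = list(model.forward_stream(stream))
```

The point of the sketch is the control flow, not the networks: because the loop never touches a future frame, the first output is available after a single frame's worth of denoising, which is what lets a method of this shape report per-frame rather than per-clip initial latency.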