Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

December 29, 2025
Authors: Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu
cs.AI

Abstract

Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP, it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130x. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds, thereby making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/
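To make the causal, auto-regressive structure concrete, below is a minimal sketch of the streaming inference loop the abstract describes. All module names (DistilledDenoiser, ARTGModule, TemporalAwareDecoder, stream_vsr) and the toy convolutional stand-ins are assumptions for illustration, not the authors' released code; only the control flow mirrors the abstract: each frame is conditioned strictly on past outputs, denoised in four distilled steps under motion-aligned guidance, and decoded with fused temporal state.

```python
# Hypothetical sketch of Stream-DiffVSR-style causal streaming inference.
# Module internals are toy stand-ins; only the loop structure follows the abstract.
import torch
import torch.nn as nn

class DistilledDenoiser(nn.Module):
    """Stand-in for the four-step distilled latent denoiser."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.net = nn.Conv2d(latent_ch * 2, latent_ch, 3, padding=1)

    def forward(self, z, guidance):
        # One denoising step conditioned on motion-aligned guidance.
        return z - self.net(torch.cat([z, guidance], dim=1))

class ARTGModule(nn.Module):
    """Stand-in for Auto-regressive Temporal Guidance (ARTG):
    derives motion-aligned cues from the previous frame's latent."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.align = nn.Conv2d(latent_ch, latent_ch, 3, padding=1)

    def forward(self, prev_latent):
        return self.align(prev_latent)

class TemporalAwareDecoder(nn.Module):
    """Stand-in for the lightweight decoder whose Temporal Processor
    Module (TPM) fuses the previous latent for temporal coherence."""
    def __init__(self, latent_ch=4, scale=4):
        super().__init__()
        self.tpm = nn.Conv2d(latent_ch * 2, latent_ch, 3, padding=1)
        self.up = nn.Sequential(
            nn.Conv2d(latent_ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, latent, prev_latent):
        fused = self.tpm(torch.cat([latent, prev_latent], dim=1))
        return self.up(fused)

@torch.no_grad()
def stream_vsr(lr_frames, encode, steps=4):
    """Causal streaming loop: each HR frame is produced from the current
    LR frame plus state from already-emitted frames only (no look-ahead)."""
    denoiser, artg, decoder = DistilledDenoiser(), ARTGModule(), TemporalAwareDecoder()
    prev_latent = None
    for lr in lr_frames:                      # frames arrive one at a time
        z = encode(lr)                        # LR frame -> initial latent
        if prev_latent is None:
            prev_latent = torch.zeros_like(z)
        guidance = artg(prev_latent)          # motion-aligned cue from the past
        for _ in range(steps):                # four distilled denoising steps
            z = denoiser(z, guidance)
        yield decoder(z, prev_latent)         # emit the HR frame immediately
        prev_latent = z                       # auto-regressive state update

# Toy usage: 4x upscaling of 180x320 inputs to 720p-sized outputs.
frames = (torch.randn(1, 3, 180, 320) for _ in range(3))
encode = nn.Conv2d(3, 4, 3, padding=1)        # placeholder LR-to-latent encoder
for hr in stream_vsr(frames, encode):
    print(hr.shape)  # torch.Size([1, 3, 720, 1280])
```

The key design point the sketch captures is the `prev_latent = z` update: because guidance comes only from already-decoded frames, the pipeline never waits on future frames, which is what eliminates the multi-second initial delay of sliding-window diffusion VSR.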