PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

January 8, 2026
Authors: Denis Korzhenkov, Adil Karjauv, Animesh Karnewar, Mohsen Ghafoorian, Amirhossein Habibian
cs.AI

Abstract

Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach significantly reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform state-of-the-art systems in terms of visual plausibility. In this work, we present a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, without degrading the quality of the output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal models, aiming to further improve inference efficiency. Our results are available at https://qualcomm-ai-research.github.io/PyramidalWan.
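
To make the cost argument concrete, the following is a minimal sketch (not the authors' implementation) that estimates how much compute a pyramidal schedule saves compared with running every denoising step at full resolution. The function name `relative_cost`, the stage layout, and the step counts are all hypothetical, and the sketch assumes that only the spatial dimensions are downscaled and that per-step cost scales as a power of the token count.

```python
def relative_cost(steps_per_stage, downscale_per_stage, cost_exponent=1.0):
    """Estimate pyramidal sampling cost relative to full-resolution sampling.

    steps_per_stage: denoising steps in each stage, ordered coarse to fine.
    downscale_per_stage: spatial downscale factor of each stage (1 = full size).
    cost_exponent: assumed scaling of per-step cost with token count
                   (1.0 = linear, 2.0 = quadratic, attention-dominated).
    """
    if len(steps_per_stage) != len(downscale_per_stage):
        raise ValueError("need one downscale factor per stage")

    total_steps = sum(steps_per_stage)
    pyramidal = 0.0
    for steps, factor in zip(steps_per_stage, downscale_per_stage):
        # Assumption: height and width both shrink by `factor`,
        # so the token count shrinks by factor squared.
        tokens = 1.0 / (factor ** 2)
        pyramidal += steps * tokens ** cost_exponent

    # Baseline: the same number of steps, all at full resolution (cost 1 each).
    return pyramidal / total_steps


# Hypothetical schedule: high-noise steps at 1/4 resolution,
# mid-noise steps at 1/2 resolution, low-noise steps at full resolution.
linear = relative_cost([20, 20, 10], [4, 2, 1])
quadratic = relative_cost([20, 20, 10], [4, 2, 1], cost_exponent=2.0)
print(f"linear cost model: {linear:.3f} of baseline")
print(f"quadratic cost model: {quadratic:.3f} of baseline")
```

Under these illustrative numbers, most of the remaining cost comes from the few full-resolution steps, which suggests why reducing step counts through distillation could compound the savings. The actual stage configuration and measured speedups should be taken from the paper, not from this sketch.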