FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
February 7, 2025
Authors: Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo
cs.AI
Abstract
Diffusion transformer (DiT) models have achieved great success in text-to-video
generation, leveraging their scalability in model capacity and data scale.
However, high content and motion fidelity aligned with text prompts often
requires large models and a substantial number of function evaluations (NFEs).
Realistic and visually appealing details are typically reflected in
high-resolution outputs, which further amplifies computational demands,
especially for single-stage DiT models. To address these challenges, we propose
FlashVideo, a novel two-stage framework that strategically allocates model
capacity and NFEs across stages to balance generation fidelity and quality. In
the first stage, prompt fidelity is prioritized through a low-resolution
generation process that uses a large model and sufficient NFEs; generating at
low resolution keeps this stage computationally efficient. The second stage
establishes flow matching between low and high resolutions, effectively
generating fine details with minimal NFEs.
Quantitative and visual results demonstrate that FlashVideo achieves
state-of-the-art high-resolution video generation with superior computational
efficiency. Additionally, the two-stage design lets users preview the initial
output before committing to full-resolution generation, thereby significantly
reducing computational costs and wait times and enhancing commercial viability.
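The second-stage idea in the abstract, flow matching that carries a low-resolution result to a high-resolution one with few NFEs, can be illustrated with a minimal toy sketch. This is not the authors' implementation: the 1-D "latent" lists, nearest-neighbour upsampling as the starting point of the flow, the straight-line path, plain Euler integration, and the oracle velocity that stands in for a trained network are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
from typing import Callable, List

Vector = List[float]  # hypothetical toy stand-in for a flattened video latent


def upsample_nearest(low: Vector, factor: int) -> Vector:
    """Nearest-neighbour upsampling (an assumption): repeat each value `factor` times."""
    return [v for v in low for _ in range(factor)]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from source x0 to target x1: (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight path, i.e. what a model would be trained to predict."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(pred: Vector, x0: Vector, x1: Vector) -> float:
    """Mean squared error between a predicted velocity and the target x1 - x0."""
    target = target_velocity(x0, x1)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def euler_sample(x0: Vector, velocity: Callable[[Vector, float], Vector],
                 num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps.

    Each step calls `velocity` exactly once, so `num_steps` equals the number
    of function evaluations (NFEs) spent in this stage.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    low_res = [1.0, 3.0]               # hypothetical stage-1 (low-resolution) output
    x0 = upsample_nearest(low_res, 2)  # start of the flow: [1.0, 1.0, 3.0, 3.0]
    x1 = [1.0, 1.5, 2.5, 3.0]          # hypothetical high-resolution target

    def oracle(x: Vector, t: float) -> Vector:
        # Stand-in for a trained network that has fit this single pair perfectly.
        return target_velocity(x0, x1)

    print(euler_sample(x0, oracle, num_steps=4))  # → [1.0, 1.5, 2.5, 3.0]
```

On a perfectly straight path the ideal velocity is constant, so Euler integration reaches the target in very few steps; a trained model only approximates this, which gives some intuition for why a low-to-high-resolution flow can be run with a small NFE budget.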