**FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline**

November 22, 2023
Authors: Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov
cs.AI

Abstract

Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models have achieved high-quality results over the last few years, whereas video synthesis methods have only recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on a text-to-image diffusion model. The first stage synthesizes keyframes to outline the storyline of a video, while the second stage generates interpolation frames to make the movements of the scene and objects smooth. We compare several temporal conditioning approaches for keyframe generation. The results show that separate temporal blocks outperform temporal layers in terms of metrics reflecting video generation quality and human preference. The design of our interpolation model significantly reduces computational costs compared to other masked frame interpolation approaches. Furthermore, we evaluate different configurations of a MoVQ-based video decoding scheme to improve consistency and achieve better PSNR, SSIM, MSE, and LPIPS scores. Finally, we compare our pipeline with existing solutions and achieve top-2 scores overall and top-1 among open-source solutions: CLIPSIM = 0.2976 and FVD = 433.054. Project page: https://ai-forever.github.io/kandinsky-video/
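
For readers unfamiliar with the reconstruction metrics named in the abstract, the following is a minimal sketch of how MSE and PSNR are commonly computed between original and decoded frames. It is not the authors' evaluation code: the function names `mse`, `psnr`, and `video_psnr` are hypothetical, and frames are assumed to be flat lists of pixel intensities in the range 0 to 255.

```python
import math


def mse(frame_a, frame_b):
    """Mean squared error between two equally sized frames (flat pixel lists)."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)


def psnr(frame_a, frame_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(frame_a, frame_b)
    if err == 0:
        # Identical frames: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def video_psnr(original, decoded, max_value=255.0):
    """Average per-frame PSNR over two videos with the same number of frames."""
    if len(original) != len(decoded) or not original:
        raise ValueError("videos must be non-empty and have the same frame count")
    scores = [psnr(o, d, max_value) for o, d in zip(original, decoded)]
    return sum(scores) / len(scores)
```

Lower MSE and higher PSNR both indicate that decoded frames are closer to the originals. SSIM and LPIPS need structural or learned-feature comparisons and are not covered by this sketch.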