FIFO-Diffusion: Generating Infinite Videos from Text without Training

May 19, 2024
Authors: Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han
cs.AI

Abstract

We propose a novel inference technique for text-conditional video generation based on a pretrained diffusion model. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without training. It achieves this by iteratively performing diagonal denoising, which concurrently processes a queue of consecutive frames with increasing noise levels: at each iteration, our method dequeues a fully denoised frame at the head and enqueues a new random-noise frame at the tail. However, diagonal denoising is a double-edged sword. Frames near the tail can take advantage of cleaner frames through forward reference, but this strategy introduces a discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. We demonstrate promising results and the effectiveness of the proposed methods on existing text-to-video generation baselines.
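
The queue behaviour described in the abstract can be illustrated with a small scheduling sketch. This is only a toy under stated assumptions, not the authors' implementation: the function name `fifo_schedule`, the integer noise levels, and the way the queue is initialized are all hypothetical, the call to the pretrained diffusion model is replaced by a simple decrement of each frame's noise level, and latent partitioning and lookahead denoising are not modeled.

```python
from collections import deque


def fifo_schedule(num_outputs, queue_len=4):
    """Toy model of the diagonal denoising queue (hypothetical helper).

    Each queue entry is [frame_id, noise_level]. The head (left) is the
    cleanest frame and the tail (right) is the noisiest.
    """
    # Assumption: the queue starts with noise levels 1..queue_len.
    # The abstract does not say how the queue is initialized.
    queue = deque([i, i + 1] for i in range(queue_len))
    next_id = queue_len
    emitted = []  # frame ids in the order they leave the queue
    trace = []    # noise levels seen at the start of each iteration

    while len(emitted) < num_outputs:
        trace.append([level for _, level in queue])

        # One diagonal denoising step: every frame in the queue moves
        # one noise level down. A real system would call the pretrained
        # video diffusion model here on all frames jointly.
        for entry in queue:
            entry[1] -= 1

        # The head is now fully denoised, so dequeue it.
        frame_id, level = queue.popleft()
        assert level == 0
        emitted.append(frame_id)

        # Enqueue a fresh pure-noise frame at the tail.
        queue.append([next_id, queue_len])
        next_id += 1

    return emitted, trace


if __name__ == "__main__":
    emitted, trace = fifo_schedule(6, queue_len=4)
    print(emitted)
    print(trace[0])
```

In this sketch the noise levels in the queue are the same at the start of every iteration, which is why the loop can, in principle, run for any number of output frames with a fixed-size queue.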
