Mobius: Text to Seamless Looping Video Generation via Latent Shift
February 27, 2025
Authors: Xiuli Bi, Jianfei Yuan, Bo Liu, Yong Zhang, Xiaodong Cun, Chi-Man Pun, Bin Xiao
cs.AI
Abstract
We present Mobius, a novel method that generates seamlessly looping videos
directly from text descriptions without any user annotations, thereby creating
new visual materials for multimedia presentations. Our method repurposes a
pre-trained video latent diffusion model to generate looping videos from text
prompts without any training. During inference, we first construct a latent
cycle by connecting the starting and ending noise of the video. Since the
context of the video diffusion model maintains temporal consistency, we perform
multi-frame latent denoising by gradually shifting the first-frame latent to
the end at each step. As a result, the denoising context changes at every step
while remaining consistent throughout the inference process. Moreover, the
latent cycle in our method can be of any length, which extends our
latent-shifting approach to generate seamless looping videos longer than the
video diffusion model's context window. Unlike previous cinemagraphs, the
proposed method does not require an image as an appearance reference, which
would restrict the motion of the generated results. Instead, our method
produces more dynamic motion and better visual quality. We conduct multiple
experiments and comparisons to verify the effectiveness of the proposed method,
demonstrating its efficacy in different scenarios. All the code will be made
available.
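The latent-cycle-with-shift idea described above can be illustrated with a toy simulation. The sketch below is a hypothetical simplification, not the authors' implementation: frame latents live in a cyclic buffer, each denoising step processes context-sized windows that wrap around the cycle, and the buffer is then rotated by one frame so the first-frame latent moves to the end. The `denoise_window` function is a stand-in for a real video diffusion denoiser; `num_frames`, `context`, and `steps` are illustrative parameters.

```python
import numpy as np

def latent_shift_denoise(num_frames=12, context=4, steps=8, dim=3, seed=0):
    """Toy sketch of latent-shift denoising over a cyclic latent buffer.

    The cycle can be longer than the diffusion context: each step denoises
    context-sized windows that wrap around the loop, then rotates the
    buffer by one frame so the denoising context changes every step.
    """
    rng = np.random.default_rng(seed)
    latents = rng.standard_normal((num_frames, dim))  # noise for the whole cycle

    def denoise_window(window, step):
        # Stand-in for one diffusion denoising step on `context` frames:
        # shrink the latents toward zero a little more at each step.
        return window * (1.0 - 1.0 / (steps - step + 1))

    for step in range(steps):
        # Denoise every context-sized window; indices wrap around the cycle,
        # so the seam between the last and first frame is denoised jointly.
        for start in range(0, num_frames, context):
            idx = [(start + i) % num_frames for i in range(context)]
            latents[idx] = denoise_window(latents[idx], step)
        # Shift the first-frame latent to the end of the cycle.
        latents = np.roll(latents, -1, axis=0)
    return latents
```

Because the window indices are taken modulo `num_frames`, the boundary between the final and first frames is treated like any other frame pair, which is what makes the resulting loop seamless in spirit; the per-step rotation ensures no fixed seam ever sits at a window boundary across all steps.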