GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
November 21, 2023
Authors: Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
cs.AI
Abstract
Recent advances in text-to-video generation have harnessed the power of
diffusion models to create visually compelling content conditioned on text
prompts. However, they usually encounter high computational costs and often
struggle to produce videos with coherent physical motions. To tackle these
issues, we propose GPT4Motion, a training-free framework that leverages the
planning capability of large language models such as GPT, the physical
simulation strength of Blender, and the excellent image generation ability of
text-to-image diffusion models to enhance the quality of video synthesis.
Specifically, GPT4Motion employs GPT-4 to generate a Blender script based on a
user textual prompt, which commands Blender's built-in physics engine to craft
fundamental scene components that encapsulate coherent physical motions across
frames. These components are then fed into Stable Diffusion to generate a
video aligned with the textual prompt. Experimental results on three basic
physical motion scenarios, including rigid object drop and collision, cloth
draping and swinging, and liquid flow, demonstrate that GPT4Motion can
efficiently generate high-quality videos while maintaining motion coherency and
entity consistency. GPT4Motion offers new insights into text-to-video research,
enhancing its quality and broadening its horizon for future explorations.
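
The three-stage data flow described in the abstract (plan with a language model, simulate in Blender, render with a diffusion model) can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: the `Pipeline` class and the `toy_*` functions are hypothetical stand-ins that pass strings around, and no GPT-4, Blender, or Stable Diffusion API is actually called.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical skeleton of a plan -> simulate -> render flow."""
    plan: Callable[[str], str]                     # prompt -> script text (stand-in for GPT-4)
    simulate: Callable[[str], List[str]]           # script -> per-frame scene components (stand-in for Blender)
    render: Callable[[str, List[str]], List[str]]  # prompt + components -> frames (stand-in for Stable Diffusion)

    def __call__(self, prompt: str) -> List[str]:
        script = self.plan(prompt)
        components = self.simulate(script)
        # One output frame per simulated component, so motion comes from the simulation.
        return self.render(prompt, components)


# Toy stand-ins (assumptions for illustration only).
def toy_plan(prompt: str) -> str:
    return f"# Blender script for: {prompt}"


def toy_simulate(script: str, num_frames: int = 4) -> List[str]:
    return [f"component_{i}" for i in range(num_frames)]


def toy_render(prompt: str, components: List[str]) -> List[str]:
    return [f"frame({c})" for c in components]


pipeline = Pipeline(plan=toy_plan, simulate=toy_simulate, render=toy_render)
frames = pipeline("a basketball falls and bounces on the floor")
print(len(frames))
```

Keeping the stages as separate callables reflects the training-free idea in the abstract: each stage is an existing tool, and only the intermediate artifacts (script, scene components) connect them.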