GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
November 21, 2023
Authors: Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
cs.AI
Abstract
Recent advances in text-to-video generation have harnessed the power of
diffusion models to create visually compelling content conditioned on text
prompts. However, they usually encounter high computational costs and often
struggle to produce videos with coherent physical motions. To tackle these
issues, we propose GPT4Motion, a training-free framework that leverages the
planning capability of large language models such as GPT, the physical
simulation strength of Blender, and the excellent image generation ability of
text-to-image diffusion models to enhance the quality of video synthesis.
Specifically, GPT4Motion employs GPT-4 to generate a Blender script based on a
user textual prompt, which commands Blender's built-in physics engine to craft
fundamental scene components that encapsulate coherent physical motions across
frames. These components are then fed into Stable Diffusion to generate a
video aligned with the textual prompt. Experimental results on three basic
physical motion scenarios, including rigid object drop and collision, cloth
draping and swinging, and liquid flow, demonstrate that GPT4Motion can generate
high-quality videos efficiently while maintaining motion coherency and entity
consistency. GPT4Motion offers new insights into text-to-video research,
enhancing its quality and broadening its horizon for future explorations.
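
To make the "Blender script" step concrete, here is a minimal sketch of the kind of script GPT-4 might be asked to produce for the rigid object drop scenario. It is an illustration only, not the authors' prompt or output: the helper name make_ball_drop_script, the scene layout, and all numeric values are assumptions. The generated text uses standard Blender Python (bpy) operators and is meant to be run inside Blender; the helper itself is plain Python and only builds the script text.

```python
from textwrap import dedent


def make_ball_drop_script(radius: float = 0.5, height: float = 5.0, frames: int = 80) -> str:
    """Return the text of a Blender Python script for a ball falling onto a floor.

    Hypothetical helper: the scene layout and default values are illustrative
    assumptions, not taken from the paper.
    """
    return dedent(f"""\
        import bpy

        # Start from an empty scene.
        bpy.ops.object.select_all(action='SELECT')
        bpy.ops.object.delete()

        # Floor: a passive rigid body that other objects collide with.
        bpy.ops.mesh.primitive_plane_add(size=20, location=(0, 0, 0))
        bpy.ops.rigidbody.object_add(type='PASSIVE')

        # Ball: an active rigid body released from a height.
        bpy.ops.mesh.primitive_uv_sphere_add(radius={radius}, location=(0, 0, {height}))
        bpy.ops.rigidbody.object_add(type='ACTIVE')

        # Length of the simulated clip, in frames.
        bpy.context.scene.frame_end = {frames}
        """)


if __name__ == "__main__":
    # Print the script so it can be saved and run inside Blender.
    print(make_ball_drop_script(radius=0.25, height=3.0, frames=60))
```

In the pipeline the abstract describes, a script of this kind would be written by GPT-4 from the user's prompt and executed by Blender, whose physics engine yields the per-frame scene components that are then passed to Stable Diffusion. The abstract does not specify what form those components take, so that step is left out of the sketch.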