Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions
December 5, 2023
Authors: Xu Shi, Chuanchen Luo, Junran Peng, Hongwen Zhang, Yunlian Sun
cs.AI
Abstract
Recently, significant progress has been made in text-based motion generation,
enabling the generation of diverse and high-quality human motions that conform
to textual descriptions. However, it remains challenging to generate
fine-grained or stylized motions due to the lack of datasets annotated with
detailed textual descriptions. By adopting a divide-and-conquer strategy, we
propose a new framework named Fine-Grained Human Motion Diffusion Model
(FG-MDM) for human motion generation. Specifically, we first parse the
previously vague textual annotations into fine-grained descriptions of
different body parts by leveraging a large language model (GPT-3.5). We then
use these fine-grained
descriptions to guide a transformer-based diffusion model. FG-MDM can generate
fine-grained and stylized motions even outside of the distribution of the
training data. Our experimental results demonstrate the superiority of FG-MDM
over previous methods, especially its strong generalization capability. We will
release our fine-grained textual annotations for HumanML3D and KIT.
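The annotation-refinement step described above can be sketched as a prompt-construction routine. This is a minimal illustrative sketch only: the body-part list, prompt wording, and function name are assumptions, not the paper's actual prompt or code.

```python
# Hypothetical sketch of the divide-and-conquer annotation step: asking an
# LLM (e.g. GPT-3.5) to expand a vague motion caption into fine-grained,
# per-body-part descriptions. The part list and wording are illustrative
# assumptions, not taken from the paper.

BODY_PARTS = ["head", "torso", "left arm", "right arm", "left leg", "right leg"]

def build_refinement_prompt(caption: str) -> str:
    """Compose a prompt asking an LLM to describe each body part's motion."""
    parts = ", ".join(BODY_PARTS)
    return (
        "Rewrite the following motion description as fine-grained "
        f"descriptions of these body parts: {parts}.\n"
        f"Motion: {caption}\n"
        "Answer with one sentence per body part."
    )

prompt = build_refinement_prompt("a person walks forward slowly")
print(prompt)
```

The resulting per-part text would then condition the transformer-based diffusion model in place of the original vague caption.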