Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions
December 5, 2023
Authors: Xu Shi, Chuanchen Luo, Junran Peng, Hongwen Zhang, Yunlian Sun
cs.AI
Abstract
Recently, significant progress has been made in text-based motion generation,
enabling the generation of diverse and high-quality human motions that conform
to textual descriptions. However, it remains challenging to generate
fine-grained or stylized motions due to the lack of datasets annotated with
detailed textual descriptions. By adopting a divide-and-conquer strategy, we
propose a new framework named Fine-Grained Human Motion Diffusion Model
(FG-MDM) for human motion generation. Specifically, we first parse the
previously vague textual annotations into fine-grained descriptions of
different body parts by leveraging a large language model (GPT-3.5). We then
use these fine-grained
descriptions to guide a transformer-based diffusion model. FG-MDM can generate
fine-grained and stylized motions even outside of the distribution of the
training data. Our experimental results demonstrate the superiority of FG-MDM
over previous methods, especially its strong generalization capability. We will
release our fine-grained textual annotations for HumanML3D and KIT.
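The annotation-refinement step described above can be sketched as a prompt-construction routine. This is a minimal illustrative sketch only: the body-part list, prompt wording, and function name are assumptions, not the paper's actual prompt or code.

```python
# Hypothetical sketch of the divide-and-conquer annotation step: asking an
# LLM (e.g. GPT-3.5) to expand a vague motion caption into fine-grained,
# per-body-part descriptions. The part list and wording are illustrative
# assumptions, not taken from the paper.

BODY_PARTS = ["head", "torso", "left arm", "right arm", "left leg", "right leg"]

def build_refinement_prompt(caption: str) -> str:
    """Compose a prompt asking an LLM to describe each body part's motion."""
    parts = ", ".join(BODY_PARTS)
    return (
        "Rewrite the following motion description as fine-grained "
        f"descriptions of these body parts: {parts}.\n"
        f"Motion: {caption}\n"
        "Answer with one sentence per body part."
    )

prompt = build_refinement_prompt("a person walks forward slowly")
print(prompt)
```

The resulting per-part text would then condition the transformer-based diffusion model in place of the original vague caption.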