

From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"

December 11, 2023
作者: Takahide Yoshida, Atsushi Masumori, Takashi Ikegami
cs.AI

Abstract

We report the development of Alter3, a humanoid robot capable of generating spontaneous motion using a Large Language Model (LLM), specifically GPT-4. This achievement was realized by integrating GPT-4 into our proprietary android, Alter3, thereby effectively grounding the LLM with Alter's bodily movement. Typically, low-level robot control is hardware-dependent and falls outside the scope of LLM corpora, presenting challenges for direct LLM-based robot control. However, in the case of humanoid robots like Alter3, direct control is feasible by mapping the linguistic expressions of human actions onto the robot's body through program code. Remarkably, this approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost,' and generate sequences of actions over time without explicit programming for each body part. This demonstrates the robot's zero-shot learning capabilities. Additionally, verbal feedback can adjust poses, obviating the need for fine-tuning. A video of Alter3's generated motions is available at https://tnoinkwms.github.io/ALTER-LLM/
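The abstract describes two mechanisms: mapping verbal descriptions of human actions onto the robot's body without per-joint programming, and refining the resulting pose through verbal feedback instead of fine-tuning. Below is a minimal illustrative sketch of that loop, assuming a hypothetical `set_axis(<index>, <value>)` command format, a placeholder 43-axis layout, and the OpenAI chat API; the paper's actual prompts and Alter3's real control interface are not reproduced here.

```python
# A minimal sketch of the text-to-motion loop, under assumed names:
# "set_axis" and the axis layout are placeholders, not Alter3's API.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You control a humanoid robot with 43 motor axes (0-42), each taking "
    "a value in [0.0, 1.0]. Given a described action, respond ONLY with "
    "lines of the form: set_axis(<index>, <value>)"
)

def parse_commands(text: str) -> list[tuple[int, float]]:
    """Extract (axis, value) pairs from the LLM's reply."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("set_axis(") and line.endswith(")"):
            idx, val = line[len("set_axis("):-1].split(",")
            commands.append((int(idx), float(val)))
    return commands

def action_to_motion(action: str, feedback: str | None = None):
    """Map a verbal action onto joint commands; optionally refine once
    with verbal feedback rather than any parameter fine-tuning."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Action: {action}"},
    ]
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    if feedback:  # e.g. "raise the right arm higher"
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content": f"Adjust the pose: {feedback}"},
        ]
        reply = client.chat.completions.create(model="gpt-4",
                                               messages=messages)
        answer = reply.choices[0].message.content
    return parse_commands(answer)

# Example: action_to_motion("take a selfie", feedback="tilt the head more")
```

Keeping the conversation history is what lets a correction like "raise your arm a bit more" update the pose in place, which is the role the abstract assigns to verbal feedback.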