From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
December 11, 2023
Authors: Takahide Yoshida, Atsushi Masumori, Takashi Ikegami
cs.AI
Abstract
We report the development of Alter3, a humanoid robot capable of generating
spontaneous motion using a Large Language Model (LLM), specifically GPT-4. This
achievement was realized by integrating GPT-4 into our proprietary android,
Alter3, thereby effectively grounding the LLM with Alter's bodily movement.
Typically, low-level robot control is hardware-dependent and falls outside the
scope of LLM corpora, presenting challenges for direct LLM-based robot control.
However, in the case of humanoid robots like Alter3, direct control is feasible
by mapping the linguistic expressions of human actions onto the robot's body
through program code. Remarkably, this approach enables Alter3 to adopt various
poses, such as a 'selfie' stance or 'pretending to be a ghost,' and generate
sequences of actions over time without explicit programming for each body part.
This demonstrates the robot's zero-shot learning capabilities. Additionally,
verbal feedback can adjust poses, obviating the need for fine-tuning. A video
of Alter3's generated motions is available at
https://tnoinkwms.github.io/ALTER-LLM/
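
To make the described approach concrete, the sketch below illustrates one way a verbal action description could be mapped onto a humanoid robot's joints through GPT-4-generated program code, with verbal feedback fed back as a new instruction. This is a minimal, hypothetical illustration, not the authors' implementation: the set_axis() helper, the 43-axis count, the prompt wording, and the use of the OpenAI chat API are assumptions made for the example; only the overall idea (LLM output grounded as motion code, adjustable by verbal feedback without fine-tuning) comes from the abstract.

```python
# Hypothetical sketch: grounding a verbal action description in robot motion
# via GPT-4-generated program code. set_axis(), the 43-axis assumption, and
# the prompt text are illustrative placeholders, not the paper's actual API.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NUM_AXES = 43  # assumed number of controllable axes on the android

SYSTEM_PROMPT = (
    f"You control a humanoid robot with axes 1..{NUM_AXES}, each accepting a "
    "value from 0 to 255. Respond only with Python calls of the form "
    "set_axis(axis_number, value, duration_seconds) that realize the "
    "requested action as a sequence of poses over time."
)

def set_axis(axis: int, value: int, duration: float) -> None:
    """Placeholder for the low-level command a real system would send to the robot."""
    print(f"axis {axis} -> {value} over {duration}s")

def generate_motion(instruction: str) -> str:
    """Ask GPT-4 to translate a verbal action (e.g. 'take a selfie') into motion code."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Zero-shot: no per-joint programming for this specific action.
    code = generate_motion("Pretend to be a ghost.")
    exec(code)  # runs the generated set_axis(...) calls; sandbox this in practice

    # Verbal feedback: refine the pose without any fine-tuning by issuing a correction.
    refined = generate_motion("Raise both arms a little higher while pretending to be a ghost.")
    exec(refined)
```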