From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"

Abstract

We report the development of Alter3, a humanoid robot that generates spontaneous motion using a large language model (LLM), specifically GPT-4. We achieved this by integrating GPT-4 into our proprietary android, Alter3, which grounds the LLM in Alter3's bodily movement. Low-level robot control is typically hardware-dependent and falls outside the scope of LLM corpora, which makes direct LLM-based robot control difficult. For a humanoid robot such as Alter3, however, direct control is feasible: linguistic expressions of human actions can be mapped onto the robot's body through program code. This approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost', and to generate sequences of actions over time without explicit programming for each body part, which demonstrates the robot's zero-shot learning capabilities. In addition, poses can be adjusted through verbal feedback, obviating the need for fine-tuning. A video of Alter3's generated motions is available at https://tnoinkwms.github.io/ALTER-LLM/.
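The mapping described in the abstract, from a linguistic expression of a human action to commands for the robot's body, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the joint names, the normalized value range, the phrase table standing in for the LLM, and the `describe_to_commands` helper are hypothetical and are not taken from Alter3 or its software.

```python
from dataclasses import dataclass

# Hypothetical normalized actuator range; not Alter3's real axis values.
JOINT_RANGE = (0.0, 1.0)

# Stand-in for the LLM step: a tiny table mapping action phrases to
# hypothetical joint targets. A real system would obtain this mapping
# from the language model rather than a fixed dictionary.
PHRASE_TABLE = {
    "raise right arm": {"right_shoulder": 0.9, "right_elbow": 0.4},
    "tilt head": {"neck_roll": 0.7},
    "smile": {"mouth_corners": 0.8},
}


@dataclass(frozen=True)
class JointCommand:
    joint: str    # hypothetical joint name
    value: float  # target position within JOINT_RANGE


def clamp(value: float, low: float, high: float) -> float:
    """Keep a target inside the allowed actuator range."""
    return max(low, min(high, value))


def describe_to_commands(description: str) -> list[JointCommand]:
    """Turn a comma-separated action description into joint commands.

    Phrases that are not in PHRASE_TABLE are ignored.
    """
    commands = []
    for phrase in description.lower().split(","):
        targets = PHRASE_TABLE.get(phrase.strip(), {})
        for joint, value in targets.items():
            commands.append(JointCommand(joint, clamp(value, *JOINT_RANGE)))
    return commands


if __name__ == "__main__":
    for command in describe_to_commands("Raise right arm, smile"):
        print(command.joint, command.value)
```

In this sketch the body is never programmed joint by joint for a given pose; a pose is described in words and the table resolves it to targets. Adjusting a pose through verbal feedback would correspond to revising the description and regenerating the commands.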
