Generative Expressive Robot Behaviors using Large Language Models

Abstract

People employ expressive behaviors to communicate effectively and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to demonstrate expressive behaviors in human-robot interaction as well. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation in which the robot is used. We propose to leverage the rich social context available from large language models (LLMs), together with their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, with behaviors building upon one another. Our approach uses few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users find competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.
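
As a rough illustration of the pipeline described in the abstract, the sketch below shows one way a few-shot chain-of-thought prompt could be assembled and the model's reply mapped onto parametrized skill calls. It is not the authors' implementation: the skill names (`nod`, `say`), the prompt layout, the `Code:` line convention, and the stubbed `fake_llm` are all assumptions made for this example, and a real system would call an actual LLM and real robot controllers.

```python
import ast
from typing import Callable, Dict, List, Tuple

# Hypothetical few-shot example: instruction, step-by-step reasoning, skill call.
FEW_SHOT_EXAMPLES = [
    {
        "instruction": "Acknowledge a person who glances at you.",
        "reasoning": "People nod to acknowledge others, so the robot tilts its head once.",
        "code": "nod(times=1, speed=0.5)",
    },
]


def build_prompt(instruction: str, skill_names: List[str]) -> str:
    """Assemble a few-shot chain-of-thought prompt (layout is an assumption)."""
    lines = ["Available skills: " + ", ".join(skill_names), ""]
    for ex in FEW_SHOT_EXAMPLES:
        lines += [
            f"Instruction: {ex['instruction']}",
            f"Reasoning: {ex['reasoning']}",
            f"Code: {ex['code']}",
            "",
        ]
    lines += [f"Instruction: {instruction}", "Reasoning:"]
    return "\n".join(lines)


def parse_skill_calls(response: str) -> List[Tuple[str, dict]]:
    """Extract `Code: skill(key=value, ...)` lines without executing arbitrary code."""
    calls = []
    for line in response.splitlines():
        if not line.startswith("Code:"):
            continue  # reasoning text is ignored
        expr = ast.parse(line[len("Code:"):].strip(), mode="eval").body
        if not (isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name)):
            raise ValueError(f"not a plain skill call: {line!r}")
        if expr.args:
            raise ValueError("only keyword arguments are supported in this sketch")
        # literal_eval accepts only literals, so parameters cannot run code.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in expr.keywords}
        calls.append((expr.func.id, kwargs))
    return calls


def generate_behavior(
    instruction: str,
    llm: Callable[[str], str],
    skills: Dict[str, Callable[..., str]],
) -> List[str]:
    """Prompt the model, parse its reply, and dispatch to known skills only."""
    prompt = build_prompt(instruction, sorted(skills))
    results = []
    for name, kwargs in parse_skill_calls(llm(prompt)):
        if name not in skills:
            raise ValueError(f"unknown skill: {name}")
        results.append(skills[name](**kwargs))
    return results


# Hypothetical skills that only describe the action they would perform.
def nod(times: int, speed: float) -> str:
    return f"nod x{times} at speed {speed}"


def say(text: str) -> str:
    return f"say: {text}"


SKILLS = {"nod": nod, "say": say}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned reply."""
    return (
        " A person would signal politely before passing.\n"
        "Code: nod(times=2, speed=0.5)\n"
        "Code: say(text='Excuse me')"
    )


if __name__ == "__main__":
    print(generate_behavior("Pass people in a busy corridor.", fake_llm, SKILLS))
```

Restricting the reply to keyword-only calls on a fixed skill registry keeps the generated code parametrized and checkable; this is a design choice of the sketch rather than something the abstract specifies.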
