Language to Rewards for Robotic Skill Synthesis

Abstract

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. Reward functions, on the other hand, have been shown to be flexible representations that can be optimized to obtain control policies for diverse tasks, and their semantic richness makes them well suited to being specified by LLMs. In this work, we introduce a new paradigm that builds on this observation: LLMs define reward parameters, which are then optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, enables an interactive behavior creation experience in which users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
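The idea of using reward parameters as the interface between language and motion can be illustrated with a small sketch. This is not the paper's implementation: the `reward_params` dictionary is a hypothetical stand-in for what an LLM might emit for an instruction such as "move to position 1.0 and hold still", the robot is replaced by a toy 1-D point mass, and a simple random-shooting planner stands in for MuJoCo MPC. All function and parameter names are assumptions made for illustration.

```python
import random

# Hypothetical reward parameters, standing in for LLM output (no LLM is called here).
reward_params = {"target_position": 1.0, "position_weight": 1.0, "velocity_weight": 0.1}


def step(state, action, dt=0.1):
    """Toy 1-D point mass: state is (position, velocity), action is an acceleration."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)


def reward(state, params):
    """Negative weighted cost: distance to the target plus a penalty on speed."""
    pos, vel = state
    position_cost = params["position_weight"] * (pos - params["target_position"]) ** 2
    velocity_cost = params["velocity_weight"] * vel ** 2
    return -(position_cost + velocity_cost)


def plan(state, params, rng, horizon=10, samples=200):
    """Random-shooting planner: sample action sequences, keep the first action of the best."""
    best_return, best_action = float("-inf"), 0.0
    for _ in range(samples):
        actions = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = step(s, a)
            total += reward(s, params)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action


def run(params, steps=60, seed=0):
    """Receding-horizon loop: replan at every step and apply only the first action."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    for _ in range(steps):
        state = step(state, plan(state, params, rng))
    return state


final_position, final_velocity = run(reward_params)
print(f"final position: {final_position:.2f}, final velocity: {final_velocity:.2f}")
```

In this sketch, changing the behavior means editing only the values in `reward_params` (for example, a different `target_position`), while the planner stays untouched. That mirrors the division of labor described in the abstract, where language shapes the reward and the optimizer produces the low-level actions.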
