SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

Abstract

Humanoid robots hold significant potential for accomplishing daily tasks across diverse environments thanks to their flexibility and human-like morphology. Recent work has made significant progress in humanoid whole-body control and loco-manipulation by leveraging optimal control or reinforcement learning. However, these methods require tedious tuning for each task to achieve satisfactory behaviors, which limits their versatility and scalability to the diverse tasks of daily scenarios. To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation. SkillBlender first pretrains goal-conditioned, task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks with minimal task-specific reward engineering. We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks, accompanied by a set of scientific evaluation metrics that balance accuracy and feasibility. Extensive simulated experiments show that our method significantly outperforms all baselines while naturally regularizing behaviors to avoid reward hacking, resulting in more accurate and feasible movements for diverse loco-manipulation tasks in daily scenarios. Our code and benchmark will be open-sourced to the community to facilitate future research. Project page: https://usc-gvl.github.io/SkillBlender-web/
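The abstract does not spell out how skills are blended. As a rough illustration only, the sketch below assumes one plausible reading: a high-level controller emits a sub-goal and a blending logit for each frozen primitive skill, and the final action is the softmax-weighted sum of the skills' proposed actions. The names `Skill`, `blend_actions`, `skill_a`, and `skill_b`, and the two-joint toy setup, are hypothetical and are not taken from the SkillBlender code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a pretrained, goal-conditioned skill maps
# (observation, sub-goal) to a list of joint targets.
Skill = Callable[[Sequence[float], Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_actions(
    skills: Sequence[Skill],
    obs: Sequence[float],
    sub_goals: Sequence[Sequence[float]],
    logits: Sequence[float],
) -> List[float]:
    """Return the softmax-weighted sum of each skill's proposed action.

    Assumption: in a full system, sub_goals and logits would be produced
    by a learned high-level policy; here they are passed in directly.
    """
    if not (len(skills) == len(sub_goals) == len(logits)):
        raise ValueError("need one sub-goal and one logit per skill")
    weights = softmax(logits)
    actions = [skill(obs, goal) for skill, goal in zip(skills, sub_goals)]
    n_joints = len(actions[0])
    return [
        sum(w * action[j] for w, action in zip(weights, actions))
        for j in range(n_joints)
    ]


# Toy stand-ins for pretrained skills (illustrative only).
def skill_a(obs: Sequence[float], goal: Sequence[float]) -> List[float]:
    return [goal[0], 0.0]  # drives only the first joint


def skill_b(obs: Sequence[float], goal: Sequence[float]) -> List[float]:
    return [0.0, goal[0]]  # drives only the second joint


blended = blend_actions(
    [skill_a, skill_b],
    obs=[0.0],
    sub_goals=[[1.0], [2.0]],
    logits=[0.0, 0.0],  # equal logits give equal weights
)
print(blended)
```

With equal logits, each skill contributes half of its proposed action. In this reading, the task-specific reward only has to shape the high-level choice of sub-goals and weights, which is one way the "minimal task-specific reward engineering" described above could come about.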