Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

Abstract

Many real-world manipulation tasks consist of a series of subtasks that differ significantly from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, whose adaptability and versatility let them transition seamlessly between different functional modes without re-grasping or using external tools. However, the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks pose significant challenges. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies to achieve long-horizon task goals. The core of the system is a transition feasibility function that progressively fine-tunes the sub-policies to raise the chaining success rate, while also enabling autonomous policy switching to recover from failures and to bypass redundant stages. Despite being trained only in simulation with a few task objects, our system generalizes to novel object shapes and transfers zero-shot to a real-world robot equipped with a dexterous hand. The project page, https://sequential-dexterity.github.io, provides more details and video results.
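The policy-switching behavior described above can be illustrated with a minimal sketch. It assumes, purely for illustration and not as the authors' implementation, that each sub-policy has a learned feasibility function mapping the current observation to a score in [0, 1], and that a fixed threshold decides whether a stage is feasible; the names `select_stage`, `feasibility_fns`, `threshold`, and `toy_fns` are hypothetical. Stages are scanned from last to first, so a later stage that is already feasible is chosen directly (bypassing redundant stages), and an earlier stage is chosen when later ones are not (recovering from a failure).

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a feasibility function maps an observation to an
# estimated probability that its sub-policy succeeds from that state.
FeasibilityFn = Callable[[Sequence[float]], float]


def select_stage(
    obs: Sequence[float],
    feasibility_fns: Sequence[FeasibilityFn],
    threshold: float = 0.5,
) -> int:
    """Return the index of the sub-policy to execute from `obs`."""
    # Scan from the final stage backward: the latest stage judged feasible
    # wins, which skips stages whose goals are already satisfied.
    for stage in reversed(range(len(feasibility_fns))):
        if feasibility_fns[stage](obs) >= threshold:
            return stage
    # Nothing looks feasible: fall back to the first sub-policy.
    return 0


# Toy example with three stages; the observation entries stand in for
# the outputs of learned feasibility models (purely illustrative).
toy_fns: List[FeasibilityFn] = [
    lambda o: 1.0,   # stage 0 can always be attempted
    lambda o: o[0],  # stage 1 feasibility read from obs[0]
    lambda o: o[1],  # stage 2 feasibility read from obs[1]
]

print(select_stage([0.9, 0.8], toy_fns))  # → 2 (skip ahead to stage 2)
print(select_stage([0.9, 0.2], toy_fns))  # → 1
print(select_stage([0.1, 0.1], toy_fns))  # → 0 (recover by restarting)
```

Scanning backward is only one simple way to prefer progress through the chain; the rule actually used by Sequential Dexterity, and how its feasibility function is trained and used for fine-tuning, may differ.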