Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

Abstract

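Many real-world manipulation tasks are composed of subtasks that differ substantially from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, whose adaptability and versatility allow seamless transitions between different modes of functionality without re-grasping or external tools. However, the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks make these tasks difficult. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies to achieve long-horizon task goals. The core of the system is a transition feasibility function that progressively finetunes the sub-policies to raise the chaining success rate, and that also enables autonomous policy switching to recover from failures and bypass redundant stages. Although it is trained only in simulation with a small number of task objects, the system generalizes to novel object shapes and transfers zero-shot to a real-world robot equipped with a dexterous hand. More details and video results can be found at https://sequential-dexterity.github.io.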

English

Many real-world manipulation tasks consist of a series of subtasks that are significantly different from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, whose adaptability and versatility let them transition seamlessly between different modes of functionality without re-grasping or external tools. However, challenges arise from the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies to achieve long-horizon task goals. The core of the system is a transition feasibility function that progressively finetunes the sub-policies to enhance the chaining success rate, while also enabling autonomous policy switching for recovery from failures and bypassing redundant stages. Despite being trained only in simulation with a few task objects, our system generalizes to novel object shapes and transfers zero-shot to a real-world robot equipped with a dexterous hand. More details and video results can be found at https://sequential-dexterity.github.io.
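
The abstract does not say how the transition feasibility function drives policy switching, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each stage has a feasibility function that scores, between 0 and 1, how likely that stage's sub-policy is to succeed from the current state, and that the system runs the latest stage whose score clears a threshold. The names `select_stage`, `toy_feasibility`, and `threshold`, as well as the scalar `State`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# Toy stand-in for the robot/object state (assumption: a single progress value).
State = float


def select_stage(
    feasibility: Sequence[Callable[[State], float]],
    state: State,
    threshold: float = 0.5,
) -> int:
    """Pick the latest stage whose feasibility score clears the threshold."""
    # Scan from the last stage backwards, so a later stage that is already
    # feasible is chosen directly and earlier, redundant stages are skipped.
    for stage in reversed(range(len(feasibility))):
        if feasibility[stage](state) >= threshold:
            return stage
    # No stage is feasible: fall back to the first stage to recover.
    return 0


# Hypothetical feasibility functions: stage k is feasible once progress >= k.
toy_feasibility = [lambda s, k=k: 1.0 if s >= k else 0.0 for k in range(3)]

print(select_stage(toy_feasibility, 0.0))  # 0: start from the first stage
print(select_stage(toy_feasibility, 2.0))  # 2: bypass stages 0 and 1
print(select_stage(toy_feasibility, 1.0))  # 1: fall back after stage 2 becomes infeasible
```

Under these assumptions, the same rule covers both behaviors named in the abstract: skipping ahead when a later stage is already feasible, and dropping back to an earlier stage when the current one is not.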

Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation
