SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Abstract

In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. Practitioners widely acknowledge that the particular implementation details of these algorithms are often just as important for performance as the choice of algorithm, if not more so. We posit that a significant challenge to the widespread adoption of robotic RL, as well as to the further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation achieves very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation in 25 to 50 minutes of training per policy on average, which improves over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, remain extremely robust even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will give the robotics community a tool that facilitates further developments in robotic RL. Our code, documentation, and videos can be found at https://serl-robot.github.io/
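The abstract names three pieces that must work together: an off-policy learner that can reuse demonstrations, a way to compute rewards, and a way to reset the environment. The following is a minimal, self-contained sketch of how such pieces might connect. It is not SERL's actual API: `MixedReplayBuffer`, `reward_fn`, `reset_fn`, and `collect` are hypothetical names, the 1-D reaching task and random policy are toy placeholders, and the equal split between demonstration and online data in each sampled batch is an assumption made here for illustration. The learner update itself is left as a comment.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# (observation, action, reward, next_observation, done)
Transition = Tuple[float, float, float, float, bool]


@dataclass
class MixedReplayBuffer:
    """Hypothetical buffer: demonstrations and online data are stored
    separately, and each batch draws half from each (an assumption)."""
    demos: List[Transition]
    online: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.online.append(t)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        if not self.online:  # early on, fall back to demonstrations only
            return rng.choices(self.demos, k=batch_size)
        half = batch_size // 2
        return (rng.choices(self.demos, k=half)
                + rng.choices(self.online, k=batch_size - half))


GOAL, TOL = 1.0, 0.1  # hypothetical 1-D "reach the goal" task


def reward_fn(obs: float) -> float:
    # Stand-in for the reward computation: a sparse 0/1 success check.
    return 1.0 if abs(obs - GOAL) < TOL else 0.0


def reset_fn(rng: random.Random) -> float:
    # Stand-in for an automatic environment reset.
    return rng.uniform(-1.0, 0.0)


def collect(buffer: MixedReplayBuffer, steps: int, batch_size: int,
            seed: int = 0, max_episode_len: int = 20) -> int:
    """Run a random placeholder policy, store online transitions, and sample
    mixed batches where an off-policy update would go.
    Returns the number of steps that reached the goal."""
    rng = random.Random(seed)
    obs, t, successes = reset_fn(rng), 0, 0
    for _ in range(steps):
        action = rng.uniform(-0.2, 0.3)        # placeholder policy
        next_obs = obs + action
        r = reward_fn(next_obs)
        t += 1
        done = r > 0 or t >= max_episode_len
        buffer.add((obs, action, r, next_obs, done))
        batch = buffer.sample(batch_size, rng)  # agent.update(batch) would go here
        assert len(batch) == batch_size
        successes += int(r > 0)
        if done:
            obs, t = reset_fn(rng), 0          # reset between episodes
        else:
            obs = next_obs
    return successes


demos = [(0.8, 0.15, 1.0, 0.95, True)]  # one hypothetical demo transition
buf = MixedReplayBuffer(demos=demos)
n_success = collect(buf, steps=200, batch_size=8)
print(n_success, len(buf.online))
```

Keeping the two data sources in separate lists makes the mixing ratio explicit and lets demonstrations shape every update from the first step; a real system would replace the stand-ins with task-specific reward and reset mechanisms and a real learner.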
