
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

January 29, 2024
Authors: Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine
cs.AI

Abstract
In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely-adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation between 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at https://serl-robot.github.io/
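The abstract describes a sample-efficient off-policy deep RL setup that incorporates auxiliary data such as demonstrations alongside online experience. SERL's actual training loop is not reproduced here; purely as an illustration of that general recipe, the sketch below shows one common ingredient of such methods: drawing each gradient-update batch half from a demonstration replay buffer and half from an online replay buffer. All names (`ReplayBuffer`, `mixed_batch`) are hypothetical and not part of the SERL API.

```python
import random

class ReplayBuffer:
    """Minimal FIFO replay buffer of (obs, action, reward, next_obs, done) tuples."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        # Evict the oldest transition once capacity is reached.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, n):
        # Uniform sampling without replacement within one batch.
        return random.sample(self.storage, n)

def mixed_batch(demo_buffer, online_buffer, batch_size):
    """Draw half the batch from demonstrations and half from online
    experience, a common trick for sample-efficient off-policy RL."""
    half = batch_size // 2
    return demo_buffer.sample(half) + online_buffer.sample(batch_size - half)
```

In a training loop, each environment step would be followed by one or more off-policy updates on such a mixed batch; keeping demonstrations permanently in their own buffer ensures they are never evicted by the stream of online data.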