Language to Rewards for Robotic Skill Synthesis

June 14, 2023
Authors: Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia
cs.AI

Abstract

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, because low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts to apply LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. Reward functions, on the other hand, have been shown to be flexible representations that can be optimized into control policies for diverse tasks, and their semantic richness makes them well suited to being specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by using LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using LLM-generated rewards as the intermediate interface, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Combining this with a real-time optimizer, MuJoCo MPC, enables an interactive behavior-creation experience in which users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline that uses primitive skills as the interface with Code-as-Policies achieves 50% of the tasks. We further validated our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
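
To make the reward-as-interface idea concrete, the following is a minimal sketch under heavy assumptions, not the authors' implementation. The `reward_params` dictionary is a hypothetical stand-in for what an LLM might emit for an instruction such as "move to position 1.0", the robot is replaced by a toy 1D point mass, and a simple random-shooting planner stands in for MuJoCo MPC. All names and values are illustrative.

```python
import random

# Hypothetical reward parameters, standing in for LLM output.
# These keys and values are assumptions made for illustration only.
reward_params = {"target_position": 1.0, "position_weight": 1.0, "effort_weight": 0.01}

DT = 0.1  # integration time step in seconds (assumed)


def step(state, action):
    """Toy 1D point mass: state is (position, velocity), action is acceleration."""
    pos, vel = state
    vel = vel + action * DT
    pos = pos + vel * DT
    return (pos, vel)


def reward(state, action, params):
    """Negative weighted cost: squared distance to target plus squared control effort."""
    pos, _ = state
    pos_err = pos - params["target_position"]
    return -(params["position_weight"] * pos_err ** 2
             + params["effort_weight"] * action ** 2)


def plan(state, params, rng, horizon=15, num_samples=200):
    """Random-shooting MPC: sample action sequences, roll each out on the
    model, and return the first action of the highest-return sequence."""
    best_return, best_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = step(s, a)
            total += reward(s, a, params)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action


def run_episode(params, steps=100, seed=0):
    """Receding-horizon loop: replan at every step, apply only the first action."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    for _ in range(steps):
        state = step(state, plan(state, params, rng))
    return state


final_pos, final_vel = run_episode(reward_params)
print(f"final position: {final_pos:.2f}, final velocity: {final_vel:.2f}")
```

In this sketch, changing the behavior means editing only `reward_params` (for example, a new `target_position` after a language correction); the planner and dynamics stay untouched, which is the separation the abstract describes.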