Dynamic-Resolution Model Learning for Object Pile Manipulation
June 29, 2023
Authors: Yixuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu
cs.AI
Abstract
Dynamics models learned from visual observations have been shown to be effective
in various robotic manipulation tasks. One of the key questions for learning
such dynamics models is what scene representation to use. Prior works typically
assume representation at a fixed dimension or resolution, which may be
inefficient for simple tasks and ineffective for more complicated tasks. In
this work, we investigate how to learn dynamic and adaptive representations at
different levels of abstraction to achieve the optimal trade-off between
efficiency and effectiveness. Specifically, we construct dynamic-resolution
particle representations of the environment and learn a unified dynamics model
using graph neural networks (GNNs) that allows continuous selection of the
abstraction level. During test time, the agent can adaptively determine the
optimal resolution at each model-predictive control (MPC) step. We evaluate our
method in object pile manipulation, a task we commonly encounter in cooking,
agriculture, manufacturing, and pharmaceutical applications. Through
comprehensive evaluations in both simulation and the real world, we show
that our method achieves significantly better performance than state-of-the-art
fixed-resolution baselines at the gathering, sorting, and redistribution of
granular object piles made of various instances such as coffee beans, almonds,
and corn.
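The core idea above is to represent the same object pile at different particle resolutions and let the planner pick the abstraction level. As a minimal sketch of one ingredient, the snippet below subsamples a dense point cloud into coarse and fine particle sets with farthest point sampling, a standard way to build coarser point-cloud representations; the function name, parameters, and the choice of FPS are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Subsample an (n, 3) point cloud to k well-spread particles.

    Illustrative only: one common way to produce a lower-resolution
    particle representation of a dense observation.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [int(rng.integers(n))]  # start from a random point
    # dists[i] = distance from point i to the nearest selected point
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))  # farthest from current selection
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

# Represent one dense pile observation (2000 points) at two abstraction
# levels: a cheap coarse set and a more expensive fine set, between which
# a planner could choose per MPC step.
dense = np.random.default_rng(1).random((2000, 3))
coarse = farthest_point_sampling(dense, 20)    # low-resolution model input
fine = farthest_point_sampling(dense, 200)     # high-resolution model input
```

In the paper's framing, a learned GNN dynamics model would consume such particle sets at a continuously selectable resolution; this sketch only shows how resolutions of the same scene can be generated.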