Dynamic-Resolution Model Learning for Object Pile Manipulation
June 29, 2023
Authors: Yixuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu
cs.AI
Abstract
Dynamics models learned from visual observations have been shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume a representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated ones. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. At test time, the agent adaptively determines the optimal resolution at each model-predictive control (MPC) step. We evaluate our method on object pile manipulation, a task commonly encountered in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations in both simulation and the real world, we show that our method significantly outperforms state-of-the-art fixed-resolution baselines at gathering, sorting, and redistributing object piles made of various granular instances such as coffee beans, almonds, and corn.
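
To make the abstract's pipeline concrete, here is a minimal NumPy-only sketch of the general idea: subsample a dense point cloud of the pile into particle sets at several candidate resolutions, roll each through a toy one-step dynamics on a radius graph (a stand-in for the learned GNN), and let an MPC-style loop pick the resolution that best trades off predicted task error against particle count. The farthest-point sampler, the placeholder dynamics, and the cost weighting below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Pick n_samples well-spread particles from a dense point cloud."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]
    dists = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(n_samples - 1):
        idx = int(np.argmax(dists))
        chosen.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]

def radius_graph(particles, radius):
    """Connect particles closer than `radius`; return edge index arrays."""
    dist = np.linalg.norm(particles[:, None, :] - particles[None, :, :], axis=-1)
    src, dst = np.nonzero((dist < radius) & (dist > 0))
    return src, dst

def toy_dynamics_step(particles, action, radius=0.1):
    """Placeholder one-step dynamics: average-neighbor smoothing plus a push
    along the action direction. A stand-in for the learned GNN rollout."""
    src, dst = radius_graph(particles, radius)
    nxt = particles.copy()
    for i in range(len(particles)):
        nbrs = dst[src == i]
        if len(nbrs) > 0:
            nxt[i] = 0.7 * particles[i] + 0.3 * particles[nbrs].mean(axis=0)
    return nxt + 0.05 * action

def mpc_select_resolution(cloud, goal, candidate_resolutions=(20, 50, 100)):
    """Score each candidate resolution by predicted cost-to-goal plus a
    per-particle efficiency penalty, and return the best one."""
    action = goal - cloud.mean(axis=0)
    action /= np.linalg.norm(action) + 1e-8
    best_res, best_score = None, np.inf
    for res in candidate_resolutions:
        particles = farthest_point_sampling(cloud, res)
        pred = toy_dynamics_step(particles, action)
        cost = np.linalg.norm(pred.mean(axis=0) - goal)  # task-error proxy
        score = cost + 1e-3 * res                        # coarser is cheaper
        if score < best_score:
            best_res, best_score = res, score
    return best_res

if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(0, 1, size=(500, 2))  # dense pile
    goal = np.array([0.8, 0.8])                                    # target centroid
    print("chosen resolution:", mpc_select_resolution(cloud, goal))
```

In this toy version the resolution choice is a simple scalar trade-off; the paper's unified GNN model instead supports a continuous range of abstraction levels, so the agent can re-select the resolution at every MPC step as the pile's configuration and the task demands change.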