Dynamic-Resolution Model Learning for Object Pile Manipulation

Abstract

Dynamics models learned from visual observations have been shown to be effective in a variety of robotic manipulation tasks. A key question in learning such dynamics models is which scene representation to use. Prior work typically assumes a representation with a fixed dimension or resolution, which can be inefficient for simple tasks and ineffective for more complicated ones. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and use graph neural networks (GNNs) to learn a unified dynamics model that allows continuous selection of the abstraction level. At test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method on object pile manipulation, a task commonly encountered in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations in both simulation and the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at gathering, sorting, and redistributing granular object piles made of various objects such as coffee beans, almonds, and corn.
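To make the idea of a dynamic-resolution particle representation concrete, the following is a minimal sketch of describing one pile with different numbers of particles. It assumes farthest-point sampling as the subsampling rule; the abstract does not state how particles are chosen, and the function name `subsample_particles`, the 2D synthetic pile, and the particle counts are hypothetical choices made for illustration only.

```python
import numpy as np


def subsample_particles(points, k, seed=0):
    """Pick k well-spread particles from an (N, d) array.

    Assumption: farthest-point sampling stands in for whatever
    subsampling rule the actual method uses.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]  # random starting particle
    # Distance from every point to its nearest already-chosen particle.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # the point farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]


# Hypothetical pile: 500 object centers scattered on a 2D tabletop.
pile = np.random.default_rng(1).uniform(-1.0, 1.0, size=(500, 2))
coarse = subsample_particles(pile, 10)   # cheap, low-resolution view
fine = subsample_particles(pile, 100)    # costlier, high-resolution view
```

In a setup like the one the abstract describes, a controller could request a different particle count at each MPC step; the sketch covers only the representation side, not the GNN dynamics model or the resolution selection.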
