Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation

Abstract

Generalist robot policies trained on large-scale datasets such as Open X-Embodiment (OXE) perform strongly across a wide range of tasks. However, they often struggle to generalize beyond the distribution of their training data. In this paper, we investigate the underlying cause of this limited generalization. We identify shortcut learning, the reliance on task-irrelevant features, as a key impediment to generalization. Through comprehensive theoretical and empirical analysis, we uncover two primary contributors to shortcut learning: (1) limited diversity within individual sub-datasets, and (2) significant distributional disparities across sub-datasets, which lead to dataset fragmentation. These issues arise from the inherent structure of large-scale datasets like OXE, which are typically composed of multiple sub-datasets collected independently across varied environments and embodiments. Our findings provide critical insights into dataset collection strategies that can reduce shortcut learning and improve the generalization ability of generalist robot policies. Moreover, for scenarios where acquiring new large-scale data is impractical, we demonstrate that carefully selected robotic data augmentation strategies can effectively reduce shortcut learning in existing offline datasets, thereby improving the generalization of generalist robot policies (e.g., pi_0) in both simulation and real-world environments. More information is available at https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/.
