Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation

Abstract

Generalist robot policies trained on large-scale datasets such as Open X-Embodiment (OXE) demonstrate strong performance across a wide range of tasks. However, they often struggle to generalize beyond the distribution of their training data. In this paper, we investigate the underlying cause of this limited generalization capability. We identify shortcut learning, the reliance on task-irrelevant features, as a key impediment to generalization. Through comprehensive theoretical and empirical analysis, we uncover two primary contributors to shortcut learning: (1) limited diversity within individual sub-datasets, and (2) significant distributional disparities across sub-datasets, which lead to dataset fragmentation. These issues arise from the inherent structure of large-scale datasets like OXE, which are typically composed of multiple sub-datasets collected independently across varied environments and embodiments. Our findings provide critical insights into dataset collection strategies that can reduce shortcut learning and enhance the generalization ability of generalist robot policies. Moreover, in scenarios where acquiring new large-scale data is impractical, we demonstrate that carefully selected robotic data augmentation strategies can effectively reduce shortcut learning in existing offline datasets, thereby improving the generalization of generalist robot policies such as pi_0 in both simulation and real-world environments. More information at https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/.
