Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation

Abstract

Generalist robot policies trained on large-scale datasets such as Open X-Embodiment (OXE) demonstrate strong performance across a wide range of tasks. However, they often struggle to generalize beyond the distribution of their training data. In this paper, we investigate the underlying cause of this limited generalization capability. We identify shortcut learning, the reliance on task-irrelevant features, as a key impediment to generalization. Through comprehensive theoretical and empirical analysis, we uncover two primary contributors to shortcut learning: (1) limited diversity within individual sub-datasets, and (2) significant distributional disparities across sub-datasets, which lead to dataset fragmentation. These issues arise from the inherent structure of large-scale datasets like OXE, which are typically composed of multiple sub-datasets collected independently across varied environments and embodiments. Our findings provide critical insights into dataset collection strategies that can reduce shortcut learning and enhance the generalization ability of generalist robot policies. Moreover, in scenarios where acquiring new large-scale data is impractical, we demonstrate that carefully selected robotic data augmentation strategies can effectively reduce shortcut learning in existing offline datasets, thereby improving the generalization capabilities of generalist robot policies (e.g., pi_0) in both simulation and real-world environments. More information is available at https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/.
