R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

Abstract

Towards the goal of generalized robotic manipulation, spatial generalization is the most fundamental capability: it requires the policy to work robustly under different spatial distributions of objects, the environment, and the agent itself. To achieve this, a substantial number of human demonstrations must be collected to cover different spatial configurations for training a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to acquire abundant, spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments point-cloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, making it efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with that of real-world 3D sensors. Empirically, R2RGen substantially improves data efficiency across extensive experiments and demonstrates strong potential for scaling and for application to mobile manipulation.
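The core idea of augmenting point-cloud observations and actions together can be illustrated with a minimal sketch. It assumes each object group is given as a list of 3D points, that the actions bound to that group are end-effector poses `(x, y, z, yaw)`, and that augmentation is restricted to a rotation about the vertical axis plus an x-y translation. The names `augment_group` and `_rotate_translate` and the pose format are hypothetical and are not R2RGen's API; the paper's scene/trajectory annotation and camera-aware processing are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical action format: end-effector position plus yaw about the vertical axis.
Pose = Tuple[float, float, float, float]


def _rotate_translate(p: Point, theta: float, shift: Tuple[float, float],
                      pivot: Tuple[float, float]) -> Point:
    """Rotate p by theta about a vertical axis through `pivot`, then shift it in x-y."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    return (pivot[0] + c * dx - s * dy + shift[0],
            pivot[1] + s * dx + c * dy + shift[1],
            p[2])  # height unchanged: the group stays on the same support surface


def augment_group(points: Sequence[Point], actions: Sequence[Pose],
                  theta: float, shift: Tuple[float, float]) -> Tuple[List[Point], List[Pose]]:
    """Apply one planar rigid transform to an object group's points and to the
    end-effector poses bound to that group, so observation and action stay consistent."""
    if not points:
        raise ValueError("an object group needs at least one point")
    # Pivot at the group's x-y centroid so a rotation spins the group in place.
    pivot = (sum(p[0] for p in points) / len(points),
             sum(p[1] for p in points) / len(points))
    new_points = [_rotate_translate(p, theta, shift, pivot) for p in points]
    new_actions = []
    for x, y, z, yaw in actions:
        nx, ny, nz = _rotate_translate((x, y, z), theta, shift, pivot)
        # The gripper turns with the object; wrap the angle back into (-pi, pi].
        new_yaw = math.atan2(math.sin(yaw + theta), math.cos(yaw + theta))
        new_actions.append((nx, ny, nz, new_yaw))
    return new_points, new_actions


if __name__ == "__main__":
    cup = [(0.50, 0.00, 0.02), (0.52, 0.02, 0.10)]  # toy point cloud of one object group
    grasp = [(0.51, 0.01, 0.12, 0.0)]               # gripper pose recorded near that object
    pts, acts = augment_group(cup, grasp, theta=math.pi / 6, shift=(0.10, -0.05))
    print(pts)
    print(acts)
```

In this sketch, applying the same transform to both the points and the poses keeps the generated observation-action pair geometrically consistent, since relative distances between the gripper and the object are preserved. How R2RGen actually groups objects, handles task constraints, and treats free-space motion is described in the paper and may differ from this simplification.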
