R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
October 9, 2025
Authors: Xiuwei Xu, Angyuan Ma, Hankun Li, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu
cs.AI
Abstract
Towards the goal of generalized robotic manipulation, spatial generalization is the most fundamental capability: the policy must work robustly under different spatial distributions of objects, the environment, and the agent itself. Achieving this with imitation learning requires collecting substantial human demonstrations that cover diverse spatial configurations to train a generalized visuomotor policy. Prior works explore a promising direction that leverages data generation to acquire abundant, spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments point cloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, and is therefore efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with that of real-world 3D sensors. Empirically, R2RGen substantially improves data efficiency across extensive experiments and shows strong potential for scaling and for application to mobile manipulation.
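
As a rough, self-contained sketch (not the authors' released code), the snippet below illustrates the kind of point cloud observation-action augmentation the abstract describes: a random tabletop rigid transform is applied to each segmented object's points and to the trajectory segment associated with that object, and a crude single-view visibility filter then mimics what a real depth sensor would observe. The function names, the per-object segmentation, the SE(2) sampling ranges, and the visibility heuristic are all assumptions made for illustration.

```python
# Minimal sketch of real-to-real observation-action augmentation, assuming:
#   - the scene point cloud is already segmented into per-object groups,
#   - the trajectory is given as 4x4 homogeneous end-effector poses,
#   - camera-aware processing is approximated by a simple visibility filter.
import numpy as np


def random_se2_on_table(max_xy=0.15, max_yaw=np.pi / 6, rng=None):
    """Sample a random planar (tabletop) rigid transform as a 4x4 matrix."""
    rng = np.random.default_rng() if rng is None else rng
    dx, dy = rng.uniform(-max_xy, max_xy, size=2)
    yaw = rng.uniform(-max_yaw, max_yaw)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:3, 3] = [dx, dy, 0.0]
    return T


def transform_points(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) point array."""
    return points @ T[:3, :3].T + T[:3, 3]


def augment_demo(object_points, object_traj_segments, rng=None):
    """Augment one source demonstration (illustrative, per-object version).

    object_points: dict name -> (N, 3) segmented object point cloud.
    object_traj_segments: dict name -> (K, 4, 4) end-effector poses
        associated with manipulating that object.
    """
    new_points, new_traj = {}, {}
    for name, pts in object_points.items():
        T = random_se2_on_table(rng=rng)                 # group-wise transform
        new_points[name] = transform_points(T, pts)      # move the object's points
        new_traj[name] = T @ object_traj_segments[name]  # move its poses the same way
    return new_points, new_traj


def visibility_filter(points, camera_origin, bin_size=0.01):
    """Crude single-view filter: keep the closest point per viewing-direction
    bin, mimicking what a real depth sensor would actually observe."""
    rel = points - camera_origin
    dist = np.linalg.norm(rel, axis=1)
    bins = np.round(rel / dist[:, None] / bin_size).astype(int)
    keep = {}
    for i, key in enumerate(map(tuple, bins)):
        if key not in keep or dist[i] < dist[keep[key]]:
            keep[key] = i
    return points[sorted(keep.values())]
```

Sharing the same transform between an object's points and its trajectory segment is what keeps each generated observation-action pair physically consistent; the paper's group-wise strategy and camera-aware processing handle multi-object constraints and sensor realism far more carefully than this toy version.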