Object-Centric Representations Improve Policy Generalization in Robot Manipulation
May 16, 2025
Authors: Alexandre Chapin, Bruno Machado, Emmanuel Dellandrea, Liming Chen
cs.AI
Abstract
Visual representations are central to the learning and generalization
capabilities of robotic manipulation policies. Existing methods mostly rely on
global or dense features, but such representations often entangle task-relevant and
task-irrelevant scene information, limiting robustness under distribution shifts. In
this work, we investigate object-centric representations (OCR) as a structured
alternative that segments visual input into a finite set of entities,
introducing inductive biases that align more naturally with manipulation tasks.
We benchmark a range of visual encoders (object-centric, global, and dense
methods) across a suite of simulated and real-world manipulation tasks ranging
from simple to complex, and evaluate their generalization under diverse visual
conditions including changes in lighting, texture, and the presence of
distractors. Our findings reveal that OCR-based policies outperform dense and
global representations in generalization settings, even without task-specific
pretraining. These insights suggest that OCR is a promising direction for
designing visual systems that generalize effectively in dynamic, real-world
robotic environments.