Spotlighting Task-Relevant Features: Object-Centric Representations for Better Generalization in Robotic Manipulation

January 29, 2026
Authors: Alexandre Chapin, Bruno Machado, Emmanuel Dellandréa, Liming Chen
cs.AI

Abstract

The generalization capabilities of robotic manipulation policies are heavily influenced by the choice of visual representations. Existing approaches typically rely on representations extracted from pre-trained encoders, using two dominant types of features: global features, which summarize an entire image via a single pooled vector, and dense features, which preserve patch-wise embeddings from the final encoder layer. While widely used, both feature types mix task-relevant and task-irrelevant information, leading to poor generalization under distribution shifts such as changes in lighting, changes in texture, or the presence of distractors. In this work, we explore an intermediate, structured alternative: Slot-Based Object-Centric Representations (SBOCR), which group dense features into a finite set of object-like entities. This representation naturally reduces the noise passed to the robotic manipulation policy while retaining enough information to perform the task efficiently. We benchmark a range of global and dense representations against intermediate slot-based representations across a suite of simulated and real-world manipulation tasks ranging from simple to complex. We evaluate their generalization under diverse visual conditions, including changes in lighting, changes in texture, and the presence of distractors. Our findings reveal that SBOCR-based policies outperform policies based on dense and global representations in generalization settings, even without task-specific pretraining. These insights suggest that SBOCR is a promising direction for designing visual systems that generalize effectively in dynamic, real-world robotic environments.
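To make the three representation types concrete, the following is a minimal illustrative sketch and not the authors' implementation. It contrasts a global feature (one pooled vector), dense features (one embedding per patch), and a slot-style grouping of the dense features into a small fixed set of vectors. The function names `global_feature` and `group_into_slots`, the sizes (16 patches, 8 dimensions, 4 slots), and the non-learned soft-assignment loop are all assumptions chosen for illustration; an actual SBOCR model would use a trained grouping module on real encoder outputs.

```python
import math
import random

def global_feature(patches):
    """Mean-pool N patch embeddings (N x D) into a single D-dim vector."""
    n = len(patches)
    return [sum(col) / n for col in zip(*patches)]

def group_into_slots(patches, num_slots, iters=3, seed=0):
    """Softly assign N patches to K slots and return K x D slot vectors.

    Simplified, non-learned stand-in for slot-style grouping (assumption).
    """
    rng = random.Random(seed)
    dim = len(patches[0])
    slots = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]
    for _ in range(iters):
        # Each patch spreads its attention over the slots (softmax across
        # slots), so slots compete to explain each patch.
        weights = []
        for p in patches:
            logits = [sum(a * b for a, b in zip(p, s)) / math.sqrt(dim) for s in slots]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            weights.append([e / z for e in exps])
        # Each slot becomes the attention-weighted mean of the patches.
        for k in range(num_slots):
            total = sum(w[k] for w in weights) + 1e-8
            slots[k] = [
                sum(w[k] * p[d] for w, p in zip(weights, patches)) / total
                for d in range(dim)
            ]
    return slots

# Random stand-in for encoder output: 16 patches, 8 dims each (hypothetical).
rng = random.Random(1)
dense = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

pooled = global_feature(dense)      # global: 1 vector of size 8
slots = group_into_slots(dense, 4)  # slot-based: 4 vectors of size 8
```

In this sketch a downstream policy would receive 1 vector (global), 16 vectors (dense), or 4 vectors (slots), which illustrates the sense in which slot-based representations sit at an intermediate level between the other two.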