

Spotlighting Task-Relevant Features: Object-Centric Representations for Better Generalization in Robotic Manipulation

January 29, 2026
Authors: Alexandre Chapin, Bruno Machado, Emmanuel Dellandréa, Liming Chen
cs.AI

Abstract

The generalization capabilities of robotic manipulation policies are heavily influenced by the choice of visual representation. Existing approaches typically rely on representations extracted from pre-trained encoders, using one of two dominant feature types: global features, which summarize an entire image in a single pooled vector, and dense features, which preserve the patch-wise embeddings of the final encoder layer. Although widely used, both feature types mix task-relevant and task-irrelevant information, which leads to poor generalization under distribution shifts such as changes in lighting or texture, or the presence of distractors. In this work, we explore an intermediate, structured alternative: Slot-Based Object-Centric Representations (SBOCR), which group dense features into a finite set of object-like entities. This representation naturally reduces the noise passed to the robotic manipulation policy while retaining enough information to perform the task efficiently. We benchmark a range of global and dense representations against intermediate slot-based representations on a suite of simulated and real-world manipulation tasks ranging from simple to complex. We evaluate their generalization under diverse visual conditions, including changes in lighting and texture and the presence of distractors. Our findings show that SBOCR-based policies outperform policies based on dense and global representations in these generalization settings, even without task-specific pretraining. These results suggest that SBOCR is a promising direction for designing visual systems that generalize effectively in dynamic, real-world robotic environments.
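The abstract contrasts three ways of reading out an encoder's patch embeddings. The sketch below is illustrative only and is not the authors' SBOCR encoder: toy 2-D vectors stand in for a pre-trained encoder's output, and the hypothetical `slot_feature` is a parameter-free simplification of slot attention (no learned projections, no GRU update), which makes it closer to soft k-means. It shows the shape difference: a global readout yields one D-vector, a dense readout keeps all N patch vectors, and a slot readout compresses them into K object-like vectors with K much smaller than N.

```python
import math
from typing import List

Vector = List[float]


def global_feature(patches: List[Vector]) -> Vector:
    """Global readout: average-pool all N patch embeddings into one D-vector."""
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / len(patches) for d in range(dim)]


def dense_feature(patches: List[Vector]) -> List[Vector]:
    """Dense readout: keep every patch embedding as-is (N x D)."""
    return [list(p) for p in patches]


def _softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_feature(patches: List[Vector], num_slots: int, iters: int = 3) -> List[Vector]:
    """Slot readout: group N patch embeddings into num_slots vectors (K x D).

    Hypothetical, parameter-free simplification of slot attention: no learned
    projections and no GRU/MLP update, so it behaves like soft k-means with
    scaled dot-product similarity.
    """
    if not 1 <= num_slots <= len(patches):
        raise ValueError("num_slots must be between 1 and the number of patches")
    dim = len(patches[0])
    scale = 1.0 / math.sqrt(dim)
    # Assumed deterministic init: seed slots with evenly spaced patches
    # (slot attention instead samples initial slots from a learned Gaussian).
    step = len(patches) // num_slots
    slots = [list(patches[k * step]) for k in range(num_slots)]
    for _ in range(iters):
        # Softmax over slots (not over patches), so slots compete for each patch.
        attn = [
            _softmax([scale * sum(p[d] * s[d] for d in range(dim)) for s in slots])
            for p in patches
        ]
        new_slots = []
        for k in range(num_slots):
            weights = [a[k] for a in attn]
            total = sum(weights) + 1e-8
            # Each slot becomes the attention-weighted mean of the patches it claims.
            new_slots.append(
                [sum(w * p[d] for w, p in zip(weights, patches)) / total for d in range(dim)]
            )
        slots = new_slots
    return slots


# Toy example: 8 patch embeddings (D = 2), four on "object A", four on "object B".
object_a = [[5.0, 0.0], [4.8, 0.2], [5.2, -0.2], [5.0, 0.0]]
object_b = [[0.0, 5.0], [0.2, 4.8], [-0.2, 5.2], [0.0, 5.0]]
patches = object_a + object_b

print(global_feature(patches))             # ≈ [2.5, 2.5]: both objects blended together
print(len(dense_feature(patches)))         # 8: one vector per patch
print(slot_feature(patches, num_slots=2))  # ≈ [[5.0, 0.0], [0.0, 5.0]]: one vector per object
```

In this toy case the pooled global vector lands between the two objects, the dense readout keeps all eight patches, and the two slots each settle on one object. Because the number of slots is a fixed hyperparameter, the policy input stays the same size regardless of how many patches the encoder produces.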