Object-Centric Representations Improve Policy Generalization in Robot Manipulation
May 16, 2025
Authors: Alexandre Chapin, Bruno Machado, Emmanuel Dellandrea, Liming Chen
cs.AI
Abstract
Visual representations are central to the learning and generalization
capabilities of robotic manipulation policies. While existing methods rely on
global or dense features, such representations often entangle task-relevant and
irrelevant scene information, limiting robustness under distribution shifts. In
this work, we investigate object-centric representations (OCR) as a structured
alternative that segments visual input into a finite set of entities,
introducing inductive biases that align more naturally with manipulation tasks.
We benchmark a range of visual encoders (object-centric, global, and dense
methods) across a suite of simulated and real-world manipulation tasks ranging
from simple to complex, and evaluate their generalization under diverse visual
conditions including changes in lighting, texture, and the presence of
distractors. Our findings reveal that OCR-based policies outperform dense and
global representations in generalization settings, even without task-specific
pretraining. These insights suggest that OCR is a promising direction for
designing visual systems that generalize effectively in dynamic, real-world
robotic environments.
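The abstract describes OCR as segmenting visual input into a finite set of entities, in contrast to a single global feature vector. The sketch below is not taken from the paper; the module names, dimensions, and the attention-based slot readout are illustrative assumptions, shown only to contrast the interface a policy sees in each case: one pooled vector per image versus K per-entity slot vectors.

```python
# Illustrative sketch (hypothetical, not the paper's architecture):
# a global encoder vs. a slot-based object-centric encoder for a policy input.
import torch
import torch.nn as nn

class GlobalEncoder(nn.Module):
    """Collapses the whole image into a single feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, img):                                  # img: (B, 3, H, W)
        return self.proj(self.backbone(img).flatten(1))      # (B, feat_dim)

class SlotEncoder(nn.Module):
    """Object-centric encoder: the scene is summarized as K slot vectors,
    intended to correspond to individual entities. Slots here are read out by
    cross-attention from learned queries, a simplified stand-in for slot-based
    OCR models."""
    def __init__(self, num_slots=6, slot_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, slot_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(slot_dim, slot_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.slots = nn.Parameter(torch.randn(num_slots, slot_dim))
        self.attn = nn.MultiheadAttention(slot_dim, num_heads=1, batch_first=True)

    def forward(self, img):                                  # img: (B, 3, H, W)
        feats = self.backbone(img)                           # (B, D, H', W')
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H'*W', D)
        queries = self.slots.unsqueeze(0).expand(img.size(0), -1, -1)
        slots, _ = self.attn(queries, tokens, tokens)        # (B, K, D)
        return slots

if __name__ == "__main__":
    img = torch.randn(2, 3, 64, 64)
    print(GlobalEncoder()(img).shape)   # torch.Size([2, 256]) -- one vector per image
    print(SlotEncoder()(img).shape)     # torch.Size([2, 6, 64]) -- K entity slots
```

A downstream policy head can consume the slot set by pooling or attending over it, which is the structural inductive bias the abstract argues aligns naturally with manipulation tasks.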