EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

February 20, 2026
Authors: Boyuan An, Zhexiong Wang, Yipeng Wang, Jiaqi Li, Sihang Li, Jing Zhang, Chen Feng
cs.AI

Abstract

Humans can rearrange objects in cluttered environments using egocentric perception, navigating occlusions without global coordinates. Inspired by this capability, we study long-horizon multi-object non-prehensile rearrangement for mobile robots using a single egocentric camera. We introduce EgoPush, a policy learning framework that enables egocentric, perception-driven rearrangement without relying on explicit global state estimation that often fails in dynamic scenes. EgoPush designs an object-centric latent space to encode relative spatial relations among objects, rather than absolute poses. This design enables a privileged reinforcement-learning (RL) teacher to jointly learn latent states and mobile actions from sparse keypoints, which is then distilled into a purely visual student policy. To reduce the supervision gap between the omniscient teacher and the partially observed student, we restrict the teacher's observations to visually accessible cues. This induces active perception behaviors that are recoverable from the student's viewpoint. To address long-horizon credit assignment, we decompose rearrangement into stage-level subproblems using temporally decayed, stage-local completion rewards. Extensive simulation experiments demonstrate that EgoPush significantly outperforms end-to-end RL baselines in success rate, with ablation studies validating each design choice. We further demonstrate zero-shot sim-to-real transfer on a mobile platform in the real world. Code and videos are available at https://ai4ce.github.io/EgoPush/.
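The temporally decayed, stage-local completion reward described above can be sketched as follows. This is an illustrative reading of the abstract, not the paper's implementation: the function name, the geometric decay schedule, and the constants are assumptions.

```python
def stage_reward(completed: bool, t_in_stage: int,
                 base: float = 1.0, decay: float = 0.99) -> float:
    """Hypothetical temporally decayed, stage-local completion reward.

    completed: whether the current stage's subgoal (e.g. one object
        reaching its target region) was achieved at this step.
    t_in_stage: steps elapsed since the current stage began.
    """
    if not completed:
        return 0.0
    # Reward shrinks geometrically with time spent inside the stage,
    # so the policy is credited per subproblem and pushed to finish
    # each stage quickly -- easing long-horizon credit assignment.
    return base * (decay ** t_in_stage)
```

Under this reading, each rearrangement stage carries its own bounded reward, so credit for completing an early stage does not depend on the (much later) full-task outcome.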