

Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots

September 2, 2025
作者: Minghuan Liu, Zhengbang Zhu, Xiaoshen Han, Peng Hu, Haotong Lin, Xinyao Li, Jingxiao Chen, Jiafeng Xu, Yichu Yang, Yunfeng Lin, Xinghang Li, Yong Yu, Weinan Zhang, Tao Kong, Bingyi Kang
cs.AI

Abstract

Modern robotic manipulation primarily relies on visual observations in a 2D color space for skill learning but suffers from poor generalization. In contrast, humans, living in a 3D world, depend more on physical properties, such as distance, size, and shape, than on texture when interacting with objects. Since such 3D geometric information can be acquired from widely available depth cameras, it appears feasible to endow robots with similar perceptual capabilities. Our pilot study found that using depth cameras for manipulation is challenging, primarily due to their limited accuracy and susceptibility to various types of noise. In this work, we propose Camera Depth Models (CDMs) as a simple plugin on daily-use depth cameras, which take RGB images and raw depth signals as input and output denoised, accurate metric depth. To achieve this, we develop a neural data engine that generates high-quality paired data from simulation by modeling a depth camera's noise pattern. Our results show that CDMs achieve nearly simulation-level accuracy in depth prediction, effectively bridging the sim-to-real gap for manipulation tasks. Notably, our experiments demonstrate, for the first time, that a policy trained on raw simulated depth, without the need for adding noise or real-world fine-tuning, generalizes seamlessly to real-world robots on two challenging long-horizon tasks involving articulated, reflective, and slender objects, with little to no performance degradation. We hope our findings will inspire future research in utilizing simulation data and 3D information in general robot policies.
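The plugin interface described in the abstract, an RGB image plus a raw depth frame mapped to a denoised metric depth frame of the same resolution, can be sketched as follows. This is a minimal illustrative stand-in, not the authors' model: the real CDM is a neural network, while the `cdm_denoise` function and its median-fill logic here are purely hypothetical, chosen only to show the input/output contract and the handling of dropped depth readings common in consumer depth cameras.

```python
import numpy as np

def cdm_denoise(rgb: np.ndarray, raw_depth: np.ndarray) -> np.ndarray:
    """Toy stand-in for a Camera Depth Model (CDM).

    Interface only: (H, W, 3) RGB image and (H, W) raw metric depth
    with holes/noise in, denoised (H, W) metric depth out. The real
    CDM replaces the body of this function with a learned network.
    """
    assert rgb.shape[:2] == raw_depth.shape, "RGB and depth must share resolution"
    depth = raw_depth.astype(np.float64).copy()
    # Consumer depth cameras often report 0 for invalid pixels;
    # treat those as missing and fill with the median of valid readings.
    valid = depth > 0
    if valid.any():
        depth[~valid] = np.median(depth[valid])
    return depth

# Usage: a 4x4 frame (in metres) with two dropped depth readings.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
raw = np.full((4, 4), 1.5)
raw[0, 0] = 0.0
raw[2, 3] = 0.0
clean = cdm_denoise(rgb, raw)
```

A policy consuming `clean` then sees hole-free metric depth, which is the property the paper exploits to transfer policies trained on raw simulated depth directly to real robots.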