PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM
March 10, 2025
Authors: Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy
cs.AI
Abstract
This paper introduces PoseLess, a novel framework for robot hand control that
eliminates the need for explicit pose estimation by directly mapping 2D images
to joint angles using projected representations. Our approach leverages
synthetic training data generated through randomized joint configurations,
enabling zero-shot generalization to real-world scenarios and cross-morphology
transfer from robotic to human hands. By projecting visual inputs and employing
a transformer-based decoder, PoseLess achieves robust, low-latency control
while addressing challenges such as depth ambiguity and data scarcity.
Experimental results demonstrate competitive performance in joint angle
prediction accuracy without relying on any human-labelled dataset.
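To make the abstract's pipeline concrete, below is a minimal sketch, not the authors' implementation, of the core idea: map a single RGB image directly to joint angles with a visual encoder followed by a transformer-based decoder, supervised on synthetic pairs produced by randomizing joint configurations. The module names, dimensions, the 16-joint hand, and the stand-in CNN encoder (approximating the paper's projected VLM representations) are all illustrative assumptions.

```python
# Hypothetical sketch of a PoseLess-style image-to-joint regressor.
# Not the paper's code: encoder, dimensions, and joint count are assumed.
import torch
import torch.nn as nn

NUM_JOINTS = 16  # assumption: a 16-DoF dexterous hand

class ImageToJoints(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Stand-in visual encoder: a small CNN yielding a grid of patch
        # tokens. The paper projects visual inputs via a VLM; a learned
        # CNN approximates that here for self-containment.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One learned query per joint; the decoder cross-attends to the
        # image tokens and emits one embedding per joint.
        self.joint_queries = nn.Parameter(torch.randn(NUM_JOINTS, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)  # one angle per joint token

    def forward(self, images):  # images: (B, 3, H, W)
        feats = self.encoder(images)               # (B, D, h, w)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, D)
        queries = self.joint_queries.expand(images.size(0), -1, -1)
        decoded = self.decoder(queries, tokens)    # (B, NUM_JOINTS, D)
        return self.head(decoded).squeeze(-1)      # (B, NUM_JOINTS)

# Synthetic supervision: sample random joint configurations, render them
# (renderer not shown; random tensors stand in for rendered frames), and
# regress the angles back from pixels with an MSE loss.
model = ImageToJoints()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
angles = torch.empty(8, NUM_JOINTS).uniform_(-1.57, 1.57)  # random configs
images = torch.randn(8, 3, 224, 224)  # placeholder rendered frames
opt.zero_grad()
loss = nn.functional.mse_loss(model(images), angles)
loss.backward()
opt.step()
```

Because the supervision comes entirely from randomized synthetic configurations, no human-labelled poses are needed; in this reading, zero-shot transfer to real images hinges on how well the rendered training distribution covers real-world appearance.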