PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM
March 10, 2025
Authors: Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy
cs.AI
Abstract
This paper introduces PoseLess, a novel framework for robot hand control that
eliminates the need for explicit pose estimation by directly mapping 2D images
to joint angles using projected representations. Our approach leverages
synthetic training data generated through randomized joint configurations,
enabling zero-shot generalization to real-world scenarios and cross-morphology
transfer from robotic to human hands. By projecting visual inputs and employing
a transformer-based decoder, PoseLess achieves robust, low-latency control
while addressing challenges such as depth ambiguity and data scarcity.
Experimental results demonstrate competitive performance in joint angle
prediction accuracy without relying on any human-labelled dataset.
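As a concrete illustration of the pipeline the abstract describes (image in, joint angles out, with no intermediate pose-estimation stage), below is a minimal PyTorch sketch. Every module name, dimension, and the 16-joint count are illustrative assumptions, not the paper's implementation; in particular, the plain ViT-style patch embedding stands in for the paper's VLM-based projected representation.

```python
# Minimal sketch (assumptions throughout): direct image-to-joint-angle
# regression with a transformer decoder, in the spirit of PoseLess.
import torch
import torch.nn as nn

NUM_JOINTS = 16          # assumed joint count for a dexterous hand
IMG_SIZE, PATCH = 224, 16

class ImageToJoints(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Embed 16x16 RGB patches as tokens (ViT-style stand-in for the
        # paper's VLM-derived projected representation).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=PATCH, stride=PATCH)
        num_patches = (IMG_SIZE // PATCH) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        # One learned query per joint; the decoder reads each angle out of
        # the image tokens via cross-attention.
        self.joint_queries = nn.Parameter(torch.zeros(1, NUM_JOINTS, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)   # one scalar angle per joint query

    def forward(self, images):                      # (B, 3, 224, 224)
        tokens = self.patch_embed(images)           # (B, d_model, 14, 14)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, 196, d_model)
        tokens = tokens + self.pos_embed
        queries = self.joint_queries.expand(images.size(0), -1, -1)
        decoded = self.decoder(queries, tokens)     # (B, NUM_JOINTS, d_model)
        return self.head(decoded).squeeze(-1)       # (B, NUM_JOINTS) angles

# Synthetic supervision (assumed): sample a random joint configuration within
# limits; a renderer would turn it into the paired training image.
lower = torch.zeros(NUM_JOINTS)
upper = torch.full((NUM_JOINTS,), 1.57)
target_angles = lower + (upper - lower) * torch.rand(NUM_JOINTS)

model = ImageToJoints()
pred = model(torch.randn(1, 3, IMG_SIZE, IMG_SIZE))  # no pose-estimation stage
loss = nn.functional.mse_loss(pred.squeeze(0), target_angles)
```

The per-joint query readout is one plausible design, not necessarily the authors'; the point of the sketch is that supervision comes entirely from randomly sampled joint configurations, which is what enables training without human-labelled data.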