PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

Abstract

This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By projecting visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive joint angle prediction accuracy without relying on any human-labelled dataset.
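
The synthetic-data step mentioned in the abstract (randomized joint configurations used as training labels) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the joint names, the 0 to 90 degree limits, the uniform sampling, and the function names are all assumptions, and the rendering of each configuration into an image is left out.

```python
import random

# Hypothetical joint limits in degrees. The actual hand model, joint names,
# and ranges are not given in the abstract; these are placeholders.
JOINT_LIMITS = {
    "thumb_flex": (0.0, 90.0),
    "index_flex": (0.0, 90.0),
    "middle_flex": (0.0, 90.0),
    "ring_flex": (0.0, 90.0),
    "pinky_flex": (0.0, 90.0),
}


def sample_joint_configuration(rng, limits=JOINT_LIMITS):
    """Draw one joint configuration uniformly within the given limits."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in limits.items()}


def make_synthetic_labels(n_samples, seed=0):
    """Return n_samples random configurations.

    In a full pipeline each configuration would be rendered to a 2D image
    by a simulator, giving (image, joint angles) training pairs.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [sample_joint_configuration(rng) for _ in range(n_samples)]


labels = make_synthetic_labels(1000)
```

Because the labels come from the sampler rather than from annotators, such a dataset needs no human labelling, which is the property the abstract emphasizes.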
