HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions

Abstract

We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi-automated processing, allowing us to produce high-quality 3D annotations without crowd-sourcing. The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories. We focus on hardware and kitchen tool objects to facilitate research in practical scenarios in which a robot manipulator needs to interact with the environment beyond simple pushing or indiscriminate grasping. We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks. We also provide 3D reconstructed meshes of all objects, and we outline some of the bottlenecks to be addressed for democratizing the collection of datasets like this one.
