
Free-form language-based robotic reasoning and grasping

March 17, 2025
Authors: Runyu Jiao, Alice Fasoli, Francesco Giuliari, Matteo Bortolon, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi
cs.AI

Abstract

Performing robotic grasping from a cluttered bin based on human instructions is a challenging task, as it requires understanding both the nuances of free-form language and the spatial relationships between objects. Vision-Language Models (VLMs) trained on web-scale data, such as GPT-4o, have demonstrated remarkable reasoning capabilities across both text and images. But can they truly be used for this task in a zero-shot setting? And what are their limitations? In this paper, we explore these research questions via the free-form language-based robotic grasping task, and propose a novel method, FreeGrasp, which leverages the world knowledge of pre-trained VLMs to reason about human instructions and object spatial arrangements. Our method detects all objects as keypoints and uses these keypoints to annotate marks on images, aiming to facilitate GPT-4o's zero-shot spatial reasoning. This allows our method to determine whether a requested object is directly graspable or whether other objects must be grasped and removed first. Since no existing dataset is specifically designed for this task, we introduce a synthetic dataset, FreeGraspData, by extending the MetaGraspNetV2 dataset with human-annotated instructions and ground-truth grasping sequences. We conduct extensive analyses on FreeGraspData and real-world validation with a gripper-equipped robotic arm, demonstrating state-of-the-art performance in grasp reasoning and execution. Project website: https://tev-fbk.github.io/FreeGrasp/.
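
The decision described above, whether the requested object can be grasped directly or whether obstructing objects must be removed first, can be illustrated with a small sketch. This is not the authors' implementation: the function name `grasp_sequence` and the `occluders` map are hypothetical, and the sketch assumes the obstruction relations have already been obtained (in FreeGrasp's setting they would come from the VLM's reasoning over the mark-annotated image). It simply orders objects so that everything lying on top of an object is removed before that object.

```python
from typing import Dict, List, Set


def grasp_sequence(target: str, occluders: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which to grasp objects so that `target` comes last.

    `occluders[x]` is the set of object IDs obstructing x. This map is a
    hypothetical input used for illustration only; the abstract does not
    specify how obstruction relations are represented.
    """
    order: List[str] = []
    visiting: Set[str] = set()  # objects on the current recursion path
    done: Set[str] = set()      # objects already placed in the order

    def visit(obj: str) -> None:
        if obj in done:
            return
        if obj in visiting:
            raise ValueError(f"cyclic obstruction involving {obj!r}")
        visiting.add(obj)
        # Everything obstructing obj must be removed before obj itself.
        for blocker in sorted(occluders.get(obj, set())):
            visit(blocker)
        visiting.discard(obj)
        done.add(obj)
        order.append(obj)

    visit(target)
    return order


# Hypothetical scene: tape lies on the box, and the box lies on the mug.
scene = {"mug": {"box"}, "box": {"tape"}, "tape": set()}
plan = grasp_sequence("mug", scene)
directly_graspable = len(grasp_sequence("tape", scene)) == 1
```

A sequence of length one means the requested object is directly graspable; a longer sequence lists the objects to remove first, ending with the target.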
