ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

July 16, 2024
Authors: Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt
cs.AI

Abstract

Robotic grasping in cluttered environments remains a significant challenge due to occlusions and complex object arrangements. We have developed ThinkGrasp, a plug-and-play vision-language grasping system that uses GPT-4o's advanced contextual reasoning to plan grasping strategies in heavy clutter. ThinkGrasp can effectively identify target objects and generate grasp poses for them, even when they are heavily obstructed or nearly invisible, by using goal-oriented language to guide the removal of obstructing objects. This approach progressively uncovers the target object and ultimately grasps it in a few steps and with a high success rate. In both simulated and real-world experiments, ThinkGrasp achieved a high success rate and significantly outperformed state-of-the-art methods in heavily cluttered environments or with diverse unseen objects, demonstrating strong generalization capabilities.
