ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

Abstract

Robotic grasping in cluttered environments remains a significant challenge because of occlusions and complex object arrangements. We developed ThinkGrasp, a plug-and-play vision-language grasping system that uses GPT-4o's advanced contextual reasoning to plan grasping strategies in heavily cluttered environments. ThinkGrasp can effectively identify target objects and generate grasp poses for them, even when they are heavily obstructed or nearly invisible, by using goal-oriented language to guide the removal of obstructing objects. This approach progressively uncovers the target object and ultimately grasps it in few steps and with a high success rate. In both simulated and real-world experiments, ThinkGrasp achieved a high success rate and significantly outperformed state-of-the-art methods in heavily cluttered environments or with diverse unseen objects, demonstrating strong generalization capabilities.
