Human Universal Grasping
June 15, 2026
Authors: Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto
cs.AI
Abstract
Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hours) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34% on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/
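To make the grasp parameterization and the flow-matching sampling step concrete, here is a minimal sketch in plain Python. It is not the released HUG code. The vector layout (3 translation values, a 3-value axis-angle wrist rotation, and 45 MANO articulation values for 15 joints), the names `Grasp`, `sample_grasp`, and `toy_velocity`, and the Euler integrator are all assumptions made for illustration. In the actual model the velocity field would be a learned network conditioned on the RGB-D image and the user-specified object; here it is replaced by a closed-form toy field that pulls every sample toward one fixed target.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed layout of the flat grasp vector (the abstract does not specify it).
TRANSLATION_DIM = 3   # wrist translation (x, y, z)
ROTATION_DIM = 3      # wrist rotation as axis-angle (assumption)
HAND_POSE_DIM = 45    # MANO articulation: 15 joints x 3 axis-angle values
GRASP_DIM = TRANSLATION_DIM + ROTATION_DIM + HAND_POSE_DIM


@dataclass
class Grasp:
    """Hypothetical container for one predicted grasp."""
    wrist_translation: List[float]
    wrist_rotation: List[float]
    hand_pose: List[float]

    @classmethod
    def from_vector(cls, x: List[float]) -> "Grasp":
        if len(x) != GRASP_DIM:
            raise ValueError(f"expected {GRASP_DIM} values, got {len(x)}")
        r0 = TRANSLATION_DIM
        p0 = TRANSLATION_DIM + ROTATION_DIM
        return cls(list(x[:r0]), list(x[r0:p0]), list(x[p0:]))


# A velocity field maps (current sample, time in [0, 1)) to a velocity vector.
VelocityField = Callable[[List[float], float], List[float]]


def sample_grasp(velocity: VelocityField, num_steps: int = 10,
                 rng: Optional[random.Random] = None) -> Grasp:
    """Draw Gaussian noise and integrate dx/dt = v(x, t) from t=0 to t=1
    with fixed-step Euler updates."""
    rng = rng or random.Random()
    x = [rng.gauss(0.0, 1.0) for _ in range(GRASP_DIM)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always < 1, so the toy field below never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return Grasp.from_vector(x)


def toy_velocity(target: List[float]) -> VelocityField:
    """Stand-in for the learned network: the conditional velocity of the
    straight-line path x_t = (1 - t) * x_0 + t * target."""
    def velocity(x: List[float], t: float) -> List[float]:
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return velocity
```

With the toy field, the final Euler step lands on the target regardless of the starting noise, which makes the sampler easy to sanity-check. A learned field would instead carry different noise draws to different grasps, which is where the diversity of generated grasps would come from.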