Human Universal Grasping
June 15, 2026
Authors: Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto
cs.AI
Abstract
Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hours) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34%, respectively, on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/
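The abstract states only that HUG is a flow-matching model whose output grasp is parameterized by wrist translation, wrist rotation, and MANO hand pose. The sketch below illustrates generic flow-matching training targets and sampling over such a grasp vector. It relies on several assumptions that do not come from the paper: a linear (rectified-flow) interpolation path, a 3-D axis-angle wrist rotation, a 45-D MANO finger pose (15 joints × 3), and plain Euler integration. The names `Grasp`, `cfm_training_pair`, `sample_grasp`, and `oracle_velocity` are hypothetical. `oracle_velocity` is a toy stand-in for the learned network. It steers straight toward a known target, so the sampler can be checked end to end.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec = List[float]

# Assumed grasp layout (hypothetical; the abstract gives no dimensions).
TRANS_DIM = 3   # wrist translation (x, y, z)
ROT_DIM = 3     # wrist rotation as an axis-angle vector
POSE_DIM = 45   # MANO finger pose: 15 joints x 3 axis-angle values
GRASP_DIM = TRANS_DIM + ROT_DIM + POSE_DIM  # 51


@dataclass
class Grasp:
    wrist_translation: Vec
    wrist_rotation: Vec
    mano_pose: Vec


def split_grasp(x: Sequence[float]) -> Grasp:
    """Split a flat grasp vector into its three assumed components."""
    a, b = TRANS_DIM, TRANS_DIM + ROT_DIM
    return Grasp(list(x[:a]), list(x[a:b]), list(x[b:]))


def cfm_training_pair(x1: Vec, rng: random.Random) -> Tuple[Vec, float, Vec]:
    """One conditional flow-matching sample on a linear noise-to-data path.

    x_t = (1 - t) * x0 + t * x1, and the regression target for the
    velocity network is the constant path velocity x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]          # Gaussian noise sample
    t = rng.random()                                  # t ~ U[0, 1)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return xt, t, v_target


VelocityFn = Callable[[Vec, float, Vec], Vec]


def sample_grasp(velocity_fn: VelocityFn, cond: Vec,
                 rng: random.Random, steps: int = 50) -> Grasp:
    """Integrate dx/dt = v(x, t, cond) from noise (t=0) to a grasp (t=1) with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(GRASP_DIM)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt                                    # t stays below 1, so no division by zero
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return split_grasp(x)


def oracle_velocity(x: Vec, t: float, cond: Vec) -> Vec:
    """Toy stand-in for a learned network: here `cond` is itself the target grasp.

    On the linear path, the velocity that reaches `cond` exactly at t = 1
    is (cond - x) / (1 - t).
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]
```

In a trained model, `velocity_fn` would be a network conditioned on the RGB-D observation and the user-specified object. Different initial noise draws would then yield different grasps, which is how a flow-matching model produces a diverse set of candidates. Retargeting the predicted MANO hand to a robot hand is a separate step and is not shown here.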