Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
December 20, 2023
Authors: Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam
cs.AI
Abstract
To achieve human-level dexterity, robots must infer spatial awareness from
multimodal sensing to reason over contact interactions. During in-hand
manipulation of novel objects, such spatial awareness involves estimating the
object's pose and shape. The status quo for in-hand perception primarily
employs vision and is limited to tracking a priori known objects. Moreover,
visual occlusion of in-hand objects is unavoidable during manipulation, which
prevents current systems from pushing beyond occlusion-free tasks. We combine vision and
touch sensing on a multi-fingered hand to estimate an object's pose and shape
during in-hand manipulation. Our method, NeuralFeels, encodes object geometry
by learning a neural field online and jointly tracks it by optimizing a pose
graph problem. We study multimodal in-hand perception in simulation and the
real-world, interacting with different objects via a proprioception-driven
policy. Our experiments show final reconstruction F-scores of 81% and average
pose drifts of 4.7 mm, further reduced to 2.3 mm with known
CAD models. Additionally, we observe that under heavy visual occlusion we can
achieve up to 94% improvements in tracking compared to vision-only methods.
Our results demonstrate that touch, at the very least, refines and, at the very
best, disambiguates visual estimates during in-hand manipulation. We release
our evaluation dataset of 70 experiments, FeelSight, as a step towards
benchmarking in this domain. Our neural representation driven by multimodal
sensing can serve as a perception backbone towards advancing robot dexterity.
Videos can be found on our project website:
https://suddhu.github.io/neural-feels/
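
The abstract describes NeuralFeels as learning a neural field of the object's geometry online while jointly tracking the object's pose from fused vision and touch on a multi-fingered hand. The toy sketch below illustrates only that general "track while you map" idea; it is not the authors' implementation (the paper formulates tracking as a pose graph optimization), and every name in it (`SDFField`, `se3_transform`, `track_and_map`) is an illustrative assumption. It fits a small signed-distance network to fused surface points while refining a 6-DoF object pose by gradient descent.

```python
# Minimal sketch only -- NOT the NeuralFeels implementation. It illustrates
# joint tracking and mapping: fit a neural SDF of the object online while
# refining its 6-DoF pose so that fused visuo-tactile surface points fall on
# the zero level set. All names here are hypothetical.
import torch
import torch.nn as nn


class SDFField(nn.Module):
    """Small MLP mapping 3D points (object frame) to signed distance."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def se3_transform(points: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
    """Apply a 6-vector pose (translation + axis-angle) to Nx3 points."""
    t, w = pose[:3], pose[3:]
    theta = torch.sqrt((w * w).sum() + 1e-12)       # rotation angle
    k = w / theta                                   # rotation axis
    cos, sin = torch.cos(theta), torch.sin(theta)
    # Rodrigues' rotation formula.
    rotated = (points * cos
               + torch.cross(k.expand_as(points), points, dim=-1) * sin
               + k * (points @ k).unsqueeze(-1) * (1.0 - cos))
    return rotated + t


def track_and_map(surface_points: torch.Tensor, field: SDFField,
                  init_pose: torch.Tensor, iters: int = 200) -> torch.Tensor:
    """Jointly optimize the SDF weights and the object pose so that fused
    camera + tactile surface points have near-zero signed distance.
    (A real system would add Eikonal / free-space terms to rule out the
    degenerate all-zero field.)"""
    pose = init_pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([{"params": field.parameters(), "lr": 1e-3},
                            {"params": [pose], "lr": 1e-2}])
    for _ in range(iters):
        opt.zero_grad()
        pts_obj = se3_transform(surface_points, pose)  # world -> object frame
        loss = field(pts_obj).abs().mean()             # surface should sit at SDF = 0
        loss.backward()
        opt.step()
    return pose.detach()


if __name__ == "__main__":
    # Stand-in for fused depth + tactile points: samples on a unit sphere.
    pts = torch.randn(512, 3)
    pts = pts / pts.norm(dim=-1, keepdim=True)
    refined = track_and_map(pts, SDFField(), init_pose=torch.zeros(6))
    print("refined pose (t, axis-angle):", refined)
```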