Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
December 20, 2023
Authors: Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam
cs.AI
Abstract
To achieve human-level dexterity, robots must infer spatial awareness from
multimodal sensing to reason over contact interactions. During in-hand
manipulation of novel objects, such spatial awareness involves estimating the
object's pose and shape. The status quo for in-hand perception primarily
employs vision and is restricted to tracking a priori known objects. Moreover,
visual occlusion of in-hand objects is inevitable during manipulation,
preventing current systems from pushing beyond tasks without occlusion. We
combine vision and
touch sensing on a multi-fingered hand to estimate an object's pose and shape
during in-hand manipulation. Our method, NeuralFeels, encodes object geometry
by learning a neural field online and jointly tracks it by optimizing a pose
graph problem. We study multimodal in-hand perception in simulation and the
real world, interacting with different objects via a proprioception-driven
policy. Our experiments show final reconstruction F-scores of 81% and average
pose drifts of 4.7 mm, further reduced to 2.3 mm with known
CAD models. Additionally, we observe that under heavy visual occlusion we can
achieve up to 94% improvements in tracking compared to vision-only methods.
Our results demonstrate that touch, at the very least, refines and, at the very
best, disambiguates visual estimates during in-hand manipulation. We release
our evaluation dataset of 70 experiments, FeelSight, as a step towards
benchmarking in this domain. Our neural representation driven by multimodal
sensing can serve as a perception backbone towards advancing robot dexterity.
Videos can be found on our project website
https://suddhu.github.io/neural-feels/
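
The core method statement above, learning a neural field of object geometry online while jointly tracking the object pose against it, can be grounded with a small sketch. The code below is not the NeuralFeels implementation: it is a minimal illustration, assuming a coordinate-MLP signed-distance field trained on surface points fused from vision and touch, and a single gradient-based pose refinement step with a small-angle rotation approximation. All names (SDFField, train_field_step, track_pose_step) are illustrative.

import torch
import torch.nn as nn

class SDFField(nn.Module):
    """Coordinate MLP mapping a 3D point in the object frame to signed distance."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):                       # x: (N, 3)
        return self.net(x).squeeze(-1)          # (N,) signed distances

def train_field_step(field, opt, surface_pts):
    """One online shape update: fused visuo-tactile surface samples should map
    to SDF ~ 0. (A real system adds regularizers, e.g. an eikonal term.)"""
    opt.zero_grad()
    loss = field(surface_pts).abs().mean()
    loss.backward()
    opt.step()
    return loss.item()

def track_pose_step(field, pose6d, obs_pts_world, lr=1e-2):
    """One pose refinement step: nudge the object pose so newly observed points,
    transformed into the object frame, lie on the current zero level set.
    pose6d = (tx, ty, tz, rx, ry, rz), small-angle rotation approximation."""
    pose6d = pose6d.clone().requires_grad_(True)
    t, w = pose6d[:3], pose6d[3:]
    zero = torch.zeros((), dtype=pose6d.dtype)
    R = torch.eye(3) + torch.stack([            # R ~ I + [w]_x
        torch.stack([zero, -w[2],  w[1]]),
        torch.stack([ w[2], zero, -w[0]]),
        torch.stack([-w[1],  w[0], zero]),
    ])
    obj_pts = (obs_pts_world - t) @ R           # world -> object frame (approx.)
    loss = field(obj_pts).abs().mean()
    loss.backward()
    with torch.no_grad():
        pose6d -= lr * pose6d.grad
    return pose6d.detach(), loss.item()

# Toy usage: points on a unit sphere stand in for fused depth/tactile samples.
field = SDFField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
pose = torch.zeros(6)
for _ in range(200):
    pts = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)
    train_field_step(field, opt, pts)
    pose, _ = track_pose_step(field, pose, pts)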
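
For reference, the reconstruction F-score reported above is typically computed as the harmonic mean of precision and recall between predicted and ground-truth surface samples under a distance threshold. The snippet below sketches that standard metric; the 5 mm threshold and brute-force nearest-neighbor search are assumptions, since the abstract does not specify the evaluation protocol.

import numpy as np

def fscore(pred_pts, gt_pts, tau=0.005):
    """Point-cloud F-score between predicted (Nx3) and ground-truth (Mx3)
    surface samples, with tau in meters (5 mm here, an assumed value)."""
    d_pred_to_gt = np.min(
        np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1), axis=1)
    d_gt_to_pred = np.min(
        np.linalg.norm(gt_pts[:, None, :] - pred_pts[None, :, :], axis=-1), axis=1)
    precision = np.mean(d_pred_to_gt < tau)   # fraction of predicted points near GT
    recall = np.mean(d_gt_to_pred < tau)      # fraction of GT points covered
    return 2 * precision * recall / (precision + recall + 1e-12)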