Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Abstract


To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception relies primarily on vision and is restricted to tracking objects known a priori. Moreover, visual occlusion of objects in-hand is unavoidable during manipulation, which prevents current systems from pushing beyond occlusion-free tasks. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks the object by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of 81% and average pose drifts of 4.7 mm, further reduced to 2.3 mm with known CAD models. Additionally, under heavy visual occlusion we achieve up to 94% improvement in tracking compared with vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone for advancing robot dexterity. Videos can be found on our project website: https://suddhu.github.io/neural-feels/
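The abstract reports reconstruction quality as an F-score and tracking quality as an average pose drift without defining either. As a rough illustration only, the sketch below uses the common point-cloud F-score definition: precision is the fraction of reconstructed points within a distance threshold tau of the ground truth, recall is the fraction of ground-truth points within tau of the reconstruction, and the F-score is their harmonic mean. The threshold value, the units, the function names (`f_score`, `mean_translation_drift`), and the restriction of drift to position error are assumptions made here for illustration, not the evaluation protocol behind the reported numbers.

```python
import math
from typing import Sequence, Tuple

# Hypothetical point type: (x, y, z) in metres (assumed units).
Point = Tuple[float, float, float]


def _nearest_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def f_score(recon: Sequence[Point], gt: Sequence[Point], tau: float) -> float:
    """Point-cloud F-score at threshold tau (standard definition, assumed here).

    precision: fraction of reconstructed points within tau of the ground truth
    recall:    fraction of ground-truth points within tau of the reconstruction
    """
    precision = sum(_nearest_dist(p, gt) < tau for p in recon) / len(recon)
    recall = sum(_nearest_dist(q, recon) < tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def mean_translation_drift(est: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average Euclidean error between estimated and ground-truth object positions.

    Assumption: only the translational part of the pose is compared.
    """
    return sum(math.dist(a, b) for a, b in zip(est, gt)) / len(gt)


# Toy example; tau = 5 mm is an illustrative assumption.
gt = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.0), (0.03, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.001), (0.01, 0.0, 0.001), (0.05, 0.0, 0.0)]
score = f_score(recon, gt, tau=0.005)  # ≈ 0.571 (precision 2/3, recall 1/2)

est_pos = [(0.0, 0.0, 0.0), (0.003, 0.004, 0.0)]
gt_pos = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
drift = mean_translation_drift(est_pos, gt_pos)  # ≈ 0.0025 m, i.e. 2.5 mm
```

Brute-force nearest-neighbour search keeps the sketch dependency-free; for dense point clouds, a spatial index such as a k-d tree would be the usual choice.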
