Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Abstract

To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception primarily employs vision and is restricted to tracking objects that are known a priori. Moreover, visual occlusion of in-hand objects is inevitable during manipulation, which prevents current systems from pushing beyond tasks without occlusion. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of 81% and average pose drifts of 4.7 mm, further reduced to 2.3 mm with known CAD models. Additionally, we observe that under heavy visual occlusion we can achieve up to 94% improvements in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity. Videos can be found on our project website: https://suddhu.github.io/neural-feels/
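For readers unfamiliar with the reconstruction metric, the following is a minimal sketch of how an F-score between a reconstructed point set and a ground-truth point set is commonly computed: precision is the fraction of reconstructed points lying within a distance threshold of the ground truth, recall is the fraction of ground-truth points lying within that threshold of the reconstruction, and the F-score is their harmonic mean. The abstract does not state the threshold or the evaluation procedure, so the 5 mm threshold, the function names, and the toy points below are illustrative assumptions rather than the evaluation used for NeuralFeels.

```python
import math


def nearest_distance(point, cloud):
    """Distance from one 3D point to its closest neighbour in `cloud` (brute force)."""
    return min(math.dist(point, other) for other in cloud)


def f_score(reconstructed, ground_truth, threshold):
    """Harmonic mean of precision and recall at a distance threshold.

    precision: share of reconstructed points within `threshold` of the ground truth
    recall:    share of ground-truth points within `threshold` of the reconstruction
    """
    if not reconstructed or not ground_truth:
        return 0.0
    precision = sum(
        nearest_distance(p, ground_truth) <= threshold for p in reconstructed
    ) / len(reconstructed)
    recall = sum(
        nearest_distance(g, reconstructed) <= threshold for g in ground_truth
    ) / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data in metres (hypothetical, for illustration only).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.001), (1.0, 0.0, 0.001), (0.0, 1.0, 0.5)]

# Assumed threshold of 5 mm; the abstract does not specify one.
score = f_score(reconstructed, ground_truth, threshold=0.005)
print(f"F-score: {score:.3f}")
```

The brute-force nearest-neighbour search keeps the sketch dependency-free; for dense point clouds a spatial index such as a k-d tree would be the usual choice.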
