Minimalist Visual-Inertial Odometry

Abstract

Visual-Inertial Odometry (VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU (inertial measurement unit) can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential-drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.
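
The step that combines decoded speed with the IMU's angular speed can be illustrated with standard planar dead reckoning. The following is a minimal sketch, not the authors' implementation: it assumes a signed forward speed `v` (in m/s) and a yaw rate `omega` (in rad/s) sampled at a fixed interval `dt`, and it uses a midpoint heading update. The function name `integrate_planar` and all parameter names are hypothetical.

```python
import math


def integrate_planar(speeds, yaw_rates, dt, x=0.0, y=0.0, theta=0.0):
    """Dead-reckon a planar pose from forward speed and yaw rate samples.

    speeds:    forward speeds v_k in m/s (assumed to come from the speed decoder)
    yaw_rates: angular speeds omega_k in rad/s (assumed to come from the IMU)
    dt:        fixed sampling interval in seconds (an assumption of this sketch)
    Returns a list of (x, y, theta) poses, starting with the initial pose.
    """
    trajectory = [(x, y, theta)]
    for v, omega in zip(speeds, yaw_rates):
        # Use the heading at the middle of the step to reduce integration error.
        theta_mid = theta + 0.5 * omega * dt
        x += v * math.cos(theta_mid) * dt
        y += v * math.sin(theta_mid) * dt
        theta += omega * dt
        trajectory.append((x, y, theta))
    return trajectory


if __name__ == "__main__":
    # Hypothetical input: constant speed and constant yaw rate for one full turn.
    n = 1000
    dt = 2.0 * math.pi / n
    poses = integrate_planar([1.0] * n, [1.0] * n, dt)
    print("final pose:", poses[-1])
```

Because the position is obtained by integrating speed along the current heading, errors in either input accumulate over time. This is why the accuracy of the decoded speed matters for the resulting trajectory.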