Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image

Abstract

In many robotics and VR/AR applications, fast camera motion produces severe motion blur, which causes existing camera pose estimation methods to fail. In this work, we propose a novel framework that leverages motion blur as a rich cue for motion estimation rather than treating it as an unwanted artifact. Our approach predicts a dense motion flow field and a monocular depth map directly from a single motion-blurred image. We then recover the instantaneous camera velocity by solving a linear least squares problem under the small motion assumption. In essence, our method produces an IMU-like measurement that robustly captures fast and aggressive camera movements. To train our model, we construct a large-scale dataset with realistic synthetic motion blur derived from ScanNet++v2, and we further refine the model by training end-to-end on real data using our fully differentiable pipeline. Extensive evaluations on real-world benchmarks demonstrate that our method achieves state-of-the-art angular and translational velocity estimates, outperforming current methods such as MASt3R and COLMAP.
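To make the velocity-recovery step concrete, the sketch below shows one way a linear least squares solve can turn a flow field and a depth map into a 6-DoF velocity. It is not taken from the paper: it assumes the classical instantaneous motion-field model for a static scene and a pinhole camera, and the function names, the intrinsics `fx, fy, cx, cy`, and the `exposure` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def motion_field_matrix(x, y, inv_depth):
    """Build the (2N, 6) matrix A with flow = A @ [vx, vy, vz, wx, wy, wz].

    Assumed model (classical motion field, static scene): a point P in the
    camera frame moves with velocity -v - w x P. x, y are normalized image
    coordinates and inv_depth is 1/Z, all flat arrays of length N.
    """
    zeros = np.zeros_like(x)
    rows_u = np.stack(
        [-inv_depth, zeros, x * inv_depth, x * y, -(1.0 + x**2), y], axis=-1
    )
    rows_v = np.stack(
        [zeros, -inv_depth, y * inv_depth, 1.0 + y**2, -x * y, -x], axis=-1
    )
    # Horizontal-flow rows first, then vertical-flow rows.
    return np.concatenate([rows_u, rows_v], axis=0)


def estimate_velocity(flow_px, depth, fx, fy, cx, cy, exposure):
    """Least squares estimate of (linear, angular) camera velocity.

    flow_px:  (H, W, 2) pixel displacement accumulated over the exposure.
    depth:    (H, W) depth map in the same units as the returned translation.
    exposure: exposure time in seconds; under the small motion assumption,
              displacement / exposure approximates the instantaneous flow.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = ((us - cx) / fx).ravel()
    y = ((vs - cy) / fy).ravel()
    A = motion_field_matrix(x, y, 1.0 / depth.ravel())
    # Convert pixel displacements to normalized flow per second.
    b = np.concatenate(
        [flow_px[..., 0].ravel() / fx, flow_px[..., 1].ravel() / fy]
    ) / exposure
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:3], solution[3:]
```

In this formulation the angular velocity does not depend on depth, while the translational velocity inherits whatever scale the depth map carries, so a monocular depth estimate with the wrong scale would rescale the recovered translation by the same factor.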
