
DepthFM: Fast Monocular Depth Estimation with Flow Matching

March 20, 2024
作者: Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
cs.AI

Abstract

Monocular depth estimation is crucial for numerous downstream vision tasks and applications. Current discriminative approaches to this problem are limited due to blurry artifacts, while state-of-the-art generative methods suffer from slow sampling due to their SDE nature. Rather than starting from noise, we seek a direct mapping from input image to depth map. We observe that this can be effectively framed using flow matching, since its straight trajectories through solution space offer efficiency and high quality. Our study demonstrates that a pre-trained image diffusion model can serve as an adequate prior for a flow matching depth model, allowing efficient training on only synthetic data to generalize to real images. We find that an auxiliary surface normals loss further improves the depth estimates. Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates. On standard benchmarks of complex natural scenes, our lightweight approach exhibits state-of-the-art performance at favorable low computational cost despite only being trained on little synthetic data.

