BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

Abstract

Monocular Depth Estimation (MDE) is a foundational task in computer vision. Traditional methods are limited by the scarcity and quality of training data, which hinders their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic and geometrically accurate RGB images from diverse source depth maps, each intrinsically paired with its ground-truth depth. We then train our depth estimation model on this dataset using a hybrid supervision strategy that integrates teacher pseudo-labels with ground-truth depth for comprehensive and robust training. This data generation and training paradigm enables BRIDGE to achieve breakthroughs in scale and domain diversity, consistently outperforming existing state-of-the-art approaches both quantitatively and in capturing complex scene details, thereby fostering general and robust depth features. Code and models are available at https://dingning-liu.github.io/bridge.github.io/.
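As a rough illustration of what a hybrid supervision objective could look like, the sketch below combines a ground-truth term with a teacher pseudo-label term. The abstract does not specify the loss form, the weighting, or how the two signals are merged, so everything here is an assumption made for illustration: the L1 penalty, the validity mask `gt_valid`, the weights `w_gt` and `w_pseudo`, and the function names `masked_l1` and `hybrid_depth_loss` are hypothetical and are not taken from the BRIDGE code.

```python
import numpy as np


def masked_l1(pred, target, mask):
    """Mean absolute error over pixels where mask is True (0.0 if none)."""
    n = int(mask.sum())
    if n == 0:
        return 0.0
    return float(np.abs(pred - target)[mask].sum() / n)


def hybrid_depth_loss(pred, gt_depth, pseudo_depth, gt_valid,
                      w_gt=1.0, w_pseudo=0.5):
    """Weighted sum of a ground-truth term and a pseudo-label term.

    Assumed design (not from the paper): ground-truth depth supervises
    only pixels flagged valid, while the teacher's pseudo-label
    supervises every pixel.
    """
    all_pixels = np.ones(pred.shape, dtype=bool)
    gt_term = masked_l1(pred, gt_depth, gt_valid)
    pseudo_term = masked_l1(pred, pseudo_depth, all_pixels)
    return w_gt * gt_term + w_pseudo * pseudo_term


# Tiny example: a 2x2 prediction with one pixel lacking valid ground truth.
pred = np.zeros((2, 2))
gt_depth = np.ones((2, 2))
pseudo_depth = 2.0 * np.ones((2, 2))
gt_valid = np.array([[True, False], [True, True]])
loss = hybrid_depth_loss(pred, gt_depth, pseudo_depth, gt_valid)
```

In this toy case the ground-truth term is 1.0 and the pseudo-label term is 2.0, so with the assumed weights the combined loss is 2.0. A real training setup would likely use a scale-aware or scale-invariant depth loss rather than plain L1.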
