BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

Abstract

Monocular Depth Estimation (MDE) is a foundational task in computer vision. Traditional methods are limited by the scarcity and quality of training data, which hinders their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic and geometrically accurate RGB images from diverse source depth maps, each intrinsically paired with its ground-truth depth. We then train our depth estimation model on this dataset with a hybrid supervision strategy that combines teacher pseudo-labels with ground-truth depth for comprehensive and robust training. This data generation and training paradigm gives BRIDGE advances in scale and domain diversity, and it consistently outperforms existing state-of-the-art approaches both quantitatively and in capturing detail in complex scenes, thereby fostering general and robust depth features. Code and models are available at https://dingning-liu.github.io/bridge.github.io/.
