BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
September 29, 2025
Authors: Dingning Liu, Haoyu Guo, Jingyi Zhou, Tong He
cs.AI
Abstract
Monocular Depth Estimation (MDE) is a foundational task in computer vision.
Traditional methods are limited by data scarcity and quality, hindering their
robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image
(D2I) generation framework that synthesizes over 20M realistic and
geometrically accurate RGB images, each intrinsically paired with its ground
truth depth, from diverse source depth maps. We then train our depth estimation
model on this dataset, employing a hybrid supervision strategy that integrates
teacher pseudo-labels with ground truth depth for comprehensive and robust
training. This innovative data generation and training paradigm enables BRIDGE
to achieve breakthroughs in scale and domain diversity, consistently
outperforming existing state-of-the-art approaches quantitatively and in
complex scene detail capture, thereby fostering general and robust depth
features. Code and models are available at
https://dingning-liu.github.io/bridge.github.io/.
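
The hybrid supervision idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify how teacher pseudo-labels and ground truth depth are combined, so everything below is an assumption made for illustration: the function name hybrid_depth_loss, the per-pixel validity mask gt_valid, the L1 error, and the weights w_gt and w_teacher are all hypothetical and are not taken from the BRIDGE code. The sketch supervises a pixel with ground truth where the mask marks it as reliable and falls back to the teacher pseudo-label elsewhere.

```python
from typing import Sequence


def hybrid_depth_loss(
    pred: Sequence[float],
    gt: Sequence[float],
    teacher: Sequence[float],
    gt_valid: Sequence[bool],
    w_gt: float = 1.0,       # hypothetical weight for ground-truth pixels
    w_teacher: float = 0.5,  # hypothetical weight for pseudo-labeled pixels
) -> float:
    """Weighted mean L1 error over flattened depth maps.

    Assumption (not from the paper): each pixel is supervised by ground
    truth when gt_valid is True, otherwise by the teacher pseudo-label.
    """
    if not (len(pred) == len(gt) == len(teacher) == len(gt_valid)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    weight_sum = 0.0
    for p, g, t, valid in zip(pred, gt, teacher, gt_valid):
        if valid:
            # Ground truth is trusted at this pixel.
            total += w_gt * abs(p - g)
            weight_sum += w_gt
        else:
            # Fall back to the teacher model's pseudo-label.
            total += w_teacher * abs(p - t)
            weight_sum += w_teacher

    # Normalize by the total weight; an empty input yields 0.0.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    loss = hybrid_depth_loss(
        pred=[1.0, 2.0, 3.0],
        gt=[1.0, 2.5, 0.0],
        teacher=[1.0, 2.0, 3.5],
        gt_valid=[True, True, False],
    )
    print(f"hybrid loss: {loss:.4f}")
```

In a real training pipeline the same masking logic would operate on image tensors rather than flat lists, and the choice of error term and weights would follow the authors' released code rather than this sketch.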