DVD: Deterministic Video Depth Estimation with Generative Priors

March 12, 2026
Authors: Hongfei Zhang, Harold Haodong Chen, Chenfei Liao, Jing He, Zixin Zhang, Haodong Li, Yihao Liang, Kanghao Chen, Bin Ren, Xu Zheng, Shuai Yang, Kun Zhou, Yinchuan Li, Nicu Sebe, Ying-Cong Chen
cs.AI

Abstract

Existing video depth estimation faces a fundamental trade-off: generative models suffer from stochastic geometric hallucinations and scale drift, while discriminative models demand massive labeled datasets to resolve semantic ambiguities. To break this impasse, we present DVD, the first framework to deterministically adapt pre-trained video diffusion models into single-pass depth regressors. Specifically, DVD features three core designs: (i) repurposing the diffusion timestep as a structural anchor to balance global stability with high-frequency detail; (ii) latent manifold rectification (LMR), which enforces differential constraints to counter regression-induced over-smoothing and restore sharp boundaries and coherent motion; and (iii) global affine coherence, an inherent property that bounds inter-window divergence and thereby enables seamless long-video inference without complex temporal alignment. Extensive experiments demonstrate that DVD achieves state-of-the-art zero-shot performance across benchmarks. Furthermore, DVD unlocks the geometric priors implicit in video foundation models using 163× less task-specific data than leading baselines. Finally, we release the full pipeline, including the complete training suite, to support state-of-the-art video depth estimation in the open-source community.
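The abstract reports zero-shot results but does not spell out the evaluation protocol. Video depth benchmarks commonly score affine-invariant predictions by fitting a single least-squares scale and shift to the whole clip, then computing AbsRel and the δ<1.25 accuracy. Because one (scale, shift) pair is shared by every frame, per-frame scale drift of the kind the abstract attributes to generative models is penalized. The following is a minimal NumPy sketch assuming that common protocol, not the authors' evaluation code. The function names `fit_global_affine` and `evaluate_clip` are hypothetical, and fitting in depth space rather than inverse-depth (disparity) space is an assumption, since papers differ on this.

```python
import numpy as np


def fit_global_affine(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Least-squares scale s and shift t minimizing ||s * pred + t - gt||^2.

    Assumption: one (s, t) pair is fitted over ALL frames of the clip at once,
    so a prediction only scores well if it is affine-consistent over time.
    """
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    # Design matrix [p, 1]: solve the two-parameter linear least-squares problem.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(s), float(t)


def evaluate_clip(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> dict:
    """Align a (T, H, W) prediction globally, then compute AbsRel and delta1."""
    s, t = fit_global_affine(pred, gt, mask)
    # Clip to a small positive value so the ratio-based metric stays defined.
    a = np.clip((s * pred + t)[mask], 1e-6, None)
    g = gt[mask]
    abs_rel = float(np.mean(np.abs(a - g) / g))
    ratio = np.maximum(a / g, g / a)
    delta1 = float(np.mean(ratio < 1.25))
    return {"scale": s, "shift": t, "abs_rel": abs_rel, "delta1": delta1}


# Usage on synthetic data: a clip of T=4 frames with depth in [1, 10].
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 10.0, size=(4, 8, 8))
mask = gt > 0

# An exactly affine prediction (gt = 2 * pred + 0.5) is recovered perfectly.
pred = (gt - 0.5) / 2.0
clean = evaluate_clip(pred, gt, mask)

# Per-frame scale drift cannot be absorbed by a single global (s, t).
drift = np.linspace(1.0, 1.5, gt.shape[0])[:, None, None]
drifted = evaluate_clip(pred * drift, gt, mask)
```

In the clean case the fit recovers scale 2 and shift 0.5 with near-zero AbsRel, while the drifted clip keeps a clearly nonzero error because no single affine map fits all frames.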