BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Abstract

Trained on large-scale datasets, zero-shot monocular depth estimation (MDE) methods are robust in the wild but often fail to recover precise fine details. Recent diffusion-based MDE approaches extract details well, yet they still struggle in geometrically challenging scenes because robust geometric priors are difficult to learn from diverse datasets. To combine the complementary strengths of both families, we propose BetterDepth, which efficiently achieves geometrically correct affine-invariant MDE while capturing fine-grained details. Specifically, BetterDepth is a conditional diffusion-based refiner: it takes the prediction of a pre-trained MDE model, which already captures the global depth context well, as depth conditioning, and iteratively refines details based on the input image. To train such a refiner, we propose global pre-alignment and local patch masking, which keep BetterDepth faithful to the depth conditioning while it learns to capture fine-grained scene details. Trained efficiently on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and in-the-wild scenes. Moreover, BetterDepth can improve the performance of other MDE models in a plug-and-play manner without additional re-training.
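
The abstract names global pre-alignment and local patch masking without defining them. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that global pre-alignment is a least-squares scale-and-shift fit of the depth conditioning to the ground truth (the standard alignment for affine-invariant depth), and that local patch masking keeps only patches whose mean absolute difference from the ground truth is at most a threshold. The names `fit_scale_shift`, `global_pre_align`, and `patch_mask`, the patch size `patch`, and the threshold `eta` are all hypothetical.

```python
from typing import List, Tuple

# Row-major H x W depth map.
DepthMap = List[List[float]]


def fit_scale_shift(pred: DepthMap, target: DepthMap) -> Tuple[float, float]:
    """Closed-form least squares: (s, t) minimizing sum((s * pred + t - target)^2)."""
    xs = [v for row in pred for v in row]
    ys = [v for row in target for v in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        # Constant prediction: only the shift can be fitted.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = cov_xy / var_x
    t = mean_y - s * mean_x
    return s, t


def global_pre_align(cond: DepthMap, gt: DepthMap) -> DepthMap:
    """ASSUMPTION: pre-alignment = affine (scale and shift) fit of cond to gt."""
    s, t = fit_scale_shift(cond, gt)
    return [[s * v + t for v in row] for row in cond]


def patch_mask(aligned: DepthMap, gt: DepthMap, patch: int, eta: float) -> List[List[bool]]:
    """ASSUMPTION: keep a patch (True) if its mean |aligned - gt| is <= eta.

    Patches are non-overlapping patch x patch tiles; edge tiles may be smaller.
    """
    h, w = len(gt), len(gt[0])
    mask: List[List[bool]] = []
    for i in range(0, h, patch):
        mask_row: List[bool] = []
        for j in range(0, w, patch):
            diffs = [
                abs(aligned[r][c] - gt[r][c])
                for r in range(i, min(i + patch, h))
                for c in range(j, min(j + patch, w))
            ]
            mask_row.append(sum(diffs) / len(diffs) <= eta)
        mask.append(mask_row)
    return mask
```

Under these assumptions, fitting only a scale and a shift leaves the conditioning's relative structure untouched, and the mask then excludes patches where the aligned conditioning still disagrees strongly with the ground truth.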