Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic augmentation of the FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common, and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
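As a rough illustration of three quantities the abstract names, the following Python sketch shows one way a log-scale depth parameterization, a field-of-view value derived from pinhole intrinsics, and the relative error (REL) metric could be computed. The abstract gives no formulas, so everything here is an assumption: the function names are hypothetical, the depth range of 0.5 m to 80 m is a placeholder, and the code is not presented as the DMD implementation.

```python
import math

# Assumed depth range in metres. These are illustrative placeholders,
# not values stated in the abstract.
D_MIN, D_MAX = 0.5, 80.0


def depth_to_log_target(depth_m, d_min=D_MIN, d_max=D_MAX):
    """Map a metric depth to [-1, 1] on a log scale (hypothetical helper)."""
    d = min(max(depth_m, d_min), d_max)  # clip to the assumed range
    unit = math.log(d / d_min) / math.log(d_max / d_min)  # in [0, 1]
    return 2.0 * unit - 1.0


def log_target_to_depth(target, d_min=D_MIN, d_max=D_MAX):
    """Invert depth_to_log_target for targets in [-1, 1]."""
    unit = (target + 1.0) / 2.0
    return d_min * math.exp(unit * math.log(d_max / d_min))


def fov_degrees(image_size_px, focal_length_px):
    """Pinhole field of view along one image axis, in degrees."""
    return math.degrees(2.0 * math.atan(image_size_px / (2.0 * focal_length_px)))


def relative_error(pred, gt):
    """Mean absolute relative error (REL) over paired depth values."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)
```

Under these assumed bounds, the log mapping spends half of the target range on depths below roughly 6.3 m, which is one plausible reason a log-scale target helps a single model cover both indoor and outdoor scenes. The FOV helper shows why intrinsics matter for metric scale: the same image width with a different focal length yields a different field of view.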
