The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
June 2, 2023
Authors: Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
cs.AI
Abstract
Denoising diffusion probabilistic models have transformed image generation
with their impressive fidelity and diversity. We show that they also excel in
estimating optical flow and monocular depth, surprisingly, without
task-specific architectures and loss functions that are predominant for these
tasks. Compared to the point estimates of conventional regression-based
methods, diffusion models also enable Monte Carlo inference, e.g., capturing
uncertainty and ambiguity in flow and depth. With self-supervised pre-training,
the combined use of synthetic and real data for supervised training, technical
innovations (infilling and step-unrolled denoising diffusion training) to handle
noisy, incomplete training data, and a simple form of coarse-to-fine refinement,
one can train state-of-the-art diffusion models for depth and optical flow
estimation. Extensive experiments focus on quantitative performance against
benchmarks, on ablations, and on the model's ability to capture uncertainty and
multimodality and to impute missing values. Our model, DDVM
(Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth
error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26%
on the KITTI optical flow benchmark, about 25% better than the best published
method. For an overview see https://diffusion-vision.github.io.
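The Monte Carlo inference the abstract mentions amounts to running the stochastic reverse-diffusion process several times and treating the spread of the resulting estimates as uncertainty. The toy sketch below illustrates that idea on a single scalar "depth" value; the denoiser, its target, and all constants are hypothetical stand-ins, not the paper's DDVM.

```python
import random
import statistics

# Hypothetical stand-ins: a trained denoiser would replace denoise_step,
# and TARGET_DEPTH stands in for the depth a real model would recover.
TARGET_DEPTH = 2.5
NUM_STEPS = 50     # reverse-diffusion steps per sample
NUM_SAMPLES = 8    # independent Monte Carlo runs

def denoise_step(x, noise_scale=0.05):
    # Toy reverse step: pull the estimate toward the target and inject
    # noise; this stochasticity is what makes the runs differ, so
    # repeated sampling exposes the model's uncertainty.
    x = x + 0.2 * (TARGET_DEPTH - x)
    return x + random.gauss(0.0, noise_scale)

def sample_depth():
    x = random.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for _ in range(NUM_STEPS):
        x = denoise_step(x)
    return x

# Point estimate and uncertainty from the ensemble of samples.
samples = [sample_depth() for _ in range(NUM_SAMPLES)]
mean_depth = statistics.mean(samples)
uncertainty = statistics.stdev(samples)
```

In contrast, a regression-based method would return only `mean_depth`; the per-pixel analogue of `uncertainty` is what lets a diffusion model flag ambiguous regions such as occlusions or reflective surfaces.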