Scaling Properties of Diffusion Models for Perceptual Tasks

November 12, 2024
Authors: Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
cs.AI

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perception tasks. Through a careful analysis of these scaling behaviors, we present various techniques to efficiently train diffusion models for visual perception tasks. Our models achieve improved or comparable performance to state-of-the-art methods using significantly less data and compute. To use our code and models, see https://scaling-diffusion-perception.github.io .
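The abstract treats depth estimation, optical flow, and segmentation as image-to-image translation that a diffusion model solves by iterative denoising. The snippet below is a minimal sketch of that framing under stated assumptions. It uses a generic deterministic DDIM-style sampler with a linear beta schedule and conditions on the input image by channel-wise concatenation. Both choices are assumptions and not necessarily the authors' sampler or architecture. The names `make_alpha_bars`, `toy_denoiser`, and `sample_target` are hypothetical. The denoiser is a fake stand-in that ignores the noisy input and returns grayscale intensity as a pretend depth map. `num_steps` shows one way test-time compute can be scaled, since each step costs one denoiser call.

```python
import numpy as np


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def toy_denoiser(noisy_target, image, t):
    """Hypothetical stand-in for a trained network that predicts the clean target.

    A real model would consume the channel-wise concatenation of the RGB image
    and the noisy target, plus the timestep t. This toy ignores both and returns
    the grayscale intensity as a fake 1-channel "depth" map.
    """
    model_input = np.concatenate([image, noisy_target], axis=-1)  # (H, W, 3 + 1)
    assert model_input.shape[-1] == image.shape[-1] + 1
    return image.mean(axis=-1, keepdims=True)


def sample_target(image, denoiser, num_steps=10, seed=0):
    """Deterministic DDIM-style sampling of a 1-channel target map from noise.

    num_steps is the test-time compute knob: each step is one denoiser call.
    """
    alpha_bars = make_alpha_bars()
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    x = rng.standard_normal((h, w, 1))  # start from pure Gaussian noise
    # Evenly spaced timesteps from the noisiest (999) down to the cleanest (0).
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # After the final step we jump to alpha_bar = 1, i.e. the clean prediction.
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x0_hat = denoiser(x, image, t)  # predicted clean target
        eps_hat = (x - np.sqrt(ab_t) * x0_hat) / np.sqrt(1.0 - ab_t)  # implied noise
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x
```

Usage: `sample_target(np.random.default_rng(1).random((4, 4, 3)), toy_denoiser, num_steps=25)` returns a `(4, 4, 1)` array. Because this toy denoiser always predicts the same clean map, the sampler lands exactly on it. With a learned denoiser, extra steps give the model more chances to refine its estimate, which is the kind of test-time scaling the abstract refers to.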
