Scaling Properties of Diffusion Models for Perceptual Tasks

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perception tasks. Through a careful analysis of these scaling behaviors, we present various techniques to efficiently train diffusion models for visual perception tasks. Our models match or improve on the performance of state-of-the-art methods while using significantly less data and compute. To use our code and models, see https://scaling-diffusion-perception.github.io.

