Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

Abstract

We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics of traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit energy landscape. This allows an optimization-based sampling process at inference time, in which samples are obtained by gradient descent on the learned landscape with adjustable step sizes, adaptive optimizers, and adaptive compute. Empirically, EqM surpasses the generation performance of diffusion/flow models, achieving an FID of 1.90 on ImageNet 256×256. EqM is also theoretically justified to learn and sample from the data manifold. Beyond generation, EqM is a flexible framework that naturally handles tasks including denoising of partially noised images, OOD detection, and image composition. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models and a simple route to optimization-driven inference.
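The optimization-based sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned equilibrium gradient is replaced by a hypothetical toy field (`toy_gradient`, the gradient of a quadratic energy), and the function name `sample_by_gradient_descent`, the step size, and the stopping tolerance are all assumptions chosen for illustration. The early stop on a small gradient norm is one simple way to picture "adaptive compute".

```python
import numpy as np


def toy_gradient(x, target):
    """Hypothetical stand-in for a learned equilibrium gradient field.

    This is the gradient of the toy energy 0.5 * ||x - target||^2,
    which vanishes exactly at `target`. A real model would be a network.
    """
    return x - target


def sample_by_gradient_descent(grad_fn, x0, step_size=0.1, max_steps=1000, tol=1e-4):
    """Plain gradient descent from x0 with an adjustable step size.

    Stops early once the gradient norm falls below `tol`, so the number
    of steps adapts to the starting point (an assumed stopping rule).
    Returns the final point and the number of update steps taken.
    """
    x = np.asarray(x0, dtype=float).copy()
    for step in range(max_steps):
        g = grad_fn(x)
        if np.linalg.norm(g) < tol:
            return x, step
        x = x - step_size * g
    return x, max_steps


# Start from Gaussian noise and descend toward the toy energy minimum.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
x0 = rng.standard_normal(2)
sample, steps_used = sample_by_gradient_descent(
    lambda x: toy_gradient(x, target), x0
)
```

In this sketch the update rule could be swapped for a momentum-based or adaptive optimizer without changing the gradient field, which is the kind of flexibility the abstract attributes to sampling on a single time-independent landscape.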
