Representation Fréchet Loss for Visual Generation

Abstract

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be optimized effectively in representation space. Our idea is simple: decouple the population size used for FD estimation (e.g., 50k) from the batch size used for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. In the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
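
For readers unfamiliar with the quantity being optimized, the following is a minimal NumPy sketch of the standard Fréchet distance between two Gaussians fitted to feature sets, FD = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The function names `feature_stats` and `frechet_distance` are illustrative assumptions, not taken from the paper. The sketch only evaluates the distance from precomputed statistics; how FD-loss routes gradients through a small batch while estimating statistics over a large population is not described in the abstract and is not shown here.

```python
import numpy as np


def feature_stats(features):
    """Mean and covariance of a feature matrix of shape (num_samples, dim)."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) is the sum of square roots of the eigenvalues of
    # S1 @ S2, which are real and non-negative for PSD inputs. Clipping
    # guards against tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Hypothetical stand-ins for real and generated features.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(5000, 8))
    fake = rng.normal(loc=0.5, size=(5000, 8))
    print(frechet_distance(*feature_stats(real), *feature_stats(fake)))
```

Because the function takes only means and covariances, the number of samples used to estimate those statistics is independent of any training batch size, which is the separation the abstract refers to.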
