Representation Fréchet Loss for Visual Generation
April 30, 2026
Authors: Jiawei Yang, Zhengyang Geng, Xuan Ju, Yonglong Tian, Yue Wang
cs.AI
Abstract
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality: in the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256×256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
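To make the quantity concrete, here is a minimal numpy sketch of the squared Fréchet distance between two Gaussians fitted to feature statistics, the same closed form that underlies FID. The function names are illustrative, not from the paper; the paper's FD-loss would evaluate this expression in an autodiff framework, with the reference statistics estimated from a large population (e.g., 50k samples) while gradients flow only through the statistics of the current batch.

```python
import numpy as np

def sqrtm_psd(m):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
    """
    diff = mu1 - mu2
    s1_half = sqrtm_psd(sigma1)
    # (S1^{1/2} S2 S1^{1/2}) is symmetric PSD, so eigh-based sqrt is safe,
    # and its trace equals Tr((S1 S2)^{1/2}).
    covmean = sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * np.trace(covmean))

# Example: identical statistics give distance 0; in 1-D with equal unit
# variance and means 0 vs 2, the distance is (0-2)^2 = 4.
mu, sigma = np.zeros(3), np.eye(3)
print(frechet_distance(mu, sigma, mu, sigma))                          # ~0.0
print(frechet_distance(np.array([0.0]), np.eye(1),
                       np.array([2.0]), np.eye(1)))                    # 4.0
```

In the decoupled setup described above, `mu2`/`sigma2` for the generated distribution would be assembled from many cached batches so the FD estimate is low-variance, while only the current batch's contribution carries gradients.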