
Representation Fréchet Loss for Visual Generation

April 30, 2026
Authors: Jiawei Yang, Zhengyang Geng, Xuan Ju, Yonglong Tian, Yue Wang
cs.AI

Abstract

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256×256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
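The decoupling idea can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the feature dimension, cache sizes, and variable names (`cached`, `batch`) are illustrative assumptions. It computes the Gaussian Fréchet distance from population statistics over ~50k generated features, of which only a small batch would be freshly generated (and carry gradients) in a real training step.

```python
import numpy as np

def sqrt_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    s1 = sqrt_psd(cov1)
    covmean = sqrt_psd(s1 @ cov2 @ s1)  # symmetrized sqrt of cov1 @ cov2
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Decoupling sketch: FD statistics come from a large population of generated
# features (e.g., 50k total), while only the current small batch (e.g., 1024)
# would be freshly generated and differentiable in an actual training loop.
rng = np.random.default_rng(0)
d = 8
ref_feats = rng.normal(size=(50_000, d))         # real-data features (fixed)
cached = rng.normal(loc=0.5, size=(48_976, d))   # older generated features (no grad)
batch = rng.normal(loc=0.5, size=(1024, d))      # current batch (grad would flow here)

gen_feats = np.concatenate([cached, batch])      # 50k-sample population
fd = frechet_distance(ref_feats.mean(0), np.cov(ref_feats, rowvar=False),
                      gen_feats.mean(0), np.cov(gen_feats, rowvar=False))
print(f"FD over 50k-sample population: {fd:.3f}")
```

With identical Gaussians the distance is zero, and shifting the mean of a unit-covariance Gaussian by 1 in each of d dimensions gives a distance of exactly d, which makes the function easy to sanity-check.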