Rethinking FID: Towards a Better Evaluation Metric for Image Generation
November 30, 2023
Authors: Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar
cs.AI
Abstract
As with many machine learning problems, the progress of image generation
methods hinges on good evaluation metrics. One of the most popular is the
Fréchet Inception Distance (FID). FID estimates the distance between a
distribution of Inception-v3 features of real images, and those of images
generated by the algorithm. We highlight important drawbacks of FID:
Inception's poor representation of the rich and varied content generated by
modern text-to-image models, incorrect normality assumptions, and poor sample
complexity. We call for a reevaluation of FID's use as the primary quality
metric for generated images. We empirically demonstrate that FID contradicts
human raters, it does not reflect gradual improvement of iterative
text-to-image models, it does not capture distortion levels, and that it
produces inconsistent results when varying the sample size. We also propose an
alternative new metric, CMMD, based on richer CLIP embeddings and the maximum
mean discrepancy distance with the Gaussian RBF kernel. It is an unbiased
estimator that does not make any assumptions on the probability distribution of
the embeddings and is sample efficient. Through extensive experiments and
analysis, we demonstrate that FID-based evaluations of text-to-image models may
be unreliable, and that CMMD offers a more robust and reliable assessment of
image quality.
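The CMMD metric described above is built on the unbiased maximum mean discrepancy (MMD) estimator with a Gaussian RBF kernel, applied to CLIP embeddings of the real and generated image sets. The following is a minimal sketch of that estimator, not the authors' implementation: the NumPy layout and the bandwidth `sigma` are illustrative assumptions, and in practice the inputs would be CLIP embedding matrices.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd_unbiased(x, y, sigma=10.0):
    """Unbiased estimate of squared MMD between sample sets x and y.

    MMD^2_u = 1/(m(m-1)) sum_{i != j} k(x_i, x_j)
            + 1/(n(n-1)) sum_{i != j} k(y_i, y_j)
            - 2/(mn)     sum_{i, j}   k(x_i, y_j)
    """
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, sigma)
    kyy = rbf_kernel(y, y, sigma)
    kxy = rbf_kernel(x, y, sigma)
    # Excluding the diagonal terms makes the within-set sums unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()
```

Unlike FID, this estimator makes no normality assumption: it compares the two embedding distributions directly through kernel evaluations, and its expectation is zero when the two sets are drawn from the same distribution.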