Making Reconstruction FID Predictive of Diffusion Generation FID
March 5, 2026
Authors: Tongda Xu, Mingwei He, Shady Abu-Hussein, Jose Miguel Hernandez-Lobato, Haotian Zhang, Kai Zhao, Chao Zhou, Ya-Qin Zhang, Yan Wang
cs.AI
Abstract
It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a strong correlation with gFID. Specifically, for each element in the dataset, we retrieve its nearest neighbor (NN) in the latent space and interpolate the two latent representations. We then decode the interpolated latents and compute the FID between the decoded samples and the original dataset. Additionally, we refine the claim that rFID correlates poorly with gFID by showing that rFID correlates with sample quality in the diffusion refinement phase, whereas iFID correlates with sample quality in the diffusion navigation phase. Furthermore, by connecting to results on diffusion generalization and hallucination, we explain why iFID correlates well with gFID and why reconstruction metrics are negatively correlated with gFID. Empirically, iFID is the first metric to demonstrate a strong correlation with diffusion gFID, achieving Pearson linear and Spearman rank correlations of approximately 0.85. The source code is available at https://github.com/tongdaxu/Making-rFID-Predictive-of-Diffusion-gFID.
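The iFID recipe described in the abstract (retrieve each latent's nearest neighbor, interpolate, decode, compute FID) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the linked repository is the authoritative reference. The function names, the Euclidean distance used for retrieval, and the interpolation weight `alpha=0.5` are all assumptions made for illustration, since the abstract does not specify them. The VAE decoder and the Inception feature extractor are not shown.

```python
import numpy as np


def interpolate_with_nearest_neighbor(latents, alpha=0.5):
    """Blend each latent with its nearest neighbor in latent space.

    `alpha` is a hypothetical interpolation weight (assumption, not from the paper).
    Returns the interpolated latents and the index of each retrieved neighbor.
    """
    z = np.asarray(latents, dtype=np.float64)
    flat = z.reshape(len(z), -1)
    # Pairwise squared Euclidean distances; O(N^2) memory, fine for a sketch only.
    sq_norms = (flat ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    np.fill_diagonal(dists, np.inf)  # a sample must not retrieve itself
    nn_index = dists.argmin(axis=1)
    mixed = (1.0 - alpha) * z + alpha * z[nn_index]
    return mixed, nn_index


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a cov_b)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative up to numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_a - mu_b) ** 2).sum()
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Under these assumptions, an iFID-style score would be obtained by decoding the interpolated latents with the VAE decoder, extracting Inception features from the decoded images and from the original dataset, and passing both feature sets to `frechet_distance`. For large datasets, the brute-force distance matrix would need to be replaced by a batched or approximate nearest-neighbor search.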