Representation Fréchet Loss for Visual Generation

Abstract

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be optimized effectively in representation space. Our idea is simple: decouple the population size used for FD estimation (e.g., 50k) from the batch size used for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. In the Inception feature space, a one-step generator achieves an FID of 0.72 on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite a worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
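
For reference, the quantity being optimized can be written out explicitly. The block below is the standard closed form of the Fréchet distance between two Gaussians fitted to feature sets; the symbols (\mu_r, \Sigma_r for reference features, \mu_g, \Sigma_g for generated features) are our notation and do not appear in the abstract, and the abstract does not specify how the population statistics are maintained during training.

```latex
% Assumed notation: (\mu_r, \Sigma_r) are the mean and covariance of reference
% features, (\mu_g, \Sigma_g) those of generated features, both computed in a
% chosen representation space.
\[
\mathrm{FD}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
  - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```

When the features come from the Inception network, this quantity is the familiar FID. The means and covariances are the statistics that would be estimated from the large population (e.g., 50k samples), while gradients flow only through the smaller batch (e.g., 1024).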
